MaybeShewill-CV / CRNN_Tensorflow

Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition
MIT License

Prediction on training data gives longer and different labels? #350

Closed kilanny closed 5 years ago

kilanny commented 5 years ago

I am training on a dataset whose labels contain only the digits 0-9. Every label is exactly 14 digits long. After training for 20,000 epochs I get a 36 digit label. For example, the ground truth is 54105872404867 but the prediction is 8050364139709205551580, even though this image was already in the training data.
Trained using TensorFlow 1.14.0:

I1001 08:06:13.945861 131 train_shadownet.py:324] Epoch_Train: 19980 cost=  0.000088
I1001 08:06:14.060312 131 train_shadownet.py:324] Epoch_Train: 19981 cost=  0.000088
I1001 08:06:14.182707 131 train_shadownet.py:324] Epoch_Train: 19982 cost=  0.000097
I1001 08:06:14.313489 131 train_shadownet.py:324] Epoch_Train: 19983 cost=  0.000097
I1001 08:06:14.434372 131 train_shadownet.py:324] Epoch_Train: 19984 cost=  0.000216
I1001 08:06:14.559236 131 train_shadownet.py:324] Epoch_Train: 19985 cost=  0.000111
I1001 08:06:14.679686 131 train_shadownet.py:324] Epoch_Train: 19986 cost=  0.000068
I1001 08:06:14.803079 131 train_shadownet.py:324] Epoch_Train: 19987 cost=  0.000086
I1001 08:06:14.927337 131 train_shadownet.py:324] Epoch_Train: 19988 cost=  0.000125
I1001 08:06:15.049385 131 train_shadownet.py:324] Epoch_Train: 19989 cost=  0.000143
I1001 08:06:15.175829 131 train_shadownet.py:324] Epoch_Train: 19990 cost=  0.000084
I1001 08:06:15.304835 131 train_shadownet.py:324] Epoch_Train: 19991 cost=  0.000091
I1001 08:06:15.426403 131 train_shadownet.py:324] Epoch_Train: 19992 cost=  0.000210
I1001 08:06:15.550206 131 train_shadownet.py:324] Epoch_Train: 19993 cost=  0.000080
I1001 08:06:15.694252 131 train_shadownet.py:324] Epoch_Train: 19994 cost=  0.000102
I1001 08:06:15.819186 131 train_shadownet.py:324] Epoch_Train: 19995 cost=  0.000078
I1001 08:06:15.938702 131 train_shadownet.py:324] Epoch_Train: 19996 cost=  0.000087
I1001 08:06:16.079816 131 train_shadownet.py:324] Epoch_Train: 19997 cost=  0.000101
I1001 08:06:16.220785 131 train_shadownet.py:324] Epoch_Train: 19998 cost=  0.000139
I1001 08:06:16.352691 131 train_shadownet.py:324] Epoch_Train: 19999 cost=  0.000104
I1001 08:06:16.472107 131 train_shadownet.py:324] Epoch_Train: 20000 cost=  0.000180
I1001 08:06:16.592458 131 train_shadownet.py:324] Epoch_Train: 20001 cost=  0.000106

I built a dataset from images of variable width and height, described by a CSV file with rows of the form image_name,label. I then generated lexicon.txt and the train, test, and validation annotation files using:

import pandas as pd
!sed -i '1s/^/idx,label\n/' no/labels.csv
ds = pd.read_csv("no/labels.csv")

import numpy as np
labels = np.sort(np.unique(ds[['label']].values))
with open('no/lexicon.txt', 'w', encoding='utf-8') as f:
  f.write('\n'.join([str(i) for i in labels]))

with open('no/annotation_train.txt', 'w', encoding='utf-8') as train:
  with open('no/annotation_test.txt', 'w', encoding='utf-8') as test:
    with open('no/annotation_val.txt', 'w', encoding='utf-8') as val:
      for index, row in ds.iterrows():
        sample_path = "./pic/" + str(row[0]) + ".png"
        label = str(np.where(labels == row[1])[0][0])
        if index > 9000: # 1000 images for test
          test.write(sample_path + ' ' + label + "\n")
        elif index > 8000: # 1000 images for validation
          val.write(sample_path + ' ' + label + "\n")
        else: # 8000 images for training
          train.write(sample_path + ' ' + label + "\n")
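
One thing worth checking in a script like the one above: pandas infers an integer dtype for a column that contains only digits, so a label such as 09362298900141 would lose its leading zero and end up 13 characters long in the lexicon. The following is a minimal sketch, not the repository's tooling, that keeps labels as strings by using the standard csv module and verifies that every annotation index maps back to a 14-digit label. The function name build_annotations and the two-row sample are hypothetical; the "./pic/<idx>.png <lexicon index>" line format is taken from the script above.

```python
import csv
import io


def build_annotations(csv_text):
    """Return (lexicon, annotation_lines) from CSV text with an idx,label header."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # csv yields strings, so a label like "09362298900141" keeps its leading zero.
    lexicon = sorted({row["label"] for row in rows})
    index_of = {label: i for i, label in enumerate(lexicon)}
    lines = [
        "./pic/{}.png {}".format(row["idx"], index_of[row["label"]])
        for row in rows
    ]
    return lexicon, lines


# Two rows copied from the sample data in this thread.
sample = "idx,label\n19,10102667018211\n20,09362298900141\n"
lexicon, lines = build_annotations(sample)

# Round trip: each annotation index must point at a 14-digit lexicon entry.
for line in lines:
    path, index = line.rsplit(" ", 1)
    assert len(lexicon[int(index)]) == 14, (path, lexicon[int(index)])
```

With pandas, passing dtype={"label": str} to read_csv should have the same effect of keeping the labels as strings.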

Then generated tfrecords:

!mkdir no/tfrecords
%run tools/write_tfrecords {"--dataset_dir"} {"no/"} {"--save_dir"} {"no/tfrecords"}

Finally decreased epoch count and started training:

!sed -i -- 's,__C.TRAIN.EPOCHS = 2000000,__C.TRAIN.EPOCHS = 20000,g' config/global_config.py
%run tools/train_shadownet.py {"--dataset_dir"} {"no/"} {"--char_dict_path"} {"data/char_dict/char_dict.json"} {"--ord_map_dict_path"} {"data/char_dict/ord_map.json"}

MaybeShewill-CV commented 5 years ago

@ibraheemalkilanny You may reduce the sequence length and test again:)

kilanny commented 5 years ago

@MaybeShewill-CV Reducing SEQ_LENGTH to 14, 15, or 16 gives me an inf cost.

Sample from no.csv

idx,label
1,24438184568785
2,54105872404867
3,84064371245548
4,74583093168537
5,98424321583283
6,28376215059437
7,53971777788694
8,33491558037994
9,12195351881743
10,78040480271009
11,12466541365433
12,85041576740458
13,43116916958650
14,46899200313839
15,23540081557455
16,74543767222738
17,92751256256102
18,63114905756239
19,10102667018211
20,09362298900141
21,47182254377035
22,93132088505022
23,95059362370598
24,06675732053885
25,38298112821601
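
A possible explanation for the infinite cost at short sequence lengths, offered as a sketch rather than a diagnosis: with a CTC loss, a label can only be aligned if the number of time steps is at least the label length plus the number of adjacent repeated characters, because a blank must separate each repeat. When no alignment exists the label probability is zero and the loss becomes infinite. The helper min_ctc_length below is hypothetical, and it assumes SEQ_LENGTH is the number of time steps passed to the CTC loss. The labels are taken from this thread.

```python
def min_ctc_length(label):
    """Smallest number of time steps for which CTC can align this label."""
    # Each pair of identical neighbours needs one blank between them.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


# Labels copied from the sample data in this thread.
labels = ["54105872404867", "53971777788694", "74543767222738"]
for label in labels:
    print(label, min_ctc_length(label))
```

For these three labels the minimum is 14, 18, and 16 time steps respectively, so a sequence length of 14 to 16 cannot fit every label in the sample.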

MaybeShewill-CV commented 5 years ago

@ibraheemalkilanny
1. Make sure your training data is organized correctly; it is supposed to follow the same format as the Synth90k dataset.
2. Reduce the sequence length when testing :)
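
To illustrate why the sequence length matters at test time, here is a minimal sketch of generic greedy CTC decoding, not this repository's decoder: the per-time-step predictions are collapsed by merging consecutive repeats and dropping blanks, so the decoded label can never be longer than the number of time steps. The function name ctc_greedy_collapse and the "-" blank symbol are assumptions made for illustration.

```python
def ctc_greedy_collapse(path, blank="-"):
    """Collapse a per-time-step symbol path into a label, CTC style."""
    decoded = []
    previous = None
    for symbol in path:
        # Keep a symbol only when it differs from the previous step
        # and is not the blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


# A hypothetical 9-step path that decodes to the first four digits of a label.
print(ctc_greedy_collapse("55-4-11-0"))
```

A repeated digit survives only when a blank separates the two occurrences, for example "7-77" collapses to "77".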