Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Question on train_text_recognition.py #67

Open harshalcse opened 5 years ago

harshalcse commented 5 years ago

Is train_text_recognition.py used only for building a completely new model from scratch, or can it also build an extended model from an existing SVHN-trained model or an FSNS-pretrained model?

Bartzi commented 5 years ago

It is used to build new models from scratch. No finetuning from SVHN or FSNS models is necessary.

harshalcse commented 5 years ago

What do I need to take care of while preprocessing the dataset?

Bartzi commented 5 years ago

Hmm, not much I suppose... Just make sure you have already cropped text lines... you don't really need to create a multi-step curriculum for text recognition.
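For illustration (a minimal sketch, not part of the repository; write_groundtruth and the example paths are made up), here is how one might assemble a tab-separated groundtruth file from a directory of already-cropped text-line images and their transcriptions. The path/label layout matches the gt_word.csv excerpt shown further down; take the exact header line expected by the dataset loader from the repo's file_dataset.py.

# Hypothetical helper, not part of the SEE repository: writes a tab-separated
# groundtruth file (image path -> transcription) for already-cropped text lines.
import csv
import os

def write_groundtruth(image_dir, transcriptions, output_path):
    """transcriptions: dict mapping image filename -> label string."""
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for filename, label in sorted(transcriptions.items()):
            writer.writerow([os.path.join(image_dir, filename), label])

# Example usage with made-up data:
write_groundtruth(
    "/root/small_dataset_2/9999",
    {"0.JPG": "MRHDG1840KP033812", "1.JPG": "MRHRW2840KP060067"},
    "/root/small_dataset_2/gt_word.csv",
)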

harshalcse commented 5 years ago

Right now training on the SVHN dataset works, but I want to train my custom model on alphanumeric characters only, so I use my own char_map.json:

python3 chainer/train_text_recognition.py /root/small_dataset_2/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map /root/small_dataset_2/ctc_char_map.json --gpu 0 --snapshot-interval 1000 --dropout-ratio 0.2 --epoch 200 -lr 0.0001

{
    "0": 9250,
    "1": 48,
    "2": 49,
    "3": 50,
    "4": 51,
    "5": 52,
    "6": 53,
    "7": 54,
    "8": 55,
    "9": 56,
    "10": 57,
    "11": 45,
    "12": 65,
    "13": 66,
    "14": 67,
    "15": 68,
    "16": 69,
    "17": 70,
    "18": 71,
    "19": 72,
    "20": 74,
    "21": 75,
    "22": 76,
    "23": 77,
    "24": 78,
    "25": 80,
    "26": 82,
    "27": 83,
    "28": 84,
    "29": 85,
    "30": 86,
    "31": 87,
    "32": 88,
    "33": 89,
    "34": 90
}

My gt_word.csv file looks like this:

17      1
/root/small_dataset_2/9999/0.JPG        MRHDG1840KP033812
/root/small_dataset_2/9999/1.JPG        MRHRW2840KP060067
/root/small_dataset_2/9999/2.JPG        MRHDG1847KP033824
/root/small_dataset_2/9999/3.JPG        MRHRW2850KP062158
/root/small_dataset_2/9999/5.JPG        MRHDG1840KP032255
/root/small_dataset_2/9999/6.JPG        MRHRW6830KP102532
/root/small_dataset_2/9999/7.JPG        MRHRU5870KP101363
/root/small_dataset_2/9999/9.JPG        MRHRU5850KP100742
/root/small_dataset_2/9999/10.JPG       MRHRW1850KP081060
/root/small_dataset_2/9999/11.JPG       MRHDG1845KP032378

but I got the following error:

  format(optimizer.eps))
Exception in main training loop: '35'
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/usr/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in update_core
    loss = _calc_loss(self._master, batch)
  File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 269, in _calc_loss
    return model(*in_arrays)
  File "/root/see-master/chainer/utils/multi_accuracy_classifier.py", line 48, in __call__
    reported_accuracies = self.accfun(self.y, t)
  File "/root/see-master/chainer/metrics/textrec_metrics.py", line 47, in calc_accuracy
    word = "".join(map(self.label_to_char, word))
  File "/root/see-master/chainer/metrics/loss_metrics.py", line 181, in label_to_char
    return chr(self.char_map[str(label)])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "chainer/train_text_recognition.py", line 299, in <module>
    trainer.run()
  File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 329, in run
    six.reraise(*sys.exc_info())
  File "/usr/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.5/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/usr/lib/python3.5/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in update_core
    loss = _calc_loss(self._master, batch)
  File "/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 269, in _calc_loss
    return model(*in_arrays)
  File "/root/see-master/chainer/utils/multi_accuracy_classifier.py", line 48, in __call__
    reported_accuracies = self.accfun(self.y, t)
  File "/root/see-master/chainer/metrics/textrec_metrics.py", line 47, in calc_accuracy
    word = "".join(map(self.label_to_char, word))
  File "/root/see-master/chainer/metrics/loss_metrics.py", line 181, in label_to_char
    return chr(self.char_map[str(label)])
KeyError: '35'

Please help me out with this issue.

Bartzi commented 5 years ago

Remember: The char_map is only used as a mapping from a predicted class to a character. In order to make the code work with another char_map that has less classes, you'll need to also adjust the output of the classification layer of the recognition network.
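To illustrate what the KeyError means (a sketch based only on the label_to_char call visible in the traceback above, not the full repository code): the network emits a class index per timestep, and that index is looked up in the char_map; if the network predicts class 35 or higher while the map only defines classes 0 to 34, the lookup fails exactly as shown.

# Minimal reproduction of the failing lookup, using the char_map posted above.
import json

with open("/root/small_dataset_2/ctc_char_map.json") as handle:
    char_map = json.load(handle)

def label_to_char(label):
    # Mirrors loss_metrics.py: class index (as a string key) -> codepoint -> character.
    return chr(char_map[str(label)])

print(label_to_char(12))  # 'A' with the map above (class 12 -> codepoint 65)
print(label_to_char(35))  # raises KeyError: '35' -- the map only defines classes 0..34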

harshalcse commented 5 years ago

I used a different character map file to predict the classes, but how do I adjust the output of the classification layer of the recognition network? Please guide me.

harshalcse commented 5 years ago

Please find the command and its output below.

 python3 chainer/train_text_recognition.py /root/small_dataset_2/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map /root/small_dataset_2/ctc_char_map.json --gpu 0 --snapshot-interval 1000 --dropout-ratio 0.2 --epoch 200 -lr 0.0001

/usr/lib/python3.5/site-packages/chainer/backends/cuda.py:98: UserWarning: cuDNN is not enabled.
Please reinstall CuPy after you install cudnn
(see https://docs-cupy.chainer.org/en/stable/install.html#install-cudnn).
  'cuDNN is not enabled.\n'
/usr/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/usr/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
epoch       iteration   main/loss   main/accuracy  lr          fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
Exception in thread prefetch_loop:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 552, in _fetch_run
    data = _fetch_dataset[index]
  File "/usr/lib/python3.5/site-packages/chainer/dataset/dataset_mixin.py", line 67, in __getitem__
    return self.get_example(index)
  File "/root/see-master/chainer/datasets/file_dataset.py", line 144, in get_example
    labels = self.get_labels(self.labels[i])
  File "/root/see-master/chainer/datasets/file_dataset.py", line 163, in get_labels
    labels = [int(self.reverse_char_map[ord(character)]) for character in word]
  File "/root/see-master/chainer/datasets/file_dataset.py", line 163, in <listcomp>
    labels = [int(self.reverse_char_map[ord(character)]) for character in word]
KeyError: 79
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 453, in _run
    alive = self._task()
  File "/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py", line 475, in _task
    data_all = future.get(_response_time)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
KeyError: 79

/usr/lib/python3.5/site-packages/chainer/iterators/multiprocess_iterator.py:31: TimeoutWarning: Stalled dataset is detected. See the documentation of MultiprocessIterator for common causes and workarounds:
https://docs.chainer.org/en/stable/reference/generated/chainer.iterators.MultiprocessIterator.html
  MultiprocessIterator.TimeoutWarning)

Please help.

Bartzi commented 5 years ago

Hmm, okay I had a look at the code again. Turns out, the number of classes is already automatically adjusted based on the number of entries in the char_map. So your last problem is caused by having characters in your groundtruth that are not in the char_map. For the error before, I don't really know why it happens; it shouldn't.
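For the KeyError: 79 specifically: codepoint 79 is the letter 'O', which is indeed absent from the char_map posted above (its values jump from 78 to 80). A small check like the following sketch (a hypothetical helper, modelled on the reverse_char_map lookup shown in the traceback, and assuming the groundtruth file is whitespace-separated as in the excerpt) lists every groundtruth character that the map cannot encode.

# Hypothetical validation script: report groundtruth characters missing from the char_map.
import json

with open("/root/small_dataset_2/ctc_char_map.json") as handle:
    char_map = json.load(handle)

# Same idea as reverse_char_map in file_dataset.py: codepoint -> class index.
reverse_char_map = {codepoint: label for label, codepoint in char_map.items()}

missing = set()
with open("/root/small_dataset_2/gt_word.csv") as handle:
    next(handle)  # skip the header line
    for line in handle:
        path, word = line.split()  # path and label, no spaces inside either
        missing.update(c for c in word if ord(c) not in reverse_char_map)

print("characters not covered by the char_map:", sorted(missing))  # e.g. ['O']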

harshalcse commented 5 years ago

When that error comes, training curriculum has finished. terminating the training process. appears.

Bartzi commented 5 years ago

When the error KeyError: '35' comes, do you also get training curriculum has finished. terminating the training process.?

harshalcse commented 5 years ago

While training on the dataset is running, the following error comes:

5000 2.07316 0 9.96634e-05 2.04356 0 2.05553 0
     total [###...............................................]  7.04%
this epoch [###...............................................]  7.46%
      5000 iter, 14 epoch / 200 epochs
   0.12627 iters/sec. Estimated time to finish: 6 days, 1:17:57.009388.
enlarging datasets
Training curriculum has finished. Terminating the training process.

Then training on the dataset halted. Please help.

Bartzi commented 5 years ago

It seems that the system thinks your training converged enough, so that it can use the second level in the curriculum, which does not exist ([look here](https://github.com/Bartzi/see/blob/master/chainer/utils/baby_step_curriculum.py#L82))... you could set the parameter min_delta of the curriculum to a very small value (like 1e-8).
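As a rough illustration of that behaviour (a simplified sketch of the idea only, not the actual implementation in baby_step_curriculum.py): the curriculum watches the validation loss, and once the improvement between checks drops below min_delta it tries to enlarge the dataset, i.e. switch to the next curriculum level; with only one level in curriculum.json there is nothing to switch to, so training terminates. A tiny min_delta makes that trigger fire much later.

# Simplified sketch of a delta-based curriculum trigger (assumed behaviour, not the repo code).
class CurriculumSketch:
    def __init__(self, levels, min_delta=1e-8):
        self.levels = levels            # e.g. the list of train/validation pairs from curriculum.json
        self.current_level = 0
        self.min_delta = min_delta
        self.best_loss = float("inf")

    def check(self, validation_loss):
        if self.best_loss - validation_loss < self.min_delta:
            # loss stopped improving enough -> try to enlarge the dataset
            if self.current_level + 1 >= len(self.levels):
                raise StopIteration("training curriculum has finished")
            self.current_level += 1
        self.best_loss = min(self.best_loss, validation_loss)

curriculum = CurriculumSketch(levels=[{"train": "gt_word.csv", "validation": "gt_word.csv"}])
curriculum.check(2.26)  # improvement is still large: stay on level 0
curriculum.check(2.25)  # still improving by more than min_delta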

harshalcse commented 5 years ago

I am using min_delta = 1.0. I train the model using the following command: python3 chainer/train_text_recognition.py /root/small_dataset_4/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map /root/small_dataset_4/ctc_char_map.json --gpu 0 --snapshot-interval 20000 --dropout-ratio 0.2 --epoch 200 -lr 0.0001

My curriculum JSON is as follows:

[
        {
                "train": "/root/small_dataset_4/gt_word.csv",
                "validation": "/root/small_dataset_4/gt_word.csv"
        }
]

Epochs, iterations and batches are as follows

epoch       iteration   main/loss   main/accuracy  lr          fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
0           100         2.77838     0              3.08566e-05                                                                                                          
0           200         2.3851      0              4.25853e-05                                                                                                          
0           300         2.33045     0              5.09208e-05                                                                                                          
0           400         2.28616     0              5.74294e-05                                                                                                          
1           500         2.25634     0              6.27392e-05                                                            2.26484               0                       
1           600         2.26055     0              6.71828e-05                                                                                                          
1           700         2.2739      0              7.0964e-05                                                                                                           
1           800         2.23831     0              7.42193e-05                                                                                                          
2           900         2.27829     0              7.70463e-05                                                            2.24612               0                       
2           1000        2.23279     0              7.95176e-05  2.24668                    0                                                                            
2           1100        2.2495      0              8.16892e-05                                                                                                          
2           1200        2.24512     0              8.36054e-05                                                                                                          
3           1300        2.34768     0              8.53021e-05                                                            2.36744               0                       
3           1400        2.23765     0              8.68087e-05                                                                                                          
3           1500        2.22382     0              8.81497e-05                                                                                                          
3           1600        2.21226     0              8.93457e-05                                                                                                          
4           1700        2.23709     0              9.04141e-05                                                            2.50032               0                       
4           1800        2.20482     0              9.13701e-05                                                                                                          
4           1900        2.23361     0              9.22265e-05                                                                                                          
4           2000        2.18711     0              9.29946e-05  2.17601                    0                                                                            
5           2100        2.1789      0              9.36842e-05                                                            2.17091               0                       
5           2200        2.16862     0              9.43037e-05                                                                                                          
5           2300        2.16191     0              9.48608e-05                                                                                                          
5           2400        2.16445     0              9.5362e-05                                                                                                           
6           2500        2.15081     0              9.58132e-05                                                            2.14614               0                       
6           2600        2.14921     0              9.62197e-05                                                                                                          
6           2700        2.1292      0              9.6586e-05                                                                                                           
6           2800        2.12386     0              9.69162e-05                                                                                                          
7           2900        2.12777     0              9.7214e-05                                                            2.14358               0                        
7           3000        2.12236     0              9.74827e-05  2.09999                    0                                                                            
7           3100        2.11735     0              9.77252e-05                                                                                                          
7           3200        2.13403     0              9.7944e-05                                                                                                           
8           3300        2.10966     0              9.81416e-05                                                            2.09331               0                       
8           3400        2.10381     0              9.83201e-05                                                                                                          
8           3500        2.14036     0              9.84812e-05                                                                                                          
8           3600        2.11325     0              9.86268e-05                                                                                                          
8           3700        2.10877     0              9.87584e-05                                                                                                          
9           3800        2.1011      0              9.88773e-05                                                            2.11267               0                       
9           3900        2.09817     0              9.89847e-05                                                                                                          
9           4000        2.10695     0              9.90818e-05  2.07527                    0                                                                            
9           4100        2.09598     0              9.91696e-05                                                                                                          
10          4200        2.09201     0              9.9249e-05                                                            2.09834               0                        
10          4300        2.08747     0              9.93207e-05                                                                                                          
10          4400        2.09943     0              9.93856e-05                                                                                                          
10          4500        2.11838     0              9.94443e-05                                                                                                          
11          4600        2.09862     0              9.94973e-05                                                            2.10762               0                       
11          4700        2.11332     0              9.95453e-05                                                                                                          
11          4800        2.10901     0              9.95887e-05                                                                                                          
11          4900        2.10108     0              9.96279e-05                                                                                                          
12          5000        2.1099      0              9.96634e-05  2.09164                    0                              2.0996                0                       
enlarging datasets
Training curriculum has finished. Terminating the training process.
      5000 iter, 12 epoch / 200 epochs

Please help to solve this.

harshalcse commented 5 years ago

I still get the same error when I run the following command:

python3 chainer/train_text_recognition.py /root/small_dataset_4/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map /root/small_dataset_4/ctc_char_map.json --gpu 0 --snapshot-interval 20000 --dropout-ratio 0.2 --epoch 200 -lr 0.0001

     total [##############################....................] 60.78%
this epoch [#######...........................................] 15.62%
      5000 iter, 12 epoch / 20 epochs
     0.114 iters/sec. Estimated time to finish: 7:51:40.866016.
enlarging datasets
Training curriculum has finished. Terminating the training process.

Bartzi commented 5 years ago

Please try to set min_delta to 1e-8 and try again.

harshalcse commented 5 years ago

@Bartzi I set min_delta = 1e-8 at https://github.com/Bartzi/see/blob/2014359a1489edbbb78f24ddce89383e0078545f/chainer/train_text_recognition.py#L82, but the same issue still came up.

harshalcse commented 5 years ago

The bboxes look as follows; the network is unable to localize the alphanumeric characters properly, so how can I achieve higher accuracy?

python3 chainer/train_text_recognition.py /root/small_dataset_4/curriculum.json log --blank-label 0 --batch-size 16 --is-trainer-snapshot --use-dropout --char-map /root/small_dataset_4/ctc_char_map.json --gpu 0 --snapshot-interval 20000 --dropout-ratio 0.2 --epoch 200 -lr 0.0001 40 50 130

Bartzi commented 5 years ago

How many epochs did you train? Did it run for 200 epochs? Can you see any loss improvement? How large is your dataset? You could try to increase the batch size, leave out --is-trainer-snapshot (you only need this if you want to load a previously trained model), and do not use --use-dropout.

harshalcse commented 5 years ago

I ran it for 10 epochs and I see loss improvement, but not a very large one. My dataset contains 4603 images. Yes, I tried it with a batch size of 64.

Bartzi commented 5 years ago

It takes some time until things start to get better. Training a model using our approach does not work like training a model on ImageNet or something. The loss takes some time to decrease, as first one model needs to improve its predictions and the other has to get along with that.

Let it train until nothing happens anymore. Once you did that, you should throw away the model you got for the recognition part and restart the training, initializing the localization network from the saved params and randomly initializing the recognition model. You can do this over and over again, until even that does not help anymore.

It might also be that the size of your train dataset is not large enough, but I'm not too sure about this.
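A hedged sketch of that restart procedure, in case it helps (the class, the link names localization_net/recognition_net, and the snapshot file name below are placeholders, not the exact names used in the SEE scripts): Chainer's load_npz can restore a single sub-link of a model from a saved snapshot, so the trained localization weights can be kept while the freshly built recognition network stays randomly initialized.

# Sketch only: restore just the localization part of a previously saved model snapshot.
import chainer
import chainer.links as L

class ModelSketch(chainer.Chain):
    """Stand-in for the real SEE model: one localization link, one recognition link."""
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.localization_net = L.Linear(None, 6)   # placeholder sub-networks
            self.recognition_net = L.Linear(None, 35)

model = ModelSketch()

# Load only the localization parameters; the recognition part keeps its random
# initialization and is trained from scratch again. File name and npz path are examples.
chainer.serializers.load_npz(
    "log/model_50000.npz",
    model.localization_net,
    path="localization_net/",
    strict=False,
)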

harshalcse commented 5 years ago

So approximately what dataset size, number of epochs, and batch size are required to achieve higher accuracy?

Also, is it okay to duplicate samples to enlarge the dataset for achieving higher accuracy? Please help.

harshalcse commented 5 years ago

At 100 epochs the same issue comes as well: Training curriculum has finished. Terminating the training process.

55 5000 4.00388 0 9.96634e-09 3.96396 0 3.96532 0
     total [##################################................] 69.23%
this epoch [###################...............................] 38.25%
      5000 iter, 55 epoch / 80 epochs
  0.097178 iters/sec. Estimated time to finish: 6:21:10.341421.
enlarging datasets
Training curriculum has finished. Terminating the training process.

harshalcse commented 5 years ago

Hi, I still do not achieve any accuracy; the training iterations are as follows:

python3 chainer/train_text_recognition.py /data/small_dataset_3/curriculum.json log --blank-label 0 -b 256 --is-trainer-snapshot --char-map /data/small_dataset_3/ctc_char_map.json -g 0 -si 1000 -dr 0.2 -e 200 -lr 1e-8 --zoom 0.9 --area-factor 0.0 --area-scale-factor 2 --load-localization
/usr/local/lib/python3.6/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/qgate/.local/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:151: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
epoch       iteration   main/loss   main/accuracy  lr          fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
3           100         3.97704     0              3.08566e-09                                                            3.98018               0
7           200         3.97642     0              4.25853e-09                                                            3.97597               0
11          300         3.97557     0              5.09208e-09                                                            3.97559               0
15          400         3.97502     0              5.74294e-09                                                            3.97463               0
19          500         3.97436     0              6.27392e-09                                                            3.9742                0
23          600         3.97374     0              6.71828e-09                                                            3.97332               0
27          700         3.97294     0              7.0964e-09                                                            3.97258               0
31          800         3.97227     0              7.42193e-09                                                            3.97214               0
35          900         3.97166     0              7.70463e-09                                                            3.97143               0
38          1000        3.9709      0              7.95176e-09  3.97111                    0                              3.97076               0
42          1100        3.97021     0              8.16892e-09                                                            3.97022               0
46          1200        3.96958     0              8.36054e-09                                                            3.96959               0
50          1300        3.96892     0              8.53021e-09                                                            3.96887               0
54          1400        3.96808     0              8.68087e-09                                                            3.96827               0
58          1500        3.96746     0              8.81497e-09                                                            3.96742               0
62          1600        3.9669      0              8.93457e-09                                                            3.96659               0
66          1700        3.96627     0              9.04141e-09                                                            3.96625               0
70          1800        3.96535     0              9.13701e-09                                                            3.96522               0
73          1900        3.96485     0              9.22265e-09                                                            3.9648                0
77          2000        3.96412     0              9.29946e-09  3.96353                    0                              3.96405               0
81          2100        3.96332     0              9.36842e-09                                                            3.96351               0
85          2200        3.96274     0              9.43037e-09                                                            3.96245               0
89          2300        3.96205     0              9.48608e-09                                                            3.96194               0
93          2400        3.9614      0              9.5362e-09                                                            3.96123               0
97          2500        3.96079     0              9.58132e-09                                                            3.96064               0
101         2600        3.96003     0              9.62197e-09                                                            3.95995               0
105         2700        3.95936     0              9.6586e-09                                                            3.95916               0
108         2800        3.95863     0              9.69162e-09                                                            3.95848               0
112         2900        3.95802     0              9.7214e-09                                                            3.95783               0
116         3000        3.95718     0              9.74827e-09  3.9568                     0                              3.95723               0
120         3100        3.95666     0              9.77252e-09                                                            3.95674               0
124         3200        3.956       0              9.7944e-09                                                            3.95567               0
128         3300        3.95525     0              9.81416e-09                                                            3.95523               0
132         3400        3.95461     0              9.83201e-09                                                            3.95433               0
136         3500        3.95395     0              9.84812e-09                                                            3.95381               0
140         3600        3.95334     0              9.86268e-09                                                            3.95297               0
143         3700        3.95254     0              9.87584e-09                                                            3.95269               0
147         3800        3.95202     0              9.88773e-09                                                            3.95186               0
151         3900        3.95123     0              9.89847e-09                                                            3.95105               0
155         4000        3.95051     0              9.90818e-09  3.95014                    0                              3.95054               0
159         4100        3.95005     0              9.91696e-09                                                            3.95003               0
163         4200        3.94908     0              9.9249e-09                                                            3.94928               0
167         4300        3.94875     0              9.93207e-09                                                            3.94827               0
171         4400        3.94788     0              9.93856e-09                                                            3.94763               0
175         4500        3.94719     0              9.94443e-09                                                            3.94691               0
178         4600        3.9466      0              9.94973e-09                                                            3.94633               0
182         4700        3.94588     0              9.95453e-09                                                            3.94563               0
186         4800        3.94515     0              9.95887e-09                                                            3.94513               0
     total [##############################################....] 93.50%
this epoch [#################################################.] 99.16%
      4807 iter, 186 epoch / 200 epochs

Bartzi commented 5 years ago

Did you have a look at the predictions of the model on a sample image (the images in the bboxes folder in the log dir)? What happens there over the course of the training? I also think that your learning rate is way too low. You should use values like 1e-4 and 1e-5.

harshalcse commented 5 years ago

@Bartzi I already tried with 1e-4 and 1e-5, but still did not achieve good accuracy.

Bartzi commented 5 years ago

Did you look at the predictions (my first point of the last answer)? Those images are meant as a help to determine what the network does over time. This really helps to debug problems. You can also create an animation out of those image files with the create_video.py script in the utils folder.

But still, 1e-4 and 1e-5 are the learning rates to use! (maybe also 1e-6)
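If you just want a quick look without create_video.py, the following minimal sketch (assuming the imageio package is installed and that the log directory contains a bboxes folder with PNG frames; both are assumptions, adjust paths to your setup) stitches the prediction images into an animated GIF.

# Sketch: combine the per-iteration bbox visualizations into an animated GIF.
import glob

import imageio

# sorted() orders lexically; zero-padded frame names keep the animation in order.
frames = [imageio.imread(path) for path in sorted(glob.glob("log/bboxes/*.png"))]
imageio.mimsave("bboxes.gif", frames, duration=0.2)  # 0.2 s per frame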

AmarendarAndhe commented 4 years ago

@harshalcse, did you have any success extracting text from images, like black-on-black text?