Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Build train.csv on custom dataset #27

Closed mit456 closed 6 years ago

mit456 commented 6 years ago

Hello @Bartzi ,

Sorry for opening a new issue here, but I feel this will help people understand the training procedure on a custom dataset.

I was following the FSNS dataset to understand the training code and the procedure to build train.csv. I will share my understanding of how train.csv is built and then ask a couple of questions.

As per the instructions for FSNS dataset preparation, after running all the scripts, train.csv (for the first training set of 512) contains 6 21 as the first line, followed by 1807 rows of dataset description. Each row has a column with the path of the training image and then 126 columns, each holding a key of the fsns_char_map.json dict, where the value of the key is the actual character, i.e. chr(key). Please correct me if I am wrong.

  1. Why 126? How do you decide the number of columns that represent the keys? Since fsns_char_map.json has 134 key-value pairs, why not 134?
  2. Can it be more than the number of keys in fsns_char_map.json, or in any char_map.json?
Bartzi commented 6 years ago

Hi,

your description is nearly correct.

fsns_char_map.json dict key where the value of the key is the actual character which is chr(key). Please correct me if I am wrong.

fsns_char_map.json is a file that contains a mapping from a predicted class (for the FSNS case we have 134 different classes) to the Unicode representation of the character associated with that class. That means the value of the key is indeed the actual character, but it is chr(value), not chr(key).
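For illustration, reading such a map and decoding a predicted class could look like the following sketch (the file name is taken from above; JSON keys are strings, hence the str() conversion):

import json

# Load the class-index -> Unicode-codepoint mapping and decode one prediction.
with open("fsns_char_map.json") as f:
    char_map = json.load(f)

predicted_class = 20                               # example class index from the network
character = chr(char_map[str(predicted_class)])    # chr(value), not chr(key)
print(character)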

Regarding your questions:

  1. It's 126 because the model always tries to detect a maximum of 6 words per image, where each word has at most 21 characters, and 6 * 21 = 126. Each of these columns contains the class that is to be predicted. If a word has fewer than 21 characters, it is padded with the blank label. If there are fewer than 6 words in the image, the missing words are filled entirely with the blank label.
  2. The number of columns has nothing to do with the number of key-value pairs in the char map; the char map is only used for mapping a predicted class to a character. A small sketch of this label-row padding follows below.
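A minimal sketch of the padding scheme (not the repository's dataset-generation script; the character-to-class mapping and the blank label value 0 are assumptions):

MAX_WORDS, MAX_CHARS, BLANK = 6, 21, 0   # FSNS layout: 6 words x 21 characters

def build_label_row(words, char_to_class):
    """Build one 126-column label row, padding short or missing words with the blank label."""
    row = []
    for i in range(MAX_WORDS):
        word = words[i] if i < len(words) else ""
        classes = [char_to_class[c] for c in word[:MAX_CHARS]]
        classes += [BLANK] * (MAX_CHARS - len(classes))
        row.extend(classes)
    return row

# Toy usage with a made-up character mapping:
toy_map = {c: i + 1 for i, c in enumerate("ruedlapix")}
print(len(build_label_row(["rue", "de", "la", "paix"], toy_map)))  # -> 126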
mit456 commented 6 years ago

Hello @Bartzi ,

I am trying the following command to train on my custom dataset, which I built by writing sentences onto images with the PIL library:

python train_text_recognition.py --blank-label 0 --char-map ../datasets/custom/custom_char_map.json --batch-size 30 --send-bboxes --gpus 0 ../datasets/custom/curriculum.json ../datasets/custom/logs

It throws an error: train_text_recognition.py: error: argument -g/--gpus: invalid int value: '../datasets/custom/curriculum.json'. It is not taking 0 as the argument value; it takes ../datasets/custom/curriculum.json instead. Why? Am I doing something wrong here?

Another doubt: there is no parser.add_argument statement for --gpus, so how does it get appended to args, and how come you are using len(args.gpu) in L-150?

After removing --gpus, it throws a different error:

Traceback (most recent call last):
  File "train_text_recognition.py", line 176, in <module>
    converter=get_concat_and_pad_examples(args.blank_label)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 121, in __init__
    assert len(iterators) == len(devices)
AssertionError
Bartzi commented 6 years ago

Hi,

the ArgumentParser is enriched with default arguments that are added here: https://github.com/Bartzi/see/blob/master/chainer/utils/train_utils.py#L175

You get this error because we allow an arbitrary number of GPUs to be specified for training, so --gpus consumes everything that follows it until the next option. You can fix the problem by rearranging your command-line arguments, for instance like this: python train_text_recognition.py --blank-label 0 --char-map ../datasets/custom/custom_char_map.json --gpus 0 --batch-size 30 --send-bboxes ../datasets/custom/curriculum.json ../datasets/custom/logs

You should not get this error anymore, then =)
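For background, the behaviour can be reproduced with a minimal argparse sketch; defining --gpus with nargs='+' is an assumption based on "an arbitrary number of GPUs", not the repository's exact definition:

import argparse

# An option declared with nargs='+' greedily consumes every following plain
# token, so positional arguments placed directly after it are swallowed.
parser = argparse.ArgumentParser()
parser.add_argument("-g", "--gpus", type=int, nargs="+")   # assumed definition
parser.add_argument("--batch-size", type=int, default=16)
parser.add_argument("dataset")
parser.add_argument("log_dir")

# Fails with "invalid int value: 'curriculum.json'", like the reported error:
# parser.parse_args("--batch-size 30 --gpus 0 curriculum.json logs".split())

# Works, because another option now separates --gpus from the positionals:
args = parser.parse_args("--gpus 0 --batch-size 30 curriculum.json logs".split())
print(args.gpus, args.dataset, args.log_dir)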

mit456 commented 6 years ago

Hello,

Getting an out-of-memory error. Image size is (600 x 300).


/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:131: UserWarning: optimizer.eps is changed to 1e-08 by MultiprocessParallelUpdater for new batch size.
  format(optimizer.eps))
Exception in main training loop: out of memory to allocate 172800000 bytes (total 4223769600 bytes)
Traceback (most recent call last):
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 207, in update_core
    loss = _calc_loss(self._master, batch)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 236, in _calc_loss
    return model(*in_arrays)
  File "/home/void/Projects/see/chainer/utils/multi_accuracy_classifier.py", line 44, in __call__
    self.y = self.predictor(*x)
  File "/home/void/Projects/see/chainer/models/text_recognition.py", line 80, in __call__
    return self.recognition_net(images, h)
  File "/home/void/Projects/see/chainer/models/text_recognition.py", line 36, in __call__
    points = [F.spatial_transformer_grid(localization, self.target_shape) for localization in localizations]
  File "/home/void/Projects/see/chainer/models/text_recognition.py", line 36, in <listcomp>
    points = [F.spatial_transformer_grid(localization, self.target_shape) for localization in localizations]
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/array/spatial_transformer_grid.py", line 169, in spatial_transformer_grid
    return SpatialTransformerGrid(output_shape)(theta)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 235, in __call__
    ret = node.apply(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 135, in forward
    return self._function.forward(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 342, in forward
    return self.forward_gpu(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/array/spatial_transformer_grid.py", line 41, in forward_gpu
    grid_t = cuda.cupy.empty((B, H, W, 2), dtype=theta.dtype)
  File "/home/void/Projects/see/lib/python3.5/site-packages/cupy/creation/basic.py", line 19, in empty
    return cupy.ndarray(shape, dtype=dtype, order=order)
  File "cupy/core/core.pyx", line 93, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 415, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 831, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 852, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 614, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 663, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "cupy/cuda/memory.pyx", line 645, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 584, in cupy.cuda.memory.SingleDeviceMemoryPool._alloc
  File "cupy/cuda/memory.pyx", line 370, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 371, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 68, in cupy.cuda.memory.Memory.__init__
  File "cupy/cuda/runtime.pyx", line 214, in cupy.cuda.runtime.malloc
  File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cupy/cuda/memory.pyx", line 651, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 584, in cupy.cuda.memory.SingleDeviceMemoryPool._alloc
  File "cupy/cuda/memory.pyx", line 370, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 371, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 68, in cupy.cuda.memory.Memory.__init__
  File "cupy/cuda/runtime.pyx", line 214, in cupy.cuda.runtime.malloc
  File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cupy/cuda/memory.pyx", line 657, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 584, in cupy.cuda.memory.SingleDeviceMemoryPool._alloc
  File "cupy/cuda/memory.pyx", line 370, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 371, in cupy.cuda.memory._malloc
  File "cupy/cuda/memory.pyx", line 68, in cupy.cuda.memory.Memory.__init__
  File "cupy/cuda/runtime.pyx", line 214, in cupy.cuda.runtime.malloc
  File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorMemoryAllocation: out of memory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_text_recognition.py", line 293, in <module>
    trainer.run()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/home/void/Projects/see/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 207, in update_core
    loss = _calc_loss(self._master, batch)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 236, in _calc_loss
    return model(*in_arrays)
  File "/home/void/Projects/see/chainer/utils/multi_accuracy_classifier.py", line 44, in __call__
    self.y = self.predictor(*x)
  File "/home/void/Projects/see/chainer/models/text_recognition.py", line 80, in __call__
    return self.recognition_net(images, h)
  File "/home/void/Projects/see/chainer/models/text_recognition.py", line 36, in __call__
    points = [F.spatial_transformer_grid(localization, self.target_shape) for localization in localizations]
  File "/home/void/Projects/see/chainer/models/text_recognition.py", line 36, in <listcomp>
    points = [F.spatial_transformer_grid(localization, self.target_shape) for localization in localizations]
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/array/spatial_transformer_grid.py", line 169, in spatial_transformer_grid
    return SpatialTransformerGrid(output_shape)(theta)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 235, in __call__
    ret = node.apply(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 135, in forward
    return self._function.forward(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 342, in forward
    return self.forward_gpu(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/array/spatial_transformer_grid.py", line 41, in forward_gpu
    grid_t = cuda.cupy.empty((B, H, W, 2), dtype=theta.dtype)
  File "/home/void/Projects/see/lib/python3.5/site-packages/cupy/creation/basic.py", line 19, in empty
    return cupy.ndarray(shape, dtype=dtype, order=order)
  File "cupy/core/core.pyx", line 93, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 415, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 831, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 852, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 614, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 663, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 172800000 bytes (total 4223769600 bytes)
Bartzi commented 6 years ago

How about decreasing the batch size? Decreasing the input image size? Decreasing the size of the cropped image regions?

mit456 commented 6 years ago

I decreased the batch size and the input size to (300 x 150). I did not understand decreasing the size of the cropped image regions, because I am giving the complete image for training. Am I doing it wrong? Do I need to crop only the text part?

Training set images attached.

Bartzi commented 6 years ago

The network first takes the whole image as input. It localizes the text regions and crops those regions from the input image (see Figure 2 in the paper). You can set the size those regions are cropped to by changing the value of the variable target_size (here) to a lower value.
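As a rough, back-of-the-envelope illustration of why the crop size matters: the allocation that fails in the traceback is the sampling grid of shape (B, H, W, 2), so its size grows linearly with the crop area. The concrete numbers below are examples, not the settings of the failing run:

# Sketch: approximate float32 size of the spatial transformer sampling grid.
def grid_bytes(batch_size, num_rois, height, width, dtype_bytes=4):
    # shape is (batch_size * num_rois, height, width, 2)
    return batch_size * num_rois * height * width * 2 * dtype_bytes

# Example values (assumptions for illustration only):
print(grid_bytes(30, 6, 300, 600))  # ~259 MB for this buffer alone
print(grid_bytes(30, 6, 75, 150))   # ~16 MB with a much smaller target size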

mit456 commented 6 years ago

Hello,

I tried training, but it always gets stuck at 99.67% of epoch 0. I followed up on issue 15 and got a similar error. Any hint on how to debug this?

epoch       iteration   main/loss   main/accuracy  lr          fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
Exception in main training loop: Each label `t` need to satisfy `0 <= t < x.shape[1] or t == -1`
Traceback (most recent call last):
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 206, in update_core
    loss = _calc_loss(self._master, batch)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in _calc_loss
    return model(*in_arrays)
  File "/home/void/Projects/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
    self.loss = self.lossfun(self.y, t)
  File "/home/void/Projects/see/chainer/metrics/textrec_metrics.py", line 14, in calc_loss
    loss = self.calc_actual_loss(batch_predictions, None, t)
  File "/home/void/Projects/see/chainer/metrics/textrec_metrics.py", line 96, in calc_actual_loss
    return F.softmax_cross_entropy(predictions, labels)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 380, in softmax_cross_entropy
    normalize, cache_score, class_weight, ignore_label, reduce)(x, t)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 235, in __call__
    ret = node.apply(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 135, in forward
    return self._function.forward(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 342, in forward
    return self.forward_gpu(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 114, in forward_gpu
    _check_input_values(x, t, self.ignore_label)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 48, in _check_input_values
    raise ValueError(msg)
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_text_recognition.py", line 295, in <module>
    trainer.run()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/home/void/Projects/see/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 206, in update_core
    loss = _calc_loss(self._master, batch)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in _calc_loss
    return model(*in_arrays)
  File "/home/void/Projects/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
    self.loss = self.lossfun(self.y, t)
  File "/home/void/Projects/see/chainer/metrics/textrec_metrics.py", line 14, in calc_loss
    loss = self.calc_actual_loss(batch_predictions, None, t)
  File "/home/void/Projects/see/chainer/metrics/textrec_metrics.py", line 96, in calc_actual_loss
    return F.softmax_cross_entropy(predictions, labels)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 380, in softmax_cross_entropy
    normalize, cache_score, class_weight, ignore_label, reduce)(x, t)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 235, in __call__
    ret = node.apply(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function_node.py", line 245, in apply
    outputs = self.forward(in_data)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 135, in forward
    return self._function.forward(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/function.py", line 342, in forward
    return self.forward_gpu(inputs)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 114, in forward_gpu
    _check_input_values(x, t, self.ignore_label)
  File "/home/void/Projects/see/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 48, in _check_input_values
    raise ValueError(msg)
ValueError: Each label `t` need to satisfy `0 <= t < x.shape[1] or t == -1`
Bartzi commented 6 years ago

What did you do while creating your own dataset and char_map? Did you just create the char_map, or did you also change the network structure? Once you create the char_map, you have to adjust the number of output neurons of the last layer of the network. This might be the cause of your problem.
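A small sketch of the kind of adjustment meant here (file and variable names are illustrative, not the exact code in train_text_recognition.py):

import json

# Derive the number of output classes of the last layer from the char map,
# so the recognition network can predict every class in the groundtruth.
with open("custom_char_map.json") as f:
    char_map = json.load(f)

num_classes = len(char_map)   # e.g. 136 entries for the keys 0..135
# num_classes would then replace the hard-coded label count passed to the
# recognition network.
print(num_classes)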

mit456 commented 6 years ago

Hello,

Thank you for the quick response. I added two extra classes to the FSNS char_map, and as you said to adjust the number of output neurons, I need to change train_text_recognition.py L-112, i.e. the label_size parameter passed to TextRecognitionNet, from 52 to 136, because my custom_char_map.json has entries 0-135.

Changing the output layer fixed the issue, but training still gets stuck at 96.67%. I have a very small validation set of only 30 images and 300 training images; I will increase the dataset once this sample set trains. One more question: do I need to enable anything in the training command for show_progress.py to show the progress? Running show_progress is not showing anything, but the tk window opens.

Bartzi commented 6 years ago

Okay, good that it worked.

I'm not sure why the network is still stuck at 96.67%. It must be something with the evaluator; maybe you can check that.

In order to enable the sending of bboxes, you have to supply --send-bboxes to the train script and make sure to use the right IP address and port.

mit456 commented 6 years ago

Hello,

I am very new to neural networks and their implementations, so I might ask some silly questions; please bear with me. But I would love to debug the evaluator issue so as to make it work.

Question: by evaluator, do you mean the FastEvaluator or the epoch_evaluator, and what is the difference between them?

Bartzi commented 6 years ago

Oh yeah, right there are two evaluators...

It could be that the epoch_evaluator does not stop iterating over the data, but I'm not too sure about that.

mit456 commented 6 years ago

Hello @Bartzi,

Thanks for not marking this issue as closed, and sorry for being idle for many days. I started debugging and found that there seems to be a problem with MultiprocessIterator, because if I use SerialIterator instead, training works: it does not get stuck at epoch 0 and it completes.

I still have not figured out what the actual issue is (I feel I will be able to), but I have a couple of questions:

  1. Where is the final model stored?

  2. A very silly question, but how do I confirm visually that all the training samples have gone through the training process? One way is to check that (number of iterations * batch_size = number of training samples), but looking into the log file I found that after epoch 1 the number of iterations logged is 100 and my batch_size was 5, which equals 500 samples, yet there are only 300 training samples and 33 validation samples. And if we consider epochs [0, 1], then there should be 150 iterations. What is the reason?

Have a look at the log file

[
    {
        "main/accuracy": 0.0,
        "main/loss": 0.8699310421943665,
        "iteration": 100,
        "epoch": 1,
        "validation/main/loss": 0.47458145022392273,
        "validation/main/accuracy": 0.0,
        "elapsed_time": 145.76641237600415,
        "lr": 0.0030856589376553494
    },
    {
        "main/accuracy": 0.018,
        "main/loss": 0.40315747261047363,
        "iteration": 200,
        "epoch": 3,
        "validation/main/loss": 0.3434253931045532,
        "validation/main/accuracy": 0.12857142857142856,
        "elapsed_time": 298.99436880199937,
        "lr": 0.004258534616241294
    },
    {
        "main/accuracy": 0.04000000000000001,
        "main/loss": 0.24916383624076843,
        "iteration": 300,
        "epoch": 5,
        "validation/main/loss": 0.28721046447753906,
        "validation/main/accuracy": 0.09523809523809525,
        "elapsed_time": 451.8581105320045,
        "lr": 0.00509208177314456
    },
    {
        "main/accuracy": 0.048000000000000015,
        "main/loss": 0.2306901216506958,
        "iteration": 400,
        "epoch": 6,
        "validation/main/loss": 0.2915477454662323,
        "validation/main/accuracy": 0.0,
        "elapsed_time": 605.382064570993,
        "lr": 0.0057429443144893875
    },
    {
        "main/accuracy": 0.07400000000000002,
        "main/loss": 0.23013180494308472,
        "iteration": 500,
        "epoch": 8,
        "validation/main/loss": 0.30023595690727234,
        "validation/main/accuracy": 0.042857142857142864,
        "elapsed_time": 760.5097847040015,
        "lr": 0.006273922657626689
    },
    {
        "main/accuracy": 0.04400000000000002,
        "main/loss": 0.22877845168113708,
        "iteration": 600,
        "epoch": 10,
        "validation/main/loss": 0.2746463119983673,
        "validation/main/accuracy": 0.05238095238095238,
        "elapsed_time": 909.0589009169926,
        "lr": 0.00671828171867259
    }
]
  3. If I am correct, when we use the epoch_evaluator it runs the complete validation dataset through an intermediate model, but I am only seeing the first sample image of the validation dataset inside the bbox folder of the log folder.

Will be debugging more to find out what the exact issue is inside MultiprocessIterator.

Bartzi commented 6 years ago

I think I know what causes your problem with the training being stuck after one epoch! It is the setting of the iterator. The validation iterator is created in this line, but this way of creating it is wrong. It should be:

validation_iterator = chainer.iterators.MultiprocessIterator(validation_dataset, args.batch_size, repeat=False)

Then it should not idle anymore. This idling does not occur when using the fast validator, as that validator only runs at most 200 iterations and then stops; it does not exhaust the iterator completely.
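For reference, a minimal self-contained sketch of the two iterator settings (the dummy datasets are placeholders, not the repository's dataset classes):

import numpy as np
import chainer

# Dummy datasets just to make the sketch runnable.
train_dataset = chainer.datasets.TupleDataset(np.arange(300, dtype=np.float32))
validation_dataset = chainer.datasets.TupleDataset(np.arange(30, dtype=np.float32))
batch_size = 5

# Training iterator: repeat=True (the default), so it keeps cycling over epochs.
train_iterator = chainer.iterators.MultiprocessIterator(train_dataset, batch_size)

# Validation iterator: repeat=False, otherwise an evaluator that tries to
# exhaust the iterator will iterate over the validation data forever.
validation_iterator = chainer.iterators.MultiprocessIterator(
    validation_dataset, batch_size, repeat=False)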

Your questions:

  1. Your model is stored in the logs subfolder, where each training run creates a new subfolder that includes the date of the experiment and the name you gave to it. The model will be stored in this folder. A snapshot is taken every n iterations, with n controlled by the command-line parameter -si or --snapshot-interval. The default is 5000.
  2. Once one epoch has passed, you know the model has seen each training example once. The code adds an entry to the log file every n iterations, with n controlled by the command-line parameter -li or --log-interval. The default value is 100. So if you only have 300 training examples, the first epoch is already finished by the time the log is written for the first time (see the short calculation below).
  3. You are only seeing the first example of the validation set in the bbox folder because this is the sample used to show the progress of the model at the current iteration. This is only done for one example, because it would otherwise make the training very slow.
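A short sketch of the arithmetic behind point 2, using the numbers from the question (300 training samples, batch size 5, default log interval 100):

# Quick check of the numbers from the log above.
train_samples, batch_size, log_interval = 300, 5, 100

iterations_per_epoch = train_samples // batch_size        # 60 iterations per epoch
samples_seen_at_first_log = log_interval * batch_size     # 500 samples after 100 iterations
epochs_at_first_log = samples_seen_at_first_log / train_samples   # ~1.67 epochs
print(iterations_per_epoch, samples_seen_at_first_log, epochs_at_first_log)
# The first log entry is written at iteration 100, by which point one full
# epoch (60 iterations) has already finished, so it reports epoch 1.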
mit456 commented 6 years ago

Thanks for the fix, it works. I have raised a PR; please have a look.

Bartzi commented 6 years ago

already merged :wink:

mit456 commented 6 years ago

Hello @Bartzi,

Sorry for disturbing you again, but I have a question regarding training on my custom dataset. Can we add support for \n and \t in custom_char_map.json? If there are multiple lines of text written on an image, don't we need \n in train.csv, and if there are multiple lines of text written in different positions, don't we need \t in that case? What is your opinion on this?

Bartzi commented 6 years ago

Sure, adding those values to the char_map should not be a problem, but I think it will be difficult for the model to predict a tabulator. I also don't see the point in using \t for multiple lines of text in different positions: each line of text should have its own localization result, hence you should not need \t for this, at least the way I understand what you want to do right now.

\n might make sense, but during our experiments on FSNS we discovered that detecting a region of text with multiple lines might not work, which is why we resorted to using single words instead of lines of text. I think this is a problem for other datasets as well.

mit456 commented 6 years ago

One doubt:

The maximum number of words in an image (let's say 6) and the maximum length of a word (let's say 21) are what we write in the first row of train.csv and validation.csv, right? Do we calculate these over both datasets combined, not separately for each?

Bartzi commented 6 years ago

Yes, you put this into both files as the first row, and it should be the same in both files, as the model is optimized specifically for those values and we need to know the layout of the data while training.

mit456 commented 6 years ago

What I meant is: if one of the validation samples has more words in an image than any of the training samples, do we still use the number from the validation sample rather than the maximum over the training samples?

Bartzi commented 6 years ago

You should always consider the maximum number of words that you are training on. Is that what you are asking?

mit456 commented 6 years ago

Hello,

Thanks for your input. One problem I see with my custom data is that the prediction is always SVHN-like, e.g. 77, 35, 333 or so; it never predicts letters. I am trying to debug why, but can you guess where the issue could be? I am using train_text_recognition.py.

Bartzi commented 6 years ago

Maybe an imbalance in your training data? Do you have way more numbers than letters? Is something wrong with your labels, e.g. they are not aligned correctly? Or do you not correctly decode the output of the text recognition stage?

mit456 commented 6 years ago

My training data mostly consists of letters, because the sentences are taken from the nltk corpus. I have not changed anything in the decoding of the text recognition stage. If you don't mind, would you take a look at my train.csv and validation.csv?

mit456 commented 6 years ago

Attaching train.csv and validation.csv after converting to xlsx format.

train.xlsx validation.xlsx

Bartzi commented 6 years ago

hmm, those files alone do not help much, which char_map are you using?

mit456 commented 6 years ago

Here is my char_map. Instead of using show_progress.py, I used chainerui, which works well. Should I update README.md with instructions on how to use chainerui?

{
    "0": 9250,
    "1": 108,
    "2": 8216,
    "3": 233,
    "4": 116,
    "5": 101,
    "6": 105,
    "7": 110,
    "8": 115,
    "9": 120,
    "10": 103,
    "11": 117,
    "12": 111,
    "13": 49,
    "14": 56,
    "15": 55,
    "16": 48,
    "17": 8212,
    "18": 46,
    "19": 112,
    "20": 97,
    "21": 114,
    "22": 232,
    "23": 100,
    "24": 99,
    "25": 86,
    "26": 118,
    "27": 98,
    "28": 109,
    "29": 41,
    "30": 67,
    "31": 122,
    "32": 83,
    "33": 121,
    "34": 44,
    "35": 107,
    "36": 201,
    "37": 65,
    "38": 104,
    "39": 69,
    "40": 187,
    "41": 68,
    "42": 47,
    "43": 72,
    "44": 77,
    "45": 40,
    "46": 71,
    "47": 80,
    "48": 231,
    "49": 82,
    "50": 102,
    "51": 8221,
    "52": 50,
    "53": 106,
    "54": 124,
    "55": 78,
    "56": 54,
    "57": 176,
    "58": 53,
    "59": 84,
    "60": 79,
    "61": 85,
    "62": 51,
    "63": 37,
    "64": 57,
    "65": 113,
    "66": 90,
    "67": 66,
    "68": 75,
    "69": 119,
    "70": 87,
    "71": 58,
    "72": 52,
    "73": 76,
    "74": 70,
    "75": 93,
    "76": 239,
    "77": 73,
    "78": 74,
    "79": 228,
    "80": 238,
    "81": 59,
    "82": 224,
    "83": 234,
    "84": 88,
    "85": 252,
    "86": 89,
    "87": 244,
    "88": 61,
    "89": 43,
    "90": 92,
    "91": 123,
    "92": 125,
    "93": 95,
    "94": 81,
    "95": 339,
    "96": 241,
    "97": 42,
    "98": 33,
    "99": 220,
    "100": 226,
    "101": 199,
    "102": 338,
    "103": 251,
    "104": 63,
    "105": 36,
    "106": 235,
    "107": 171,
    "108": 8364,
    "109": 38,
    "110": 60,
    "111": 230,
    "112": 35,
    "113": 174,
    "114": 194,
    "115": 200,
    "116": 62,
    "117": 91,
    "118": 198,
    "119": 249,
    "120": 206,
    "121": 212,
    "122": 255,
    "123": 192,
    "124": 202,
    "125": 64,
    "126": 207,
    "127": 169,
    "128": 203,
    "129": 217,
    "130": 163,
    "131": 376,
    "132": 219,
    "133": 32,
    "134": 39,
    "135": 45
}
Bartzi commented 6 years ago

I think your groundtruth files are wrong.

Your first line states that each image has a maximum of 18 textual ROIs and that each ROI has at most 17 characters. All I can see is one text line with an arbitrary number of characters. Did you forget to pad a text line with the blank label if it does not have 17 characters?

ChainerUI is also nice, feel free to provide a PR with instructions on how to use ChainerUI =)

mit456 commented 6 years ago

Thanks for clarifying. I considered one ROI to be one word, and a word (ROI) has at most 17 characters; is this a correct assumption?

Bartzi commented 6 years ago

Yes, that is correct.

mit456 commented 6 years ago

So if I consider the following image from my training set, with a maximum of 17 textual ROIs and at most 17 characters per ROI, there should be 289 label columns (char_map classes) excluding the image_path column.

According to my logic, this should be the row of the train.csv.

41 12 7 134 4 133 10 12 133 38 11 7 4 6 7 10 133 50 12 21 133 12 7 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


I have even padded with the blank label, i.e. 0, after the sentence ends. What am I doing wrong in building the data? Please correct me.

Bartzi commented 6 years ago

The problem here is that you consider one ROI to hold more than one word (a whole text line). If you really consider one ROI to be one word, it should look like this:

41 12 7 134 4 0 0 0 0 0 0 0 0 0 0 0 0 10 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38 11 7 4 6 7 10 0 0 0 0 0 0 0 0 0 0 50 12 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 7 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

One ROI has 17 labels. If there are fewer than 17 characters in a word, we pad it with zeros. That's what I understood from your description.

If you really want to have multiple lines of text as one ROI, your row seems to be correct, though. But how does it look for more than one ROI in that case?

mit456 commented 6 years ago

Sorry, I meant I tried converting the labels to the way you suggested, but still no luck; I really have no idea what the issue could be. I will reopen if I find something; until then, closing this.

mit456 commented 6 years ago

BTW, thanks for clearing my concepts.

mit456 commented 6 years ago

Hello @Bartzi,

I tried debugging why the prediction is always a number and never a character, and I noticed a couple of things in models/text_recognition.py:

  1. We are not passing num_labels here, as we do in the case of FSNSRecognitionNet?

  2. Why is this comment here?

Could this be the issue? Just a guess.

Bartzi commented 6 years ago
  1. you are right, this should be solved now! I'm sorry for my (kinda bad) code^^
  2. This comment states what happens: at every timestep the LSTM is expected to predict the bounding box for a single character. But this is just one way of thinking about it; it always depends on the semantics of the goal you have in mind while training the model.
mit456 commented 6 years ago

I believe that by itself won't fix the problem; anyway, I will look deeper, and if I find a solution I will come back to you.

mit456 commented 6 years ago

Hi,

Initially, the prediction looked like this (image attached).

After making some changes in the localization and recognition networks it improved; it started predicting labels for all timesteps, like this (image attached).

But I still could not figure out why the recognition network is not predicting characters. Anything you can think of, @Bartzi?

mit456 commented 6 years ago

Another interesting thing: training with train_fsns gives character predictions without changing any code of the FSNS network.


Bartzi commented 6 years ago

Hmm, that is really interesting; it is especially interesting that you only have one predicted bbox in your second-to-last example. Are the other bboxes still below that one bbox?

How many iterations into the training were you for those images?

My guess is that it has to do with the way the loss is calculated. Did you change anything in your groundtruth before using train_fsns.py?

mit456 commented 6 years ago
  1. Only a single bbox started appearing after the changes I made in models/ic_stn.py. Throughout the training there is only one bounding box.

  2. 273 training images, 30 validation images, 10 epochs, 543 iterations, and a batch_size of 5. Note: all the images are the same.

  3. No, I did not change groundtruth.

FYI:

output from train_text_recognition:

epoch       iteration   main/loss   main/accuracy  lr          fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
1           100         1.51389     0              0.00308566                                                            1.23965               0                         
3           200         0.765335    0              0.00425853                                                            0.912378              0                         
5           300         0.719538    0              0.00509208                                                            0.716425              0                         
7           400         0.794102    0              0.00574294                                                            3.84568               0                         
9           500         0.727743    0              0.00627392                                                            0.718997              0            

output from train_fsns

epoch       iteration   main/loss   main/accuracy  lr          fast_validation/main/loss  fast_validation/main/accuracy  validation/main/loss  validation/main/accuracy
1           100         3.70859     0.225          0.00308566                                                            2.66383               0                         
3           200         2.71601     0.345          0.00425853                                                            2.36127               0.5                       
5           300         2.32959     0.415          0.00509208                                                            2.32511               0.5                       
7           400         2.55705     0.41           0.00574294                                                            2.34061               0.25                      
9           500         2.39506     0.355          0.00627392                                                            2.33157               0.5 

Well, I will dig deeper into the loss calculation and check; thanks for the clue.

Bartzi commented 6 years ago

Yes, I think it is a mixture of the loss calculation and the fact that you have only one bbox. Is it intended to have only one bbox?

mit456 commented 6 years ago

Hello,

I am able to train with train_text_recognition using text groundtruth, but it seems the model is not converging; the main accuracy is always zero.

Training size: 700 images. Validation size: 200 images

Attaching the loss graph.

What is your take on this? It also seems that the localization network is overfitting. Here are some bounding boxes on validation sample [0]:

iteration 1500

iteration 7800

iteration 12500

iteration 16830

iteration 24400

Bartzi commented 6 years ago

Hmm, the visual backprop visualization only shows a black image; that is not a good sign. It either means that your network is not able to learn anything meaningful, or that the network diverged. Your loss graph backs the first assumption. One of the problems is definitely that you only have 700 images; that is just not enough! You could try to boost this number by using image augmentation (a small sketch follows below). Other than that: have you tried starting with easier samples (i.e. samples where the text is in the center of the image rather than scattered around it)? Have you checked that your groundtruth alignment is correct? (This really has a huge impact on performance.) Are you sure that the semantics of the data in the network are correct?

These are the possible reasons I can think of right now...
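Since the training images were rendered with PIL, a minimal augmentation sketch could look like the following; the specific transforms and parameter ranges are only examples (and assume a reasonably recent Pillow), not a recommendation from the repository:

import random
from PIL import Image, ImageEnhance

def augment(image):
    """Apply a few cheap, label-preserving distortions to a rendered text image."""
    # Small random rotation; the text content (and hence the label) stays the same.
    image = image.rotate(random.uniform(-3, 3), fillcolor="white")
    # Random brightness and contrast jitter.
    image = ImageEnhance.Brightness(image).enhance(random.uniform(0.8, 1.2))
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.8, 1.2))
    return image

# Example: write several augmented copies of one rendered training image.
# base = Image.open("260.jpeg").convert("RGB")
# for i in range(5):
#     augment(base).save("260_aug{}.jpeg".format(i))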

mit456 commented 6 years ago

Thanks for the prompt reply. I tested with easier samples and it was working; then I made more sophisticated data.

  1. How do I confirm whether my groundtruth alignment is correct or not? I feel this could be the issue. Here is one example from my train.csv:
    /home/abc/Projects/see/custom3/training/260.jpeg      Assembly session brought much good
  2. I checked the semantics of the data by printing the labels and filename inside the TextRecFileDataset class; it prints them correctly.

How do I print the labels that are being passed to the network?

Bartzi commented 6 years ago
  1. Your groundtruth looks good so far
  2. By semantics I mean that the content of the tensors makes sense. Check that each axis of your data tensor is handled in the right way. The recognition network, for instance, increases the initial batch size, and this increase is saved in the first axis, but semantically the first axis actually consists of two axes. I hope this makes sense to you; all the rest should be handled with the same care. You can check whether everything is correct by attaching a debugger and looking at shapes and data during the forward pass of the model (the beauty of chainer is that you can do that without any problems!). A small sketch follows below.

I hope this helps ;)
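One simple way to inspect shapes during the forward pass, as suggested above; this is just a sketch, and a breakpoint (pdb.set_trace()) inside the model's __call__ works just as well:

import numpy as np

def inspect(name, array):
    """Print the shape and dtype of a tensor flowing through the model."""
    print(name, getattr(array, "shape", None), getattr(array, "dtype", None))

# Dummy data shaped like "batch of 5 images with 6 ROIs each" (an assumed layout,
# purely for illustration): the recognition stage folds the ROI axis into the
# batch axis, so the first axis semantically consists of two axes.
flat = np.zeros((5 * 6, 17, 136), dtype=np.float32)
inspect("recognition output", flat)
inspect("per-image view", flat.reshape(5, 6, 17, 136))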