HFVladimir opened 6 years ago
Which pre-trained model are you using? Are you using the one from bartzi.de?
Yes, I used the text_recognition_model from bartzi.de and only adapted the paths to the pictures.
Did you have a look at the bboxes directory that has been created in the log directory for this training run? What do these images look like (there should be some images if you did not change anything else)?
I looked at the pictures, and from what I can see, recognition produces nothing right from the start of training.
These were the first and last pictures from the bboxes folder (10.png and 60.png in my case).
Result with a picture from the Synth-90k dataset passed via the --test-image flag:
I checked both images with this model using the text_recognition_demo.py script, and all characters were recognized correctly.
Oh now I see the problem^^
Try to use -r ../model/model_190000.npz instead of --model ../model/model_190000.npz. It should work then.
Yes, it works. Thank you. But I think you should remove the --model parameter, because it is the first thing you pay attention to, yet it does not do anything.
Now I've found a new way to get 0 accuracy^^
I've expanded the charmap with some characters (in particular '%'), collected a new dataset, and I'm not happy with the accuracy of the localization part ('%' is divided into two boxes). During training my boxes almost do not move, so I decided that the --refinement parameter would help me, but after it was turned on all the boxes were gone. What does this parameter actually do, and how can I merge two boxes into one when recognizing '%'?
You are right! I removed that parameter, thx for the hint.
The --refinement flag turns on transformation parameter refinement with inverse compositional spatial transformer networks. I thought this could be used to increase the accuracy of the localization network by iteratively refining its predictions. It turns out it does help a little bit, but the memory and runtime costs are too high. So I suggest that you do not use this parameter, unless you want to experiment with it; if you do use it, you should also set --refinement-steps 2.
But it is strange that all boxes were gone... did they disappear already in the first iteration, or did it take some time?
I tried with the default --refinement-steps 1 and they disappeared already in the first iteration. I'll try with --refinement-steps 2 and report the results. Now I have a few questions:
1) I'm trying to continue training the model from bartzi.de with an extended charset. Will it recognize the new symbols?
2) Can you explain the meaning of some parameters, in particular --zoom, --optimize-all-interval, and the factor parameters? Will they help me improve the accuracy of the localization network?
Yes, you can use the model from bartzi.de. I suggest you start the training without initializing the recognition network from the pre-trained model, using the parameter --load-localization. The most important part is the localization net: if you start with a good init there, you will also get good results on the recognition net. Later you should do some fine-tuning with all parts of the network initialized from the model. That means you can change the recognition part the way you like and then train using an already trained localization net.
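The "initialize only the localization net" idea can be sketched with plain NumPy. This is my own illustration, not the project's actual API; the key prefix 'localization_net/' is an assumption modeled on the 'recognition_net/...' parameter names that appear later in this thread:

```python
import numpy as np

def keep_localization_params(npz_path, prefix='localization_net/'):
    """Return only the checkpoint entries belonging to the localization net.

    Chainer stores model parameters in an .npz file keyed by the
    parameter path, so filtering by key prefix selects one sub-network.
    """
    with np.load(npz_path) as checkpoint:
        return {key: checkpoint[key] for key in checkpoint.files
                if key.startswith(prefix)}
```

These arrays could then be copied into a freshly built model before training, while the recognition net keeps its random initialization.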
Let's talk about parameters:

- --zoom is a float in the range 0-1 and determines the zoom rate of the uninitialized localization network at the start of training (try setting it to 0.5 and 0.9 and have a look at the images in the bboxes folder for the first iteration)
- --optimize-all-interval is a switch that was originally used to test whether it makes sense to optimize the parameters of the recognition network more often than the parameters of the localization network (it turns out it doesn't help). So this is another parameter that I should delete :sweat_smile:

Other interesting parameters:

- --send-bboxes turns on sending of bbox images. You can use this switch in conjunction with the show_progress.py script
- --area-factor determines the strength of the area loss regularizer; you can play around with this and encourage the localization net to predict smaller bboxes
- --area-scale-factor is used to determine how the area loss factor changes over time (if used by the loss metric); also a bit legacy, but could be useful
- --aspect-factor is, like the area factor, the weighting factor for adding the aspect-ratio loss regularization to the overall loss

Hope that helps =)
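To make --zoom and --area-factor a bit more concrete, here is a minimal sketch (my own illustration, not the repository's code) of how a zoom value typically initializes a spatial transformer's affine parameters, and how an area regularizer can penalize large boxes:

```python
import numpy as np

def initial_transform(zoom):
    """Affine matrix an uninitialized localization net starts from:
    pure scaling by `zoom`, no translation, no rotation."""
    return np.array([[zoom, 0.0, 0.0],
                     [0.0, zoom, 0.0]])

def area_loss(transforms, area_factor=0.1):
    """Penalize large predicted boxes: the diagonal scale entries of the
    affine matrices approximate bbox width/height in normalized
    coordinates, so their product approximates the bbox area."""
    widths = transforms[:, 0, 0]
    heights = transforms[:, 1, 1]
    return area_factor * float(np.mean(np.abs(widths * heights)))
```

With zoom=0.9 the initial boxes cover almost the whole input image; with zoom=0.5 only the central region, which changes how far the boxes have to travel during training.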
Thanks, you helped a lot! If you don't mind, I have one more question. I'm new to Chainer and I do not understand how to replace the last classifier layer of the pretrained model, since I use a new charmap and my label_size is now 57. I used the following code to check the size of the classifier layer:

```python
import numpy

with numpy.load('path-to-model') as f:
    for key, value in f.items():
        if str(key) == 'recognition_net/classifier/b' or str(key) == 'recognition_net/classifier/W':
            print('{} - {}'.format(key, len(str(value))))
```

I expected to see 52 (the default label_size), but I got recognition_net/classifier/b - 632 and recognition_net/classifier/W - 499. How do I remove the last layer, and what should the new dimension be in my case? Thanks =)
Looks alright so far, but there is an error in your code snippet. With

```python
print('{} - {}'.format(key, len(str(value))))
```

you are not printing the length of the array, but the length of the stringified version of the array. If you instead do

```python
print('{} - {}'.format(key, len(value)))
```

you should get 52 as the result for b and also for W :wink:
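The difference is easy to check, and since the thread never spells out how to get rid of the old classifier, here is one hedged way to do it. This is my own sketch; it assumes the checkpoint key names shown above and that the training script re-initializes any parameters missing from the loaded file:

```python
import numpy as np

b = np.zeros(52)          # classifier bias, one entry per label
print(len(b))             # 52: the number of output classes
print(len(str(b)))        # much larger: the length of the printed text

def drop_classifier(in_path, out_path):
    """Write a copy of the checkpoint without the classifier parameters,
    so that a model built with the new label_size (e.g. 57) can load the
    remaining weights and initialize the classifier from scratch."""
    with np.load(in_path) as f:
        kept = {k: f[k] for k in f.files
                if not k.startswith('recognition_net/classifier/')}
    np.savez(out_path, **kept)
```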
I encounter the same problem with:

```
python train_text_recognition.py /home/dev2/see/datasets/vin/path.json /home/dev2/see/datasets/vin/logs --char-map ../datasets/fsns/fsns_char_map.json -b=64 -r ../datasets/model/model_190000.npz
```

btw: is this accuracy tested on my new dataset? @Bartzi
@Bartzi Could somebody please provide a minimal example of flags to train text recognition on a new dataset without getting zero accuracy all the time? I have already tried various combinations and still do not get anything (even though the training loss decreases steadily, which is odd).
Yes, I can give you an example:

```
python train_text_recognition.py <path to curriculum.json> -b 60 --blank-label 0 --char-map ../datasets/textrec/ctc_char_map.json --zoom 0.9 --area-factor 0.1 -lr 1e-4
```

and that should be it...
However, it might take some time until the accuracy goes up, although the loss is decreasing. It could also be that there is a bug in the accuracy calculation method.
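For reference, exact-match sequence accuracy over CTC outputs is a very strict metric, which is one reason it can stay at 0 while the loss falls. A minimal sketch of such a calculation (my own illustration, not the repository's implementation):

```python
def ctc_decode(sequence, blank=0):
    """Collapse repeated labels, then drop blanks (standard CTC decoding)."""
    collapsed = []
    previous = None
    for label in sequence:
        if label != previous:
            collapsed.append(label)
        previous = label
    return [label for label in collapsed if label != blank]

def sequence_accuracy(predictions, targets, blank=0):
    """Fraction of samples whose decoded prediction matches the target
    exactly; a single wrong character makes the whole sample count as wrong."""
    matches = sum(
        ctc_decode(pred, blank) == [t for t in target if t != blank]
        for pred, target in zip(predictions, targets)
    )
    return matches / len(targets)
```

Because a single wrong character zeroes out a sample, per-sequence accuracy lags far behind a smoothly decreasing CTC loss early in training.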
Hi, I'm running the demo with the pretrained model and the following command:

```
python train_text_recognition.py ../small_dataset/curriculum.json ./logs --char-map ../small_dataset/ctc_char_map.json -g=0 -b=64 --model ../model/model_190000.npz --epochs 5 -li 10 --use-serial-iterator -lr 0.000001 --lr-step 0
```

Despite the small learning rate my accuracy is always 0; can anybody explain this to me?