Closed: yash-bhat closed this issue 4 years ago
Hi,
thanks for using our work!
Here is the explanation of the command line options:
- `--load-localization`: with this option you indicate that you only want to load the weights of the pre-trained localization network contained in the snapshot you refer to with the `-r` option.
- `--load-recognition`: the same, but only the weights of the recognition network are loaded.
- `--freeze-localization`: indicates that the localization network shall not be updated during training, i.e. its weights remain constant throughout training.

For your case, you should use `--load-localization` and `--freeze-localization`.
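The effect of these flags can be sketched framework-agnostically. This is purely illustrative (plain dicts standing in for the two sub-networks); the names `localization/` and `recognition/` and the helper `load_partial` are assumptions, not the repository's actual code:

```python
# Illustrative sketch of partial loading and freezing; not the repo's real code.

def load_partial(model, snapshot, prefix):
    """Copy only the snapshot weights whose name starts with `prefix`."""
    for name, value in snapshot.items():
        if name.startswith(prefix):
            model[name] = value

snapshot = {"localization/w": 1.0, "recognition/w": 2.0}
model = {"localization/w": 0.0, "recognition/w": 0.0}
frozen = set()

# --load-localization: take only the localization weights from the snapshot.
load_partial(model, snapshot, "localization/")

# --freeze-localization: exclude those weights from future optimizer updates.
frozen.update(n for n in model if n.startswith("localization/"))

print(model)   # {'localization/w': 1.0, 'recognition/w': 0.0}
print(frozen)  # {'localization/w'}
```

The recognition weights stay at their initial values and remain trainable, which is exactly the fine-tuning setup described above.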
Thanks for the explanation. I was able to train up to iteration 400, but now it throws the same error as issue #18.
Any ideas? I'm feeding single-word images, and the maximum word length is 10, which I have specified as follows in the input data CSV:
10 1
Error:

```
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: SoftmaxCrossEntropy (Forward)
Expect: in_types[0].shape[0] == in_types[1].shape[0]
Actual: 1600 != 2080
```
Do you have a full traceback?
Hello @Bartzi
I figured out the issue. The predicted and supplied labels were of different lengths. The problem is with CSV quoting in Python, I think. Whenever a field contains a double quote (`"`), the writer wraps the whole field in quotes and doubles the embedded quote. For example, 10 foot 2 inches is represented as the string `10' 2"`, but written to CSV it becomes `"10' 2"""`!
I have eliminated the double quotes altogether, but ideally I need the model to learn the `"` character as well. Thanks though. Will update here if I figure it out.
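For reference, Python's standard `csv` module applies this escaping symmetrically, so reading with `csv.reader` recovers the original string. A quick stdlib-only sketch (not the repository's data-loading code):

```python
import csv
import io

# Writing a field that contains a double quote: the csv writer wraps the
# field in quotes and doubles the embedded quote character.
buf = io.StringIO()
csv.writer(buf).writerow(["10' 2\"", "label"])
raw = buf.getvalue()
print(raw)  # "10' 2""",label

# Reading the same line back with csv.reader undoes the escaping,
# so the original string is recovered intact.
field = next(csv.reader(io.StringIO(raw)))[0]
print(field)  # 10' 2"
```

So the mismatch only arises if the quoted CSV output is parsed naively (e.g. with `str.split`) instead of through a proper CSV reader.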
Just an update before I close this issue. For the Python string parsing issue with quotes (`"`), I just went ahead with a placeholder character, which I clean up during inference.
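A minimal sketch of that placeholder workaround. The choice of `\u2033` (DOUBLE PRIME) and the helper names are hypothetical; any symbol that never occurs in the real labels would do:

```python
# Hypothetical placeholder for the double-quote character.
PLACEHOLDER = "\u2033"  # DOUBLE PRIME

def encode_label(label: str) -> str:
    """Replace double quotes before writing training labels to CSV."""
    return label.replace('"', PLACEHOLDER)

def decode_label(label: str) -> str:
    """Restore double quotes in the model's output at inference time."""
    return label.replace(PLACEHOLDER, '"')

print(encode_label("10' 2\""))        # 10' 2″
print(decode_label("10' 2\u2033"))    # 10' 2"
```

This keeps the quote character learnable (as the placeholder) while sidestepping CSV escaping entirely.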
Hello @Bartzi
Great work!!
I am training a text recognition model based on your 'text_recognition_model' as follows:
python3 chainer/train_text_recognition.py curriculum.json log -b 16 --char-map map.json -r text_recog_model/model_190000.npz -g 0 -si 100
Can you please briefly explain what `--load-localization`, `--load-recognition`, and `--freeze-localization` actually do?
My goal is to use your model as-is for text localization, but fine-tune text recognition (with a new set of characters) for my use case.
Thank you!