ffiruzi opened 2 years ago
Duplicate of #356

Hello, I encountered this error when I wanted to retrain the network on my own dataset, which is in the Farsi language. Can anyone help me figure out the reason for it? I used the trainer file for retraining. When I do the same thing with a Latin dataset it works fine. Should I change some values for Persian (or other non-Latin) languages? Maybe that is the cause of this error. Thanks for your help.
Hi, I have a question: is there a minimum length for the character set? When I run the training script with the character set 'آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهیءأئةؤ ' (Persian letters, length 40), training starts without any problem. But when I change the character set to '۰۱۲۳۴۵۶۷۸۹' (length 10) to train a model for Persian digit recognition, I get the following error:
python train.py --train_data D:/lmdb/digits/result_train/ --valid_data D:/lmdb/digits/result_test/ --data_filtering_off --imgH 64 --imgW 200 --batch_size 8 --workers 0 --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn
exp_name: TPS-ResNet-BiLSTM-Attn-Seed1111
train_data: D:/lmdb/digits/result_train/
valid_data: D:/lmdb/digits/result_test/
manualSeed: 1111
workers: 0
batch_size: 8
num_iter: 300000
valInterval: 2000
saved_model:
FT: False
adam: False
lr: 1
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
baiduCTC: False
select_data: ['/']
batch_ratio: ['1']
total_data_usage_ratio: 1.0
batch_max_length: 23
imgH: 64
imgW: 200
rgb: False
character: ۰۱۲۳۴۵۶۷۸۹
sensitive: False
PAD: False
data_filtering_off: True
Transformation: TPS
FeatureExtraction: ResNet
SequenceModeling: BiLSTM
Prediction: Attn
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 256
num_gpu: 1
num_class: 12
Traceback (most recent call last):
  File "train.py", line 317, in <module>
    train(opt)
  File "train.py", line 181, in train
    valid_loss, current_accuracy, current_norm_ED, preds, confidence_score, labels, infer_time, length_of_data = validation(
  File "C:\Users\kian\Desktop\tps\deep-text-recognition-benchmark-master\test.py", line 130, in validation
    preds_str = converter.decode(preds_index, length_for_pred)
  File "C:\Users\kian\Desktop\tps\deep-text-recognition-benchmark-master\utils.py", line 144, in decode
    text = ''.join([self.character[i] for i in text_index[index, :]])
  File "C:\Users\kian\Desktop\tps\deep-text-recognition-benchmark-master\utils.py", line 144, in <listcomp>
    text = ''.join([self.character[i] for i in text_index[index, :]])
IndexError: list index out of range
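For context, the failing line indexes the converter's character list with raw prediction indices. A minimal sketch of that pattern (stand-in values, not the repo's actual code; I use ASCII digits in place of the Persian ones for readability) shows how an index that is valid for a larger vocabulary blows up on a 10-character set:

```python
# Sketch of the pattern in utils.py line 144 (not the repo's exact code).
# With the Attn prediction head, the converter's list is built roughly as
# ['[GO]', '[s]'] + list(character), so a 10-character set gives
# num_class = 12 and valid indices 0..11.
character = list('0123456789')           # stand-in for '۰۱۲۳۴۵۶۷۸۹'
char_list = ['[GO]', '[s]'] + character  # length 12

def decode(text_index):
    # same list-comprehension pattern as the traceback's line 144
    return ''.join([char_list[i] for i in text_index])

print(decode([2, 3, 4]))  # '012' -- indices within 0..11 work fine

try:
    decode([2, 3, 15])    # an index >= 12 raises exactly this error
except IndexError as e:
    print('IndexError:', e)
```

So the `IndexError` means some predicted (or encoded) index is larger than `num_class - 1 = 11` for the digit-only character set.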
An interesting thing is that if I add some padding characters to the character set, for example if I use
"۰۱۲۳۴۵۶۷۸۹---------------------------------"
instead of
'۰۱۲۳۴۵۶۷۸۹'
the error goes away. I would like to know what the problem is with a character set of length 10 for training a model. Is there a minimum character-set length required to start training these models? Thanks for your help.
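For reference, the arithmetic behind the workaround (assuming the `num_class = len(character) + 2` rule for the Attn head, as reported in the options dump above): padding the set with dashes enlarges the character list, so a stray index that was out of range for 12 classes lands inside the padded list instead. This would mask the error rather than fix its cause:

```python
# Hypothetical illustration of why padding hides the IndexError.
digits = '۰۱۲۳۴۵۶۷۸۹'
padded = digits + '-' * 33  # roughly the dash-padded set shown above

# With the Attn head, num_class = len(character) + 2 ([GO] and [s] tokens).
print(len(digits) + 2)  # 12 -> any index >= 12 raises IndexError
print(len(padded) + 2)  # 45 -> the same stray index now maps to a '-'
```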