Open youngsirsk opened 3 years ago
You need to check this part.
train.py line 25
print('Filtering the images containing characters which are not in opt.character')
print('Filtering the images whose label is longer than opt.batch_max_length')
dataset.py line 168
line 190
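To see why those two filters can leave zero samples, here is a minimal sketch of what they appear to do, judging only from the two printed messages. It is not the repository's actual code: `keep_label` is a hypothetical helper, and the values of `CHARACTER` and `BATCH_MAX_LENGTH` are assumptions, so compare them with `opt.character` and `opt.batch_max_length` in your own run.

```python
import re

# Assumed values for illustration only; check opt.character and
# opt.batch_max_length in your own configuration.
CHARACTER = "0123456789abcdefghijklmnopqrstuvwxyz"
BATCH_MAX_LENGTH = 25


def keep_label(label, character=CHARACTER, batch_max_length=BATCH_MAX_LENGTH):
    """Return True if a label passes both filters (hypothetical helper)."""
    # Filter 1: drop labels longer than the maximum length.
    if len(label) > batch_max_length:
        return False
    # Filter 2: drop labels containing any character outside the allowed set.
    out_of_char = "[^" + re.escape(character) + "]"
    return re.search(out_of_char, label.lower()) is None


labels = ["code", "I love coding", "a_b_c_d_e"]
kept = [label for label in labels if keep_label(label)]
print(kept)  # ['code']
```

Under these assumptions, "I love coding" is dropped because of the spaces and "a_b_c_d_e" because of the underscores. If every label in the dataset contains such a character, nothing survives the filter and the sampler receives `num_samples=0`. Calling `keep_label("I love coding", character=CHARACTER + " ")` returns True, so adding the space and any other needed symbols to the allowed character set (and raising the maximum length if your labels are long) should let these samples through.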
I created an lmdb dataset from custom data (e.g. images generated with trdg that contain multiple words each), following the guide for creating datasets, and trained with the command below:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py \
--exp_name CRNN_CTC_demo \
--select_data / \
--batch_ratio 1 \
--train_data lmdb_dataset/train/ --valid_data lmdb_dataset/val/ \
--Transformation TPS --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC
This is what was printed on the screen:
------ Use multi-GPU setting ------
if you stuck too long time with multi-GPU setting, try to set --workers 0
Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length
dataset_root: lmdb_dataset/train/
opt.select_data: ['/']
opt.batch_ratio: ['1']
dataset_root: lmdb_dataset/train/ dataset: /
sub-directory: /. num samples: 0
num total samples of /: 0 x 1.0 (total_data_usage_ratio) = 0
num samples of / per batch: 768 x 1.0 (batch_ratio) = 768
Traceback (most recent call last):
File "train.py", line 317, in <module>
train(opt)
File "train.py", line 31, in train
train_dataset = Batch_Balanced_Dataset(opt)
File "/home/yxy/deep-text-recognition-benchmark/dataset.py", line 67, in __init__
collate_fn=_AlignCollate, pin_memory=True)
File "/home/yxy/anaconda3/envs/crnn_ctc/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
sampler = RandomSampler(dataset, generator=generator) # type: ignore
File "/home/yxy/anaconda3/envs/crnn_ctc/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 104, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
This shows that an error occurs while loading the data, but I have no idea why or how it happens. When I train with one word per image, it works well.
Works: a single word, for example "code".
Bug: multiple words, for example "I love coding".
By the way, I found that special characters between words cause the same error (e.g. a_b_c_d_e).
I have no idea how to fix the problem. Can anyone help me? Thanks a lot!