clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType' #204

Closed Aldemaro14 closed 4 years ago

Aldemaro14 commented 4 years ago

Hello, I'm trying to train this model on my own data. After working through some other issues, I now get the following error when running `train.py`:

```
(venv) C:\Users\itres\Desktop\OCR\craft_crnn\deep-text-recognition-benchmark>python train.py --train_data ../result --valid_data ../result_val --Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC
Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length

dataset_root: ../result
opt.select_data: ['/']
opt.batch_ratio: ['1']

dataset_root: ../result  dataset: /
None
Traceback (most recent call last):
  File "train.py", line 304, in <module>
    train(opt)
  File "train.py", line 31, in train
    train_dataset = Batch_Balanced_Dataset(opt)
  File "C:\Users\itres\Desktop\OCR\craft_crnn\deep-text-recognition-benchmark\dataset.py", line 42, in __init__
    _dataset, _dataset_log = hierarchical_dataset(root=opt.train_data, opt=opt, select_data=[selected_d])
  File "C:\Users\itres\Desktop\OCR\craft_crnn\deep-text-recognition-benchmark\dataset.py", line 118, in hierarchical_dataset
    dataset = LmdbDataset(dirpath, opt)
  File "C:\Users\itres\Desktop\OCR\craft_crnn\deep-text-recognition-benchmark\dataset.py", line 143, in __init__
    nSamples = int(txn.get('num-samples'.encode()))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
```
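The `TypeError` means `txn.get('num-samples'.encode())` returned `None`, i.e. the LMDB that was opened has no `num-samples` key. A minimal sketch of a more defensive read, assuming `get` behaves like the `lmdb` package's `Transaction.get` (returns the stored bytes, or `None` for a missing key); `read_num_samples` is a hypothetical helper, not part of the repo:

```python
from typing import Callable, Optional


def read_num_samples(get: Callable[[bytes], Optional[bytes]], root: str) -> int:
    """Read 'num-samples' like the failing line in dataset.py, but fail with a clear message.

    `get` is assumed to behave like lmdb's Transaction.get: it returns the stored
    bytes, or None when the key is absent.
    """
    raw = get('num-samples'.encode())
    if raw is None:
        # int(None) is what raised the TypeError in the traceback above; a missing
        # key usually means `root` is not the folder create_lmdb_dataset.py wrote to.
        raise FileNotFoundError(
            f"no 'num-samples' key in LMDB at {root!r}; check --train_data / --valid_data"
        )
    # int() accepts bytes such as b'1000' directly.
    return int(raw)


# Possible usage with the lmdb package (assumed installed):
#   env = lmdb.open(root, readonly=True, lock=False)
#   with env.begin(write=False) as txn:
#       n = read_num_samples(txn.get, root)
```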

I was also able to generate the LMDB dataset.


Aldemaro14 commented 4 years ago

https://github.com/clovaai/deep-text-recognition-benchmark/issues/172#issuecomment-639380782

I also followed this advice and ran into another issue:

```
(venv) C:\Users\itres\Desktop\OCR\craft_crnn\deep-text-recognition-benchmark>python create_lmdb_dataset.py --inputPath ../data/training_data --gtFile ../data/training_data/gt.txt --outputPath result/
Traceback (most recent call last):
  File "create_lmdb_dataset.py", line 89, in <module>
    fire.Fire(createDataset)
  File "C:\Users\itres\Desktop\OCR\craft_crnn\venv\lib\site-packages\fire\core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\itres\Desktop\OCR\craft_crnn\venv\lib\site-packages\fire\core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\itres\Desktop\OCR\craft_crnn\venv\lib\site-packages\fire\core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "create_lmdb_dataset.py", line 49, in createDataset
    imagePath, label = datalist[i].strip('\n').split('\t', 1)
ValueError: not enough values to unpack (expected 2, got 1)
```

It worked with a space-separated gt file after changing this line in `create_lmdb_dataset.py` from `imagePath, label = datalist[i].strip('\n').split('\t', 1)` to `imagePath, label = datalist[i].strip('\n').split(' ', 1)`.
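Rather than switching the separator outright, a gt parser could accept either one. A minimal sketch, assuming one `image_path<sep>label` entry per line; `parse_gt_line` is a hypothetical helper, not part of the repo. It prefers a tab when one is present, and the second argument `1` to `split` keeps any spaces inside the label. An image path that itself contains a space would still be split in the wrong place in space mode.

```python
from typing import Tuple


def parse_gt_line(line: str) -> Tuple[str, str]:
    """Split one gt line into (image_path, label), accepting a tab or a space separator."""
    line = line.strip('\n')
    # Prefer the tab format; fall back to the first space otherwise.
    sep = '\t' if '\t' in line else ' '
    parts = line.split(sep, 1)  # split at most once so the label keeps its spaces
    if len(parts) != 2:
        # Same situation as the ValueError above, but with the offending line shown.
        raise ValueError(f"cannot split gt line into image path and label: {line!r}")
    image_path, label = parts
    return image_path, label
```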

Aldemaro14 commented 4 years ago

Fixed.

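1st: You can train the model with a gt file that separates the image path and the label with a space (`image.png label`) instead of a tab (`image.png<TAB>label`); just apply the change above.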

2nd: The original `TypeError` came from pointing `--train_data` at the wrong folder: the LMDB opened there had no `num-samples` key, so `txn.get` returned `None`.
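Since the root cause was a wrong path, a quick pre-flight check before launching `train.py` can list which folders under a given root actually hold an LMDB. A minimal sketch, assuming an LMDB folder can be recognized by the `data.mdb` file that the `lmdb` library writes in directory mode; `find_lmdb_dirs` is a hypothetical helper, not part of the repo:

```python
import os
from typing import List


def find_lmdb_dirs(root: str) -> List[str]:
    """Return every directory under `root` (including `root`) that contains a data.mdb file."""
    if not os.path.isdir(root):
        # Show the absolute path: relative paths like ../result vs result/ are easy to mix up.
        raise FileNotFoundError(
            f"{root!r} does not exist (resolved to {os.path.abspath(root)!r})"
        )
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if 'data.mdb' in filenames:
            found.append(dirpath)
    return sorted(found)
```

An empty result for the folder passed as `--train_data` or `--valid_data` suggests the path does not point at the output of `create_lmdb_dataset.py`.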