@kspook Everything works fine using TensorFlow 1.12 with a GTX 1070 and CUDA 9.0 on my local machine. Perhaps you need to raise an issue under TensorFlow if you use the same version :)
@MaybeShewill-CV, it's a write_tfrecords.py issue. I used 49 characters including numbers, the alphabet, and Korean, but char_dict.json has only 10 lines, ord_map.json has 20 lines, and the tfrecords are 0 bytes. You can check here: https://drive.google.com/open?id=1TpfrQpi6h7cn1cH-y8NOmTXjrOH2DbtV (Korean images: image-data/hangul-images/)
I0704 21:46:55.333094 6091 shadownet_data_feed_pipline.py:159] Start initialize train sample information list...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 162/162 [00:00<00:00, 166293.99it/s]
I0704 21:46:55.339854 6091 shadownet_data_feed_pipline.py:174] Start initialize validation sample information list...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 86323.45it/s]
I0704 21:46:55.340504 6091 shadownet_data_feed_pipline.py:188] Start initialize testing sample information list...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 78012.22it/s]
I0704 21:46:55.341225 6091 shadownet_data_feed_pipline.py:212] Char set length: 12
I0704 21:46:55.341894 6091 shadownet_data_feed_pipline.py:219] Write char dict map complete
I0704 21:46:55.341991 6091 shadownet_data_feed_pipline.py:83] Generating training sample tfrecords...
I0704 21:46:55.342212 6091 tf_io_pipline_fast_tools.py:449] Start filling train dataset sample information queue...
0%|                                        | 0/162 [00:00<?, ?it/s]
E0704 21:46:55.342514 6091 tf_io_pipline_fast_tools.py:462] Lexicon doesn't contain lexicon index 54616
E0704 21:46:55.342586 6091 tf_io_pipline_fast_tools.py:462] Lexicon doesn't contain lexicon index 54616
E0704 21:46:55.342641 6091 tf_io_pipline_fast_tools.py:462] Lexicon doesn't contain lexicon index 45208
I think I didn't make lexicon.txt the right way. Do you know how to fix it?
@kspook You may check the way the Synth90k dataset is organized; the two JSON files will be automatically generated while making the TensorFlow records :)
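For reference, a Synth90k-style annotation line is "<image path> <lexicon index>" with exactly one space, and lexicon.txt holds one word per line, so the integer on each annotation line is a 0-based index into lexicon.txt. A made-up example (the paths and words below are illustrative only):

annotation_train.txt:
./hangul-images/img_0001.jpg 0
./hangul-images/img_0002.jpg 1

lexicon.txt:
하늘
바다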
@MaybeShewill-CV, I did. The two files char_dict.json and ord_map.json were made automatically.
My question is that tf_io_pipline_fast_tools.py can't handle lexicon.txt even though I made it in the same style as the Synth90k dataset.
@kspook If you met an error when testing the tools on the Synth90k dataset, you may post the error information here. If nothing happened when training on the Synth90k dataset but you met an error during the training process on your own dataset, please check your dataset's files yourself. There must be something wrong with your label file :)
@MaybeShewill-CV, I think I have the same situation as #285, but I get the above error even without an index in the line.
If I put the index in, I naturally get an error:
I0708 03:45:20.357513 12066 shadownet_data_feed_pipline.py:159] Start initialize train sample information list...
0%| | 0/162 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/write_tfrecords.py", line 74, in <module>
    save_dir=args.save_dir
  File "tools/write_tfrecords.py", line 56, in write_tfrecords
    writer_process_nums=8
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 64, in __init__
    self._init_dataset_sample_info()
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 164, in _init_dataset_sample_info
    image_name, label_index = line.rstrip('\r').rstrip('\n').split(' ')
ValueError: too many values to unpack (expected 2)
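(That "too many values to unpack" comes from an annotation line containing more than one space, since split(' ') then yields more than two fields. A hedged sketch for locating such lines, with the file name as an assumption and not the repo's code:)

with open('annotation_train.txt', 'r', encoding='utf-8') as f:
    for line_no, line in enumerate(f, 1):
        parts = line.strip().split(' ')
        if len(parts) != 2:
            # more (or fewer) than two fields breaks the pipeline's unpacking
            print('Malformed line {}: {!r}'.format(line_no, line))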
@MaybeShewill-CV, can you upload the Chinese data to Google Drive? I can't download it to check.
@kspook Check your dataset's format against the Synth90k dataset :)
I did. The answer must be #302: lack of data. But in my case, the tfrecords were not complete, due to the lexicon index errors.
Can you check my file? https://drive.google.com/open?id=1k0qsklB8Y1IbMUBOurnTKTEhUUxw_pwK
I don't think it is different from Synth90k.
I am also interested in how to make the file for Chinese. Unlike English, Chinese was converted to numbers. How did you make Chinese words? How can you identify two characters?
According to this, https://github.com/MaybeShewill-CV/CRNN_Tensorflow/issues/285#issuecomment-505333966, a Chinese word looks to have one index (number). Am I right?
@kspook Maybe you could test whether the problem still exists after you enlarge your dataset :)
@MaybeShewill-CV, I managed to use Synth90k. Can I just ignore the 'PREMATURE END OF IMAGE' error?
There was a post about it before (#112), but no exact answer.
I did it with Synth90k. I am interested in Chinese and Korean. In the Chinese case, you replaced every Chinese character with a number. How did you manage words with more than two characters? How can this script identify the two numbers in a two-character word?
I.e., Chinese char 1 + Chinese char 2 --> one word; ord(chinese1), ord(chinese2) --> 50000,5002. How did you transfer them into one word?
@kspook
> @MaybeShewill-CV, I managed to use Synth90k. Can I just ignore the 'PREMATURE END OF IMAGE' error? There was a post about it before (#112), but no exact answer.
Your image file is not complete or not valid.
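A quick way to locate such files is to force a full decode of every image; truncated JPEGs raise an error. A minimal sketch, assuming Pillow is installed and the directory name is a placeholder:

import os
from PIL import Image

for root, _, files in os.walk('image-data'):
    for name in files:
        path = os.path.join(root, name)
        try:
            with Image.open(path) as img:
                img.load()  # full decode; truncated files raise here
        except Exception as exc:
            print('Bad image {}: {}'.format(path, exc))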
> I did it with Synth90k. I am interested in Chinese and Korean. In the Chinese case, you replaced every Chinese character with a number. How did you manage words with more than two characters? How can this script identify the two numbers in a two-character word? I.e., ord(chinese1), ord(chinese2) --> 50000,5002. How did you transfer them into one word?

> I did. The answer must be #302: lack of data. But in my case, the tfrecords were not complete, due to the lexicon index errors. Can you check my file? https://drive.google.com/open?id=1k0qsklB8Y1IbMUBOurnTKTEhUUxw_pwK I don't think it is different from Synth90k. I am also interested in how to make the file for Chinese. Unlike English, Chinese was converted to numbers. How did you make Chinese words? How can you identify two characters? According to this, #285 (comment), a Chinese word looks to have one index (number). Am I right?
I did not transform them into one word. You probably misunderstand the model :)
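Roughly, the idea is one label per character, not one label per word: each character gets its own index via char_dict.json / ord_map.json, and the CTC loss consumes the whole index sequence. A simplified sketch of that idea (not the repo's exact code):

# build a toy per-character dictionary, as the real pipeline does from the labels
chars = ['你', '好', 'a', 'b']
char_dict = {c: i for i, c in enumerate(chars)}

def encode(word):
    # a two-character word becomes two labels, e.g. '你好' -> [0, 1]
    return [char_dict[c] for c in word]

print(encode('你好'))  # [0, 1] -- a sequence of per-character indices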
Then what is your understanding?
How did you make the Chinese lexicon and labels file?
This model understands ASCII values, right? Did you use Chinese characters for the lexicon and labels?
@kspook Yep, I've trained a Chinese model and posted it here :)
@MaybeShewill-CV, you got me wrong.
If I put a Korean character in annotation_train.txt, then I get this error. So my question is how you dealt with Chinese characters. I thought you transformed Chinese characters into numbers.
Traceback (most recent call last):
  File "tools/write_tfrecords.py", line 74, in <module>
    save_dir=args.save_dir
  File "tools/write_tfrecords.py", line 56, in write_tfrecords
    writer_process_nums=8
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 64, in __init__
    self._init_dataset_sample_info()
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 166, in _init_dataset_sample_info
    label_index = int(label_index)
ValueError: invalid literal for int() with base 10: '\ud558'
@kspook No matter whether the characters are English or Chinese, they share the same way of generating TensorFlow records :)
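Concretely, _init_dataset_sample_info() calls int(label_index), so the second field of each annotation line must be an integer lexicon index, not the raw character (hence the '\ud558' error above). A hedged sketch of converting a raw-text annotation file into that format (the input/output file names are assumptions):

lexicon = {}
converted = []
with open('annotation_raw.txt', 'r', encoding='utf-8') as f:
    for line in f:
        image_name, word = line.strip().split(' ', 1)
        index = lexicon.setdefault(word, len(lexicon))  # one index per distinct word
        converted.append('{} {}'.format(image_name, index))

# lexicon.txt: one word per line, in index order
with open('lexicon.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(sorted(lexicon, key=lexicon.get)) + '\n')

with open('annotation_train.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(converted) + '\n')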
@MaybeShewill-CV, thank you.
Recently I wasn't in a position to download Synth90k for a long time, so I was working from wrong information and old data. Finally I could download the files again, and I found the problem in how I made the tfrecords. Now I can train on Korean data.
@kspook ok :)
@kspook I also encountered this problem.
Traceback (most recent call last):
  File "tools/write_tfrecords.py", line 74, in <module>
    save_dir=args.save_dir
  File "tools/write_tfrecords.py", line 56, in write_tfrecords
    writer_process_nums=8
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 64, in __init__
    self._init_dataset_sample_info()
  File "/data/home/kspook/CRNN_Tensorflow/data_provider/shadownet_data_feed_pipline.py", line 166, in _init_dataset_sample_info
    label_index = int(label_index)
ValueError: invalid literal for int() with base 10: '\ud558'
How did you solve it?
@MaybeShewill-CV, it's not the nvidia-smi problem from #295.
CUDA 9.0 installed successfully and cuDNN 7 installed successfully, but the error still occurred.