Belval / CRNN

A TensorFlow implementation of https://github.com/bgshih/crnn
MIT License

Problems with training Tibetan pictures #70

Open hsyy673150343 opened 4 years ago

hsyy673150343 commented 4 years ago

Do you still remember me? We spoke on your other open source project last time. This time I wanted to train with the Tibetan text images I generated last time, but I ran into a problem. I use the following command:

trdg -c 200000 -i dicts/zzwt_tibetan_sub_string.txt -ft fonts/latin/Qomolangma-UchenSarchung.ttf -t 8 --word_split

But the following error appears:

Missing modules for handwritten text generation.
3%|████▎ | 5527/200000 [00:14<08:25, 384.98it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/site-packages/trdg/data_generator.py", line 21, in generate_from_tuple
    cls.generate(t)
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/site-packages/trdg/data_generator.py", line 230, in generate
    final_image.convert("RGB").save(os.path.join(out_dir, image_name))
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/site-packages/PIL/Image.py", line 2099, in save
    fp = builtins.open(filename, "w+b")
OSError: [Errno 36] File name too long: 'out/་ཡི་གེ\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 唇音\xa0\xa0 pa\xa0 pha\xa0 ba\xa0 bha\xa0\xa0 ma\xa0\xa0\xa0\xa0 སྒྲ་ཕྱེད་ཀྱི་ཡི་གེ།\0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 半元音ya\xa0 ra la\xa0\xa0 vaསྒྲ་མེད་ཀྱི་_4343.jpg'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hs/anaconda3/envs/data_generate/bin/trdg", line 8, in <module>
    sys.exit(main())
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/site-packages/trdg/run.py", line 414, in main
    total=args.count,
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/site-packages/tqdm/std.py", line 1127, in __iter__
    for obj in iterable:
  File "/home/hs/anaconda3/envs/data_generate/lib/python3.6/multiprocessing/pool.py", line 699, in next
    raise value
OSError: [Errno 36] File name too long: 'out/་ཡི་གེ\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 唇音\xa0\xa0 pa\xa0 pha\xa0 ba\xa0 bha\xa0\xa0 ma\xa0\xa0\xa0\xa0 སྒྲ་ཕྱེད་ཀྱི་ཡི་གེ།\0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 半元音ya\xa0 ra la\xa0\xa0 vaསྒྲ་མེད་ཀྱི་_4343.jpg'

Can this problem be solved by modifying the code? If yes, how can I modify it? Can you give me a simple guide?

Belval commented 4 years ago

Use -na 2 to have the labels written to another file.
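For context on why this helps: with the default naming scheme the label becomes part of the file name, and a single file name on most Linux filesystems is limited to 255 bytes, not 255 characters. Each Tibetan code point takes 3 bytes in UTF-8, so long labels exceed the limit quickly. The sketch below checks a candidate name against that limit; the function name is hypothetical and the 255-byte figure is an assumption that holds for common filesystems such as ext4.

```python
NAME_MAX_BYTES = 255  # assumption: typical per-file-name limit on Linux (e.g. ext4)


def filename_fits(name: str, limit: int = NAME_MAX_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `name` is at most `limit` bytes."""
    return len(name.encode("utf-8")) <= limit


# A label of 100 Tibetan letters (U+0F66) is 100 characters but 300 bytes.
long_label = "\u0f66" * 100
print(filename_fits(long_label + "_4343.jpg"))  # label embedded in the file name
print(filename_fits("4343.jpg"))                # short name, label stored elsewhere
```

Keeping the label out of the file name sidesteps the limit regardless of how long the text is.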

hsyy673150343 commented 4 years ago

"Use -na 2 to have the labels written to another file."

What is the format of your training data? I saw in the issues that you said the label format of the training set is [LABEL]_[NUMBER].[EXT].

If I use -na 2 to have the labels written to another file, can that also be used as the label format of the training set for this project?

Belval commented 4 years ago

You would have to edit the data manager to load the label file, but keep in mind that this project was built for Latin-based languages, and I do not know if it will work at all with Tibetan. You would also have to change CHAR_VECTOR to match your characters.
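A minimal sketch of those two steps, under stated assumptions: it assumes the label file has one entry per line in the form "file name, single space, label" (check the file your trdg version actually writes), and it assumes CHAR_VECTOR is a plain string of allowed characters. The helper names parse_label_lines and build_char_vector are hypothetical and not part of this project.

```python
from typing import Dict, Iterable


def parse_label_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map image file name -> label, splitting each line on the first space only."""
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        file_name, _, label = line.partition(" ")
        mapping[file_name] = label  # the label may itself contain spaces
    return mapping


def build_char_vector(labels: Iterable[str]) -> str:
    """Collect every distinct code point used in the labels, in sorted order."""
    return "".join(sorted(set("".join(labels))))


# Stand-in for the lines of a label file.
sample = ["0.jpg ab\n", "1.jpg b c\n"]
labels = parse_label_lines(sample)
char_vector = build_char_vector(labels.values())
```

Note that this works per Unicode code point. Tibetan stacked syllables consist of several code points, so the model would predict the components individually; whether that is adequate for this network is untested here.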

Also, you could try using the programmable API instead of pre-generated data. By editing this line: https://github.com/Belval/CRNN/blob/master/CRNN/data_manager.py#L50 to generate Tibetan data, you could avoid having to pre-generate your data.
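As a rough illustration of generating data on the fly, the sketch below groups any iterable of (image, label) pairs into fixed-size batches without writing files to disk. The batches helper and the stand-in generator are hypothetical; in practice the pairs would come from a trdg generator object configured with a Tibetan font and dictionary, assuming it yields (image, label) tuples, and the result would still need to be adapted to what the data manager expects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

Image = TypeVar("Image")


def batches(
    pairs: Iterable[Tuple[Image, str]], batch_size: int
) -> Iterator[List[Tuple[Image, str]]]:
    """Yield lists of up to `batch_size` (image, label) pairs until the source is exhausted."""
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in source: integers play the role of images here.
fake_pairs = ((i, str(i)) for i in range(5))
batch_sizes = [len(batch) for batch in batches(fake_pairs, 2)]
```

Because nothing is saved to disk, the label never has to fit into a file name.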