yxgnahz opened this issue 2 years ago
Hi, @yxgnahz
Is the training data the same as that released by clovaai (https://github.com/clovaai/deep-text-recognition-benchmark)? I used their released data, but can only get 85.2% when pretraining the vision model.
I probably found the reason. I used the data from clovaai (https://github.com/clovaai/deep-text-recognition-benchmark), and I found that it contains about 5M images for ST, while this repo's ST contains more than 6M images. Using the data from clovaai, I can only get about 85% on average when pretraining the vision model.
Hi, @ccx1997 @yxgnahz
@FangShancheng Using your conversion tool (`crop_by_word_bb_syn90k.py`), I only get 5,295,444 valid samples from SynthText; 192,708 samples generated errors and were rejected by your script. Based on @yxgnahz's comment, your ST archive contains more than 6M images. Did you use the same script?
Meanwhile, clovaai's ST archive contains 5,522,807 images.
Hi, @baudm, we re-checked this script (`crop_by_word_bb_syn90k.py`) and found the discrepancy: the script filters out any text that originally contains a special token.
Change the code at line 59 from:

```python
if len_now - len(txt_temp) != 0:
    # print('txt_temp-2-', txt_temp)
    continue
```

to:

```python
if len_now - len(txt_temp) != 0:
    print('txt_temp-2-', txt_temp)
```
We will update the script later. Thanks for the reminder, and we look forward to your feedback too.
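To make the effect of that change concrete, here is a simplified, self-contained sketch of the length-comparison filter (variable names follow the snippet above, but the stripping rule is my assumption; the real script may strip a different character set):

```python
# Simplified sketch of the filter in crop_by_word_bb_syn90k.py.
# Assumption: "special tokens" are modeled here as non-alphanumeric
# characters; the actual script may use a different character set.
def is_filtered(label: str) -> bool:
    """Return True if the old code would have skipped this sample."""
    len_now = len(label)
    # Strip characters outside the "ordinary" set.
    txt_temp = "".join(c for c in label if c.isalnum())
    # Old behaviour: any stripped character meant `continue`, i.e. the
    # whole sample was dropped from the generated dataset.
    return len_now - len(txt_temp) != 0

print(is_filtered("hello"))   # False - sample kept
print(is_filtered("he-llo"))  # True  - old script dropped it
```

Removing the `continue` (as in the fix above) keeps such samples, which explains why the regenerated ST archive is larger.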
Thanks, @FangShancheng. The script now generates more samples.
For completeness, and for everyone else generating the data from scratch, here's a comparison between the ClovaAI data and my generated data (using the scripts here) in terms of number of samples:

| Dataset | ClovaAI | Generated |
|---|---|---|
| MJ_train | 7,224,586 | 7,224,600 |
| MJ_test | 891,924 | 891,924 |
| MJ_val | 802,731 | 802,733 |
| SynthText | 5,522,807 | 7,003,173 |

I don't know why there's a discrepancy in the MJSynth samples, since no processing is done there and both projects use the exact same script.
Hi, @baudm, we now provide a mirror of the dataset, which does not need an account to download.
MJ: https://rec.ustc.edu.cn/share/578cfbf0-fc5b-11eb-b3eb-d38a253722d6
ST: https://rec.ustc.edu.cn/share/69402a20-fc5b-11eb-8d52-7d4a03b38119
@baudm Good job.

1. After checking the released data that we used to train our models, our LMDB contains 6,976,115 images for the SynthText dataset, which is fewer than 7,003,173, and we did use the same crop script.
2. The MJSynth dataset already provides cropped images, so how do you use the crop script to get 7,224,600 images?
3. One possible reason for the discrepancy in MJSynth, both between ClovaAI and your generated images and between our released LMDB dataset and your generated images, is `create_lmdb_dataset.py`, which also filters out some invalid images.
Thanks. Upon further checking, it seems the ClovaAI MJSynth archive is correct. I modified `create_lmdb_dataset.py` to use `PIL.Image` for checking image validity. `cv2.imdecode()` seems to read the image headers but doesn't actually decode the image contents, so a few corrupted images were missed. After the modification, I got exactly the same number of samples as in the ClovaAI archives.
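The shallow-check vs deep-check distinction described above can be illustrated without either library. The sketch below is an stdlib-only analogy (not the actual `create_lmdb_dataset.py` code, which uses `PIL.Image`/`cv2`): a header-level check accepts a truncated JPEG, while a check that also inspects the end of the data rejects it:

```python
# Stdlib-only analogy: shallow header check vs deeper content check.
# Real validation should use PIL's Image.open(...).verify() / .load();
# this sketch only looks at JPEG start/end markers for illustration.
def header_looks_valid(data: bytes) -> bool:
    # Shallow: only the JPEG Start-Of-Image (SOI) marker is checked,
    # roughly analogous to a decoder that stops after the headers.
    return data[:2] == b"\xff\xd8"

def content_looks_valid(data: bytes) -> bool:
    # Deeper: the End-Of-Image (EOI) marker must also be present,
    # so a truncated file is rejected.
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

truncated_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # no EOI marker
print(header_looks_valid(truncated_jpeg))   # True  - shallow check misses it
print(content_looks_valid(truncated_jpeg))  # False - deeper check rejects it
```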
Thanks for the archive mirrors!
Update: I also reproduced your ST dataset by filtering out samples whose labels don't contain any alphanumeric characters. The final count is also 6,976,115.
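The label filter just described can be sketched in a few lines (my own re-implementation of the rule as stated, not the repository's code):

```python
# Keep only samples whose label contains at least one alphanumeric
# character, as described in the comment above.
def has_alnum(label: str) -> bool:
    return any(c.isalnum() for c in label)

labels = ["hello", "123", "---", "!!", "a-1"]
kept = [s for s in labels if has_alnum(s)]
print(kept)  # ['hello', '123', 'a-1']
```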
@FangShancheng Hi, thanks for your work! I used your provided dataset, pretrained models, and config files to reproduce the experimental results in your paper. I got the following results:
| Model | IC13 | SVT | IIIT | IC15 | SVTP | CUTE | AVG |
|---|---|---|---|---|---|---|---|
| ABINet-SV | 97.1 | 92.7 | 95.2 | 84.0 | 86.7 | 88.5 | 91.4 |
| ABINet-LV | 97.0 | 93.2 | 96.4 | 85.9 | 89.0 | 89.2 | 92.6 |
The results of your provided ABINet-LV pretrained model are almost the same as in the paper, but the results of your provided ABINet-SV pretrained model are substantially lower than those given in the paper. What is the reason for this? What further steps should I take to reproduce the results given in the paper?
My environment is as follows: Python 3.7.2, torch 1.4.0.
The results you report are almost the same as those of our provided models, judging from the statistics above. Did you miss something important? @HHeracles
Everything I used is provided by you, including the virtual environment, datasets, default configurations, pretrained models, etc.
So, what is your accuracy for ABINet-SV now? The reported accuracy of ABINet-SV is about 91.4. @HHeracles
The accuracy I obtained for ABINet-SV is about 90.2, and for ABINet-LV about 92.6. The AVG value for ABINet-SV in the table above is a clerical error. Sorry.
Do you mean that you obtained only 90.2 accuracy for ABINet-SV, while the reported accuracy of the released models is about 91.4? What was your training time, and could you share your training log for further checking?
Yes. I did not do any training; I just used the pretrained model you provided, i.e. best-pretrain-vision-model.pth from https://pan.baidu.com/share/init?surl=b3vyvPwvh_75FkPlp87czQ.
Thank you for your reply. I found the reason: I had loaded pretrain_vision_model.yaml instead of pretrain_vision_model_sv.yaml.
Hello, I ran the code directly using the pretrain_vision_model.yaml setting; here are the results of the trained model:
| Benchmark | Accuracy |
|---|---|
| IC13 | 92.6 |
| SVT | 87.2 |
| IIIT | 88.1 |
| IC15 | 78.7 |
| SVTP | 81.4 |
| CUTE80 | 79.5 |
| Average | 85.0 |
It seems the released pretrained vision model has an average accuracy of about 90%, so could you please tell me whether you used pretrain_vision_model.yaml to pretrain the vision model, and whether you used any additional tricks or data to train it?