MichalBusta / E2E-MLT

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
MIT License

bad image #30

Closed coorful closed 5 years ago

coorful commented 5 years ago

Hello, I'm sorry to bother you. When I run train_ocr.py, it always shows the "bad image" message (screenshot attached). The validation dataset I used is part of the IC15 test word images. Could you please tell me why this condition happens? Thanks a lot!!!

coorful commented 5 years ago

@MichalBusta

MichalBusta commented 5 years ago

https://github.com/MichalBusta/E2E-MLT/blob/1fb74eab8411185261d59ed2467fa59e4bd95530/ocr_test_utils.py#L140

you can add:

import sys, traceback
traceback.print_exc(file=sys.stdout)

after 'except:' to see what is going on
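The suggestion above can be sketched in isolation; `process_image` here is a hypothetical stand-in for the body of the real `except:` block in ocr_test_utils.py, not the repo's actual code:

```python
import sys
import traceback

def process_image(img):
    # Hypothetical stand-in for the evaluation code that fails on a "bad image".
    raise ValueError("bad image: %r" % img)

try:
    process_image(None)
except Exception:
    # A bare `except:` silently swallows the error; printing the full
    # stack trace reveals the real cause of the "bad image" message.
    traceback.print_exc(file=sys.stdout)
```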

coorful commented 5 years ago

Thank you so much for your reply! It seems that in ocr_test_utils.py the function returns four values but the call unpacks only three. When I add one more output to the unpacking of print_seq_ext(), the problem is solved. Thank you~
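The mismatch described above can be reproduced in a minimal sketch; `print_seq_ext` here is a hypothetical stand-in that returns four values, not the actual signature from the repo:

```python
def print_seq_ext():
    # Hypothetical stand-in: returns four values, like the repo function does.
    return "text", 0.9, [1, 2, 3], None

# Unpacking four return values into three names raises
# "ValueError: too many values to unpack (expected 3)":
#   det_text, conf, dec = print_seq_ext()

# Adding one more target on the left-hand side, as the fix describes, works:
det_text, conf, dec, extra = print_seq_ext()
```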

coorful commented 5 years ago

Besides, I have one question to ask for your help~ Can I just use the OCR model to train a separate word recognition model (just to achieve the recognition task)? If I can do this, how large a dataset should I have? (I just want to test on the ICDAR2015 word recognition dataset.) Thank you! @MichalBusta

coorful commented 5 years ago

Could you please give me some advice? Thanks a lot~ @MichalBusta

MichalBusta commented 5 years ago

Hi,

On 29/04/2019 06:08, cooooor wrote:

> besides, i have one question to ask for your help~ can i just use the ocr model to train a separate word recognition model (just to achieve recognition task).

sure, there is the https://github.com/MichalBusta/E2E-MLT/blob/master/train_ocr.py script just for training the OCR module.

> if i can do like this, how large dataset should i have? (i just want to test on icdar2015 word recognition dataset)

hard to say - it depends on the data. (Synthetic images: the VGG group has been using 9 million images covering 90k English words for the ICDAR 2013 dataset; real images: ~100,000 from ICDAR2015 and ICDAR2017 MLT will give you quite a good baseline ...)

> thank you! @MichalBusta

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MichalBusta/E2E-MLT/issues/30#issuecomment-487446893, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7KHMBDMF7FKWEYNRYEKO3PSZYFLANCNFSM4HI6T3FA.

coorful commented 5 years ago

Hello, when I use datasets from IC15, IC17 MLT & IC19 MLT (only the Latin words, about 70,000 word images) and only run train_ocr.py, the accuracy on the IC15 test word dataset only reaches 56.3%. The batch size I used is 4. Could you please give me some advice on why the accuracy is so low, and what I should do to improve it?

Thanks a lot~ @MichalBusta

MichalBusta commented 5 years ago

No easy answer, sorry :)

coorful commented 5 years ago

OK, I will try. Thanks so much for your reply~

alwc commented 5 years ago

@MichalBusta I also have some questions regarding your text recognition model.

1/ For latin languages, did you train your text recognition models with single word images only (i.e. no text lines)?

2/ How many images did you train your text recognition model with?

3/ Your text recognition model seems to use a ResNet-like structure. Since this project focuses on real-time performance, have you tried training your text recognition model with a MobileNet backbone?

MichalBusta commented 5 years ago

On Monday, July 8, 2019, Alex Lee notifications@github.com wrote:

> @MichalBusta I also have some questions regarding your text recognition model.
>
> 1/ For latin languages, did you train your text recognition models with single word images only (i.e. no text lines)?

most of the images in the datasets used are word-level; some of the generated images are line-level

> 2/ How many images did you train your text recognition model with?

sorry, I'm travelling so I cannot give you an exact number, but it will be about 500k

> 3/ Your text recognition model seems to use a ResNet-like structure.

no - text detection is ResNet-like; recognition is plain VGG-style.

> Since this project focuses on real-time, have you tried to train your text recognition model with a MobileNet backbone?

no - it is about the hardware: we were targeting low-end GPUs.


alwc commented 5 years ago

Thanks @MichalBusta! One more thing: for the 500k images, does it consist of an equal share of Arabic, Bangla, Chinese, Japanese, Korean and Latin (i.e. ~80k images for each script)?

MichalBusta commented 5 years ago

+/-: we used more real Latin and Chinese data, since datasets exist for those scripts. For the synthetic data, an equal split.

On Wednesday, July 10, 2019, Alex Lee notifications@github.com wrote:

> Thanks @MichalBusta! One more thing, for the 500k images, it consists of an equal share between Arabic, Bangla, Chinese, Japanese, Korean and Latin (i.e. ~80k images for each script)?
