MichalBusta / E2E-MLT

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
MIT License

per-feature text/no-text confidence score #44

Closed kei6 closed 5 years ago

kei6 commented 5 years ago

Dear @MichalBusta, according to your paper, the prediction output of the text localization part consists of seven channels: a per-feature text/no-text confidence score, the coordinates of the bounding box, and an angle parameter.

Then I checked your code here: https://github.com/MichalBusta/E2E-MLT/blob/28583581fb17b6e83bc8dc8c84b6bc7fb4957341/models.py#L428-L430

but I didn't find any output corresponding to that confidence score.

Could you help explain this? Thanks a lot,

Kei,

MichalBusta commented 5 years ago


Sorry - bad naming on my part; the confidence score is the segm_pred output.
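For readers following along, here is a minimal sketch of what a seven-channel head of this kind can look like. Everything except the name segm_pred (layer sizes, class and variable names) is an assumption for illustration, not the repository's actual code:

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Illustrative 7-channel localization head: 1 text/no-text
    confidence channel (segm_pred), 4 box-geometry channels, and
    2 angle channels. Layer sizes are assumptions, not the repo's."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 7, kernel_size=1)

    def forward(self, features):
        out = self.conv(features)
        segm_pred = torch.sigmoid(out[:, 0:1])  # per-feature confidence in [0, 1]
        rbox = out[:, 1:5]                      # bounding-box geometry
        angle = out[:, 5:7]                     # rotation parameters
        return segm_pred, rbox, angle

head = DetectionHead()
segm_pred, rbox, angle = head(torch.randn(1, 256, 32, 32))
print(segm_pred.shape)  # torch.Size([1, 1, 32, 32])
```

The key point is that the confidence channel is just the first of the seven output channels, squashed to [0, 1], which is why it behaves like a segmentation map.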


kei6 commented 5 years ago

Thanks for the quick reply! The naming isn't so bad after all: multiplying segm_pred by 255 and displaying it as an image shows that it really does act as a segmentation map. It also shows me how different the result becomes when I reduce the input image's resolution.
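The segm_pred * 255 visualisation mentioned above can be reproduced in a few lines; this is a generic sketch (the function name is mine), not code from the repository:

```python
import numpy as np

def segm_to_image(segm_pred):
    """Turn a (1, 1, H, W) confidence map in [0, 1] into a uint8
    grayscale image -- the segm_pred * 255 trick described above."""
    arr = np.asarray(segm_pred, dtype=np.float32).squeeze()
    return np.clip(arr * 255.0, 0, 255).astype(np.uint8)

demo = np.array([[[[0.0, 0.5], [1.0, 0.25]]]])
print(segm_to_image(demo))  # [[  0 127] [255  63]]
```

The resulting array can be written out with any image library to inspect the text/no-text confidence visually.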

The problem I have is this: the model loses accuracy when I reduce the input resolution. To keep accuracy up, I need to feed it a very high-resolution input (e.g. 3200x1800), which makes it slow (about 20 s/image on CPU).

I want to solve this by changing the scale_factor below to 1: https://github.com/MichalBusta/E2E-MLT/blob/28583581fb17b6e83bc8dc8c84b6bc7fb4957341/data_gen.py#L413 I would also need to adapt the model and the do_nms function to this scale_factor.
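Under the usual convention, scale_factor is the network's output stride, so the predicted and ground-truth maps are the input size divided by it. A quick sketch of that relation (the function name is mine, and this interpretation is an assumption based on the sizes quoted in this thread):

```python
def output_size(input_size, scale_factor=4):
    """Spatial size of the predicted / ground-truth maps for a given
    output stride (the scale_factor in data_gen.py). Sketch only."""
    w, h = input_size
    return w // scale_factor, h // scale_factor

print(output_size((2800, 1900)))     # (700, 475) -- matches the example below
print(output_size((800, 600)))       # (200, 150) -- likewise
print(output_size((800, 600), 1))    # (800, 600) -- what scale_factor = 1 implies
```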

May I have your opinion on this? Do you have any other ideas for improving it? I only have a very slow GPU for training, so seeing a result costs me about three weeks.

Here are some segm_pred * 255 examples at different input resolutions:

Input (2800x1900), output (700x475): good result, 16 s to run. [image: 0 jpg] https://user-images.githubusercontent.com/45931733/60951774-0fc17e80-a324-11e9-845f-fd13876d44a8.jpg

Input (800x600), output (200x150): bad result, 3 s to run. [image: 0 jpg] https://user-images.githubusercontent.com/45931733/60951894-5616dd80-a324-11e9-8617-2e71c1e39425.jpg

Sorry for the long explanation, and thanks in advance! Kei

MichalBusta commented 5 years ago


Just a few comments:

  • the most expensive computation is determined by the resolution, so going to scale_factor 1 will not help you much - you can do the calculation on paper
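The on-paper calculation hinted at above: per-layer convolution cost is proportional to the number of output positions, so producing stride-1 maps instead of stride-4 maps multiplies the work on those layers by roughly 16x. A sketch of the arithmetic (names are mine):

```python
def feature_map_cost(w, h, scale_factor):
    """Relative per-layer convolution cost: proportional to the
    number of output positions, (W / stride) * (H / stride)."""
    return (w // scale_factor) * (h // scale_factor)

stride4 = feature_map_cost(800, 600, 4)  # 200 * 150 = 30000 positions
stride1 = feature_map_cost(800, 600, 1)  # 800 * 600 = 480000 positions
print(stride1 / stride4)  # 16.0 -- stride-1 maps cost ~16x more per layer
```

This is why reducing the output stride cannot recover the accuracy lost to a low input resolution without also paying a large compute cost.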
