Closed: kei6 closed this issue 5 years ago.
On Tuesday, 9 July 2019, Kei_Cin notifications@github.com wrote:
Dear @MichalBusta https://github.com/MichalBusta, according to your paper, the prediction output of the text localization part consists of seven channels: a per-feature text/no-text confidence score, the bounding-box coordinates, and an angle parameter.
Then I checked your code here: https://github.com/MichalBusta/E2E-MLT/blob/28583581fb17b6e83bc8dc8c84b6bc7fb4957341/models.py#L428-L430
and didn't find any value for that confidence score.
Could you help explain it?
Thanks a lot,
Kei

Sorry, bad naming. See `segm_pred`.
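For readers mapping the paper's seven-channel description onto the code, here is a minimal sketch of how such a prediction tensor could be split. Only the fact that `segm_pred` is the confidence map is confirmed above; the exact channel layout and the names `rbox` and `angle` are assumptions for illustration, not the repository's actual variables.

```python
import numpy as np

# Hypothetical 7-channel localization output for a 160x160 input,
# downscaled 4x to a 40x40 feature map (EAST-style layout assumed).
pred = np.random.rand(7, 40, 40).astype(np.float32)

segm_pred = pred[0]   # text/no-text confidence ("segm_pred" in models.py)
rbox = pred[1:5]      # assumed: distances to the four sides of the box
angle = pred[5:7]     # assumed: angle parametrization (e.g. sin/cos)

print(segm_pred.shape, rbox.shape, angle.shape)
```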
Thanks for the quick reply! The naming is not bad at all: if I multiply `segm_pred` by 255 and display it as an image, it clearly acts as a segmentation map. It also shows me how different the result is when I reduce the input image's resolution.
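The visualization described above can be sketched as follows; the data here is random stand-in for a real `segm_pred`, and the OpenCV write is left commented so the snippet stays dependency-free.

```python
import numpy as np

# segm_pred is a confidence map in [0, 1]; scaling by 255 turns it
# into a viewable 8-bit grayscale image (random stand-in data here).
segm_pred = np.random.rand(200, 150)
vis = (segm_pred * 255).astype(np.uint8)

# import cv2; cv2.imwrite("segm_pred.png", vis)  # with OpenCV installed
print(vis.dtype, vis.shape)
```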
My problem is that the model gets lower accuracy when I reduce the input resolution. To keep accuracy high, I need to feed it a very high-resolution input (e.g. 3200x1800), which makes inference very slow (about 20 s/image on CPU).
I want to solve this by changing the scale_factor below to 1: https://github.com/MichalBusta/E2E-MLT/blob/28583581fb17b6e83bc8dc8c84b6bc7fb4957341/data_gen.py#L413 I would also need to customize the model and the do_nms function to adapt to this scale_factor.
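For intuition, a back-of-the-envelope sketch of how the output map size follows from the input size and the downscale factor. The factor of 4 is inferred from the resolutions reported later in this thread (2800x1900 in, 700x475 out); `output_size` is a hypothetical helper, not a function from the repository.

```python
# Sketch: the detector's output map is the input size divided by the
# downscale factor. Setting the factor to 1 makes the output map as
# large as the input, multiplying memory use and NMS work accordingly.
def output_size(width, height, scale_factor=4):
    return width // scale_factor, height // scale_factor

print(output_size(2800, 1900))      # (700, 475)
print(output_size(800, 600))        # (200, 150)
print(output_size(2800, 1900, 1))   # (2800, 1900)
```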
Could I have your opinion on this? Do you have any ideas to improve it? I have a very slow GPU for training; seeing a result costs me about three weeks.
Here are some segm_pred*255 examples with different input resolutions:
Input (2800x1900), output (700x475): good result, 16 s to run. [image: 0 jpg] https://user-images.githubusercontent.com/45931733/60951774-0fc17e80-a324-11e9-845f-fd13876d44a8.jpg
Input (800x600), output (200x150): bad result, 3 s to run. [image: 0 jpg] https://user-images.githubusercontent.com/45931733/60951894-5616dd80-a324-11e9-8617-2e71c1e39425.jpg
Sorry for the long explanation, and thanks in advance! Kei
Just a few comments:
- The most expensive computation is determined by the resolution, so going to a scale_factor of 1 will not help you much; you can do the calculation on paper.
- A simple option is to train more on small images, since we are all fitting the ICDAR datasets, where the minimum text size that scores is 12 px.
- Buy a better GPU, or use one of the free web resources such as Google Colab.
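The "calculation on paper" suggested above can be sketched directly: convolutional cost is roughly proportional to the number of input pixels, so the two resolutions reported in this thread differ by about an order of magnitude in work.

```python
# Back-of-the-envelope cost comparison for the two resolutions in
# this thread. Convolution cost scales roughly with pixel count.
def pixels(width, height):
    return width * height

hi = pixels(2800, 1900)   # "good result", ~16 s on CPU
lo = pixels(800, 600)     # "bad result", ~3 s on CPU
print(hi / lo)            # ~11x more pixels at the high resolution
```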