autonise / CRAFT-Remade

Implementation of CRAFT Text Detection
MIT License

Getting wrong target_affinity when training SynthText style dataset #25

Closed lamhoangtung closed 5 years ago

lamhoangtung commented 5 years ago

Hi. Thanks for the great work. I'm trying to train this code on my own generated dataset, which is a SynthText style dataset for Japanese.

The code seems to run, but I noticed in the debug folder that target_affinity.png is just a black image, while target_characters.png looks correct.

Image: image

Target affinity: target_affinity

Target character heat map: target_characters

Do you have any suggestions about what might be wrong here? You can see my code in my fork; it's pretty much identical to yours (I only created a new data loader :P).

mayank-git-hub commented 5 years ago

I can't say for sure without running the code on your dataset, but there should be no reason for this to happen if the character heat map is being generated correctly.

Does this happen just on this particular image or all images?

I am not able to reproduce it using the SynthText dataset I have available.

lamhoangtung commented 5 years ago

Hmm, this is happening to all images.

Btw, what type does your generate_affinity function expect for self.txt in class DataLoaderSYNTH? ;_;

lamhoangtung commented 5 years ago

An interesting update on this bug :P On other images the target affinity is not empty but appears to be wrong, while the target characters remain correct. For example:

Image: image

Target affinity: target_affinity

Target character heat map: target_characters

mayank-git-hub commented 5 years ago

self.txt is a list like [['words', 'in', 'sample', 'one'], ['words', 'in', 'sample', 'two']]

Hmm, can you confirm whether the affinity is actually wrong, or whether only some of the affinity boxes are drawn while others are being discarded by the .is_valid check in src/utils/data_manipulation.py?

mayank-git-hub commented 5 years ago

I am finding that a lot of affinity samples do not pass the valid polygon test. I'm looking into why that is happening.
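
Editor's sketch: the thread does not show the `.is_valid` check or say which library provides it, so the following is only a pure-Python illustration of one common reason a four-point polygon fails a validity test: its corners are listed in an order that makes two edges cross. The helper names (`segments_cross`, `quad_is_simple`) and the sample coordinates are hypothetical and are not taken from the repository.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def quad_is_simple(quad):
    """A quadrilateral is simple if neither pair of opposite edges crosses."""
    p0, p1, p2, p3 = quad
    return not (segments_cross(p0, p1, p2, p3) or segments_cross(p1, p2, p3, p0))


# Same four corners, listed in two different orders (hypothetical values).
box = [(0, 0), (4, 0), (4, 2), (0, 2)]      # corners in clockwise order
bowtie = [(0, 0), (4, 0), (0, 2), (4, 2)]   # last two corners swapped

print(quad_is_simple(box))
print(quad_is_simple(bowtie))
```

If the affinity boxes are built from character corners in an inconsistent order, a check of this kind would silently drop them, which is worth ruling out when some affinity regions are missing.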

lamhoangtung commented 5 years ago

> self.txt is a list like [['words', 'in', 'sample', 'one'], ['words', 'in', 'sample', 'two']]
>
> Hmm, can you confirm whether the affinity is actually wrong, or whether only some of the affinity boxes are drawn while others are being discarded by the .is_valid check in src/utils/data_manipulation.py?

Damn, my first mistake was using += when pre-processing self.txt. I changed it to append and now it works as expected. Thank you @mayank-git-hub

But now I've found something else interesting :3
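
Editor's sketch: the exact pre-processing code is not shown in the thread, so this is only one plausible reading of the `+=` versus `append` mistake. On a Python list, `+=` extends the list with the elements of the right-hand side, so the per-sample grouping that `self.txt` is described as having gets flattened, and a bare string gets split into characters. The variable names and sample data are illustrative.

```python
# Hypothetical per-sample word lists, in the shape described for self.txt.
samples = [["words", "in", "sample", "one"], ["words", "in", "sample", "two"]]

flat = []
nested = []
for words in samples:
    flat += words          # extends: the words of every sample end up in one list
    nested.append(words)   # appends: one inner list per sample is kept

print(flat)    # a single flat list of 8 strings
print(nested)  # two inner lists, same structure as `samples`

# += with a bare string iterates over it and adds one character at a time.
chars = []
chars += "one"
print(chars)   # ['o', 'n', 'e']
```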

For example:

Image: image

Target Affinity: target_affinity

Target Characters: target_characters

Is this target affinity considered correct? target_affinity 2

I ask because I see a link from the first line to the second line.

mayank-git-hub commented 5 years ago

Umm, it looks like these two words have been treated as one word in self.txt.

Could you confirm if that is the case?

Also, there is a bug in order_points because of which some affinity bboxes are being lost. order_points works for rectangles, but not for general quadrilaterals. I am fixing the issue.
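
Editor's sketch: the thread does not show the body of `order_points`, so the function below is an assumed stand-in that uses the widely seen sum/difference corner heuristic; the repository's actual code may differ. It illustrates how such a heuristic can be fine for an axis-aligned rectangle yet pick the same corner twice for a general quadrilateral, which loses corners and leaves a degenerate box. The name `order_points_heuristic` and the coordinates are hypothetical.

```python
def order_points_heuristic(pts):
    """Sum/difference corner heuristic (assumed, not the repository's code).

    Takes four (x, y) points in image coordinates (y grows downwards) and
    returns [top-left, top-right, bottom-right, bottom-left].
    """
    top_left = min(pts, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(pts, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = min(pts, key=lambda p: p[1] - p[0])     # smallest y - x
    bottom_left = max(pts, key=lambda p: p[1] - p[0])   # largest y - x
    return [top_left, top_right, bottom_right, bottom_left]


rect = [(0, 0), (4, 0), (4, 2), (0, 2)]   # axis-aligned rectangle
quad = [(4, 0), (6, 3), (5, 6), (3, 3)]   # kite-shaped quadrilateral

ordered_rect = order_points_heuristic(rect)
ordered_quad = order_points_heuristic(quad)

print(ordered_rect)  # four distinct corners
print(ordered_quad)  # (4, 0) and (5, 6) each appear twice
```

In the second case two of the four input corners never appear in the output, so a polygon built from it has zero area and would likely fail a validity test.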

mayank-git-hub commented 5 years ago

You should remove all uses of order_points in your fork. It is unnecessary (I took this code from another source where the order matters, but in our case it doesn't, because the Gaussian heatmap is isotropic), and it does not actually order the points.

lamhoangtung commented 5 years ago

> Umm, it looks like these two words have been treated as one word in self.txt.
>
> Could you confirm if that is the case?
>
> Also, there is a bug in order_points because of which some affinity bboxes are being lost. order_points works for rectangles, but not for general quadrilaterals. I am fixing the issue.

That's true. It seems this is a bug in my generated data ;_;

mayank-git-hub commented 5 years ago

Cool, thanks for bringing this bug to attention!