clovaai / CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)
MIT License

Clarifications on Refine Net and its shortcomings? #103

Open sivaranjith opened 4 years ago

sivaranjith commented 4 years ago

Hi @YoungminBaek

Thank you for publishing your work. I have been trying to replicate it with some modifications. First, I used ResNet18 instead of VGG16 as the backbone. I was able to successfully reproduce both the CRAFT architecture and Refine Net.

One key observation is that I was able to reproduce Refine Net's output without the additional layers that you used in the original paper. In my version, the Refine Net output is produced by the CRAFT network itself, as an extra output feature map alongside the character and word maps.
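A minimal sketch of what "an extra output feature map" can mean in practice, assuming the shared decoder features end in a per-pixel 1x1 convolution whose output channels are read as separate score maps. The names `MAP_NAMES`, `conv1x1_head` and `split_maps` are hypothetical, and NumPy stands in for the actual network layers; this is an illustration, not the poster's or the authors' code.

```python
import numpy as np

# Hypothetical channel layout: one output channel per score map.
MAP_NAMES = ("character", "word", "refine")

def conv1x1_head(features, weight, bias):
    """Apply a 1x1 convolution: features (C_in, H, W) -> (C_out, H, W).

    weight has shape (C_out, C_in) and bias has shape (C_out,).
    """
    out = np.einsum("oc,chw->ohw", weight, features)
    return out + bias[:, None, None]

def split_maps(scores):
    """Name each output channel, assuming the order given in MAP_NAMES."""
    if scores.shape[0] != len(MAP_NAMES):
        raise ValueError("unexpected number of output channels")
    return {name: scores[i] for i, name in enumerate(MAP_NAMES)}

rng = np.random.default_rng(0)
features = rng.standard_normal((32, 8, 8))  # stand-in for decoder features
weight = rng.standard_normal((len(MAP_NAMES), 32)) * 0.1
bias = np.zeros(len(MAP_NAMES))
maps = split_maps(conv1x1_head(features, weight, bias))
print({k: v.shape for k, v in maps.items()})
# {'character': (8, 8), 'word': (8, 8), 'refine': (8, 8)}
```

Under this assumption, adding another map only widens the last layer by one channel; all maps share every earlier layer, which is the trade-off against a separate refiner with its own conv layers.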

Clarification 1: I was wondering if you have any insights from your experiments about the significance of having the extra conv layers for Refine Net.

The shortcoming of Refine Net is that it does not seem to treat words separated by a large gap as part of a single line. To address this, I went one step further and added a new (4th) output feature map.

In the new variant, the final convolution layer has the following four outputs:

1. character map
2. word map
3. Refine Net-like line map (I call this the sub-line map)
4. line map that also links words separated by a large gap on the same line (I call this the full line map)
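The difference between the sub-line and full line maps can be made concrete on the ground-truth side. The sketch below assumes axis-aligned word boxes and uses hypothetical names (`Box`, `vertical_overlap`, `group_lines`) and thresholds; it is only an illustration of the distinction, not how the targets in this thread were actually built. With a gap limit it produces Refine Net-like sub-lines; without one it bridges gaps of any width between words on the same row.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self):
        return self.y1 - self.y0  # assumed positive

def vertical_overlap(a, b):
    """Overlap of the two boxes' y-ranges divided by the smaller height."""
    inter = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0.0, inter) / min(a.height, b.height)

def group_lines(boxes, min_overlap=0.5, max_gap_ratio=None):
    """Greedily group word boxes into lines, scanning left to right.

    max_gap_ratio=None -> "full line": horizontal gaps of any size are bridged.
    max_gap_ratio=k    -> "sub-line": a word joins a line only if its gap to
                          the line's last word is at most k * mean box height.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.x0):
        for line in lines:
            last = line[-1]
            if vertical_overlap(last, box) < min_overlap:
                continue  # not on the same row
            gap = box.x0 - last.x1
            mean_h = (last.height + box.height) / 2
            if max_gap_ratio is not None and gap > max_gap_ratio * mean_h:
                continue  # same row, but the gap is too wide for a sub-line
            line.append(box)
            break
        else:
            lines.append([box])  # no compatible line: start a new one
    return lines

words = [Box(0, 0, 40, 10), Box(45, 0, 80, 10), Box(300, 1, 340, 11), Box(0, 30, 50, 40)]
print(len(group_lines(words, max_gap_ratio=1.5)))  # 3 sub-lines
print(len(group_lines(words)))                     # 2 full lines
```

The distant third word on the first row only joins its row when the gap limit is removed, which is exactly the case the sub-line map misses.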

The 4th map is proving quite difficult to learn: the model does not seem to fit the training data well.
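When one output map fails to fit, logging the loss per map rather than only the total helps show whether the full line map is the one stalling. A minimal sketch, assuming a CRAFT-style pixel-wise squared-error loss over a (C, H, W) prediction; the channel names and the optional `weights` argument are hypothetical.

```python
import numpy as np

MAP_NAMES = ("character", "word", "sub_line", "full_line")

def per_map_mse(pred, target, weights=None):
    """Pixel-wise MSE for each channel of a (C, H, W) prediction.

    Returns the weighted total and a dict of per-map losses so that a map
    that is not being fitted stands out in training logs.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if pred.shape[0] != len(MAP_NAMES):
        raise ValueError("expected one channel per entry in MAP_NAMES")
    per_map = ((pred - target) ** 2).mean(axis=(1, 2))
    if weights is None:
        weights = np.ones(len(per_map))
    total = float(np.dot(weights, per_map))
    return total, dict(zip(MAP_NAMES, per_map.tolist()))

target = np.zeros((4, 4, 4))
pred = np.zeros((4, 4, 4))
pred[3] = 0.5  # only the full_line channel is wrong
total, parts = per_map_mse(pred, target)
print(total, parts["full_line"])  # 0.25 0.25
```

The `weights` argument is one knob for checking whether giving the hard map more weight changes how well it fits; whether that helps here is untested.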

Clarification 2: Have you come across such a scenario in any of your experiments? If so, it would be great if you could share any insights into this particular issue.

ahfyzzy commented 3 years ago

I ran into some problems when training the refiner net. How did you train the LinkRefiner network? How did you generate the ground truth (GT) for the link score? What is the ratio between the width of the refine link and the distance between the paired points, and what loss function did you use during training? Also, would you mind open-sourcing your training code? I would appreciate a reply.
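For reference while waiting on an answer, here is one hypothetical way to rasterise a link GT between paired points, with the link width expressed as a ratio of the point distance, which is the parameter asked about above. `draw_link` and `width_ratio` are made-up names, and the value the CRAFT authors used (if any) is unknown. The sketch covers only the ground-truth side, not the loss.

```python
import numpy as np

def draw_link(score_map, p, q, width_ratio=0.3):
    """Rasterise a link between paired points p and q (given as (x, y)).

    Pixels whose distance to segment pq is at most half the link width are
    set to 1.0, where width = width_ratio * |pq|. width_ratio is a
    hypothetical parameter, not a value taken from CRAFT.
    """
    h, w = score_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2) as (x, y)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    d = q - p
    length = np.linalg.norm(d)
    if length == 0:
        return score_map  # degenerate pair: nothing to draw
    # Projection of each pixel onto the segment, clamped to its endpoints.
    t = np.clip(((pts - p) @ d) / (length ** 2), 0.0, 1.0)
    closest = p + t[..., None] * d
    dist = np.linalg.norm(pts - closest, axis=-1)
    score_map[dist <= width_ratio * length / 2] = 1.0
    return score_map

link_map = draw_link(np.zeros((20, 40)), p=(5, 10), q=(35, 10), width_ratio=0.2)
```

A hard 0/1 band is the simplest choice; a soft falloff with distance from the segment would be an equally plausible variant.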