Clarifications on Refine Net and its shortcomings?

Hi @YoungminBaek

Thank you for publishing your work. I have been trying to replicate it with some modifications. First, I used ResNet18 instead of VGG16 as the backbone. I was able to reproduce both the CRAFT architecture and Refine Net.

One key observation is that I was able to reproduce Refine Net's output without the additional layers used in the original paper. In my version, the Refine Net output is produced together with the CRAFT outputs, as an extra feature map alongside the character and word maps.

Clarification 1: I was wondering if you have any insights from your experiments about the significance of having the extra conv layers for Refine Net.

The shortcoming of Refine Net is that the network does not seem to treat words separated by a large number of spaces as a single line. To fix this, I went one step further and tried to create a new (4th) output feature map.

So in the new variant, the final convolution layer has the following 4 outputs:

1. character map
2. word map
3. Refine Net-like line map (I call this the sub-line map)
4. line map that works when there is a large space between two words in the same line (I call this the full line map)

The 4th one is proving quite difficult to achieve, and the model does not seem to fit the training data well.
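To make the setup concrete, here is a minimal sketch of a final layer with the four outputs listed above, assuming PyTorch. The class name `FourMapHead`, the input width of 32 channels, and the map names are hypothetical placeholders for illustration only; they are not taken from the CRAFT-pytorch code.

```python
import torch
import torch.nn as nn


class FourMapHead(nn.Module):
    """Hypothetical head: a single final 1x1 conv that emits four score maps."""

    # Names are illustrative labels for the four outputs described above.
    MAP_NAMES = ("character", "word", "sub_line", "full_line")

    def __init__(self, in_channels: int = 32):
        super().__init__()
        # in_channels is an assumed width for the features feeding the head.
        self.final_conv = nn.Conv2d(in_channels, len(self.MAP_NAMES), kernel_size=1)

    def forward(self, features: torch.Tensor) -> dict:
        scores = self.final_conv(features)  # shape: (N, 4, H, W)
        # Split the channel dimension into one named (N, H, W) map per output.
        return {name: scores[:, i] for i, name in enumerate(self.MAP_NAMES)}


head = FourMapHead(in_channels=32)
dummy_features = torch.randn(2, 32, 64, 64)  # stand-in for backbone features
maps = head(dummy_features)
```

Each entry of `maps` can then be compared against its own ground-truth map, which makes it easy to check the full line map separately from the other three.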

Clarification 2: Have you come across such a scenario in any of your experiments? If so, it would be great if you could share any insights into this issue.

clovaai / CRAFT-pytorch, issue #103