how does bounding box regression work in EAST?

mailcorahul commented 5 years ago

I was going through the EAST paper and I am unsure how exactly the bounding boxes are computed. After the input image passes through a stack of convolutional layers, a 1x1xD filter is applied to the final conv volume to get a W x H x 4 output volume, where the 4 channels are the offsets to the top, left, bottom, and right boundaries. My question: since we are looking at only one cell in the final feature map, how can the network find offsets with respect to all four boundaries?

To make it clearer, let's say the 1x1 filter is looking at the top-left cell of the final feature map for a text box in the image.

How can it predict the offset values of bottom and right boundaries without actually looking at those regions in the feature map?
Also how are the final text box boundaries computed since every cell now has 4 offsets to it? Correct me if I am wrong.

Can anyone explain how this works? @argman
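
On the question of how a single cell can regress distances to boundaries it does not sit on: each cell of the final feature map is computed from a stack of convolutions, so it depends on an input region much larger than the cell itself. The 1x1 filter only mixes channels at that cell; the spatial context is already encoded in them. A minimal sketch of the standard receptive-field arithmetic follows, assuming a hypothetical VGG-like stack (the `layers` list is illustrative and is not the backbone used in EAST or in this repository):

```python
def receptive_field(layers):
    """Return (receptive field size, cumulative stride) in input pixels.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1) * current step
        jump *= stride             # striding increases the step between neighbouring cells
    return rf, jump


# Hypothetical backbone: four blocks of two 3x3 convs followed by a 2x2 pool.
layers = [(3, 1), (3, 1), (2, 2)] * 4
backbone_rf = receptive_field(layers)

# A final 1x1 conv adds nothing spatially: the field is unchanged.
head_rf = receptive_field(layers + [(1, 1)])
print(backbone_rf, head_rf)
```

In this toy stack, one output cell already sees a window of several dozen input pixels per side, which is why it can carry information about edges that lie well away from the cell.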

yangyangyang127 commented 5 years ago

Hi, I have the same question. Did you figure it out? Thanks.

mailcorahul commented 5 years ago

@yangyangyang127 No, I couldn't figure it out yet. I even tried emailing the authors of the EAST paper, but didn't get any response from them either. Do you have any intuition as to how this might work?

YuxiangTang commented 5 years ago

Try reading icdar.py, and you will find the answer.
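
As far as I can tell, icdar.py is where the training targets are built: every pixel inside a text region is labelled with its own distances to the four edges of that region's box. A rough sketch of the idea is below, under loud assumptions: axis-aligned boxes only (the rotation angle channel is ignored), no downsampling of the output map, no shrinking of the text region, and channel order (top, left, bottom, right) as in the question. `make_targets` and `decode_boxes` are hypothetical helper names, not functions from the repository.

```python
import numpy as np


def make_targets(h, w, box):
    """Build a score map and a 4-channel distance map for one box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    score = np.zeros((h, w), dtype=np.float32)
    geo = np.zeros((h, w, 4), dtype=np.float32)
    ys, xs = np.mgrid[y0:y1 + 1, x0:x1 + 1]  # every pixel inside the box
    score[ys, xs] = 1.0
    geo[ys, xs, 0] = ys - y0  # distance to top edge
    geo[ys, xs, 1] = xs - x0  # distance to left edge
    geo[ys, xs, 2] = y1 - ys  # distance to bottom edge
    geo[ys, xs, 3] = x1 - xs  # distance to right edge
    return score, geo


def decode_boxes(score_map, geo_map, score_thresh=0.8):
    """Turn each confident pixel into its own (x0, y0, x1, y1) box."""
    ys, xs = np.where(score_map > score_thresh)
    top, left, bottom, right = (geo_map[ys, xs, i] for i in range(4))
    boxes = np.stack([xs - left, ys - top, xs + right, ys + bottom], axis=1)
    return boxes, score_map[ys, xs]


score, geo = make_targets(8, 12, (2, 1, 9, 5))
boxes, scores = decode_boxes(score, geo)

# Every pixel votes for a full box; here they all agree, so a
# score-weighted average recovers the original box.
merged = np.average(boxes, axis=0, weights=scores)
print(boxes.shape, merged)
```

So each cell yields a complete box by itself, and the many per-pixel boxes for the same text are then merged. The paper does this with locality-aware NMS, which merges overlapping boxes by score-weighted averaging; the plain weighted average above is only a stand-in for that step.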

argman / EAST

how does bounding box regression work in EAST? #252