clovaai / CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)
MIT License

about niter variable #63

Open ntdat017 opened 4 years ago

ntdat017 commented 4 years ago

I looked through the source code and the getDetBoxes_core() function, but I can't understand the intuition behind the niter variable. I understand that niter acts like a padding in pixels, but why is it computed with this particular formula?

        segmap = np.zeros(textmap.shape, dtype=np.uint8)
        segmap[labels==k] = 255
        segmap[np.logical_and(link_score==1, text_score==0)] = 0   # remove link area
        x, y = stats[k, cv2.CC_STAT_LEFT], stats[k, cv2.CC_STAT_TOP]
        w, h = stats[k, cv2.CC_STAT_WIDTH], stats[k, cv2.CC_STAT_HEIGHT]
        niter = int(math.sqrt(size * min(w, h) / (w * h)) * 2)
        sx, ex, sy, ey = x - niter, x + w + niter + 1, y - niter, y + h + niter + 1
        # boundary check
        if sx < 0 : sx = 0
        if sy < 0 : sy = 0
        if ex >= img_w: ex = img_w
        if ey >= img_h: ey = img_h
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT,(1 + niter, 1 + niter))
        segmap[sy:ey, sx:ex] = cv2.dilate(segmap[sy:ey, sx:ex], kernel)

YoungminBaek commented 4 years ago

@ntdat017 A silly answer to a wise question. The formula is based on empirical observation. :)

        niter = int(math.sqrt(size * min(w, h) / (w * h)) * 2)

Actually, there are two parts.

(1) size/(w*h) is the occupancy ratio of the text region within its rectangular bounding box. We found that a single-character region needs to be dilated more, and this term makes that possible.

(2) min(w,h) is to make the dilation ratio proportional to the box height. I think this is more intuitive than the above formula.
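
To make the two parts concrete, here is a minimal sketch that evaluates the formula on a few made-up components. The function name `dilation_iterations` and the example sizes are hypothetical and not taken from the repository; only the expression itself comes from the snippet quoted above. Since min(w,h)/(w*h) equals 1/max(w,h), the expression can also be read as 2*sqrt(size/max(w,h)), which the last loop prints alongside for comparison.

```python
import math


def dilation_iterations(size, w, h):
    """Evaluate the niter expression from the quoted snippet.

    size: pixel area of the connected component
    w, h: width and height of its bounding box
    """
    return int(math.sqrt(size * min(w, h) / (w * h)) * 2)


# Hypothetical components: (size, w, h)
examples = {
    "single character, 75% occupancy, height 20": (300, 20, 20),
    "word, 50% occupancy, height 20": (1000, 100, 20),
    "word, 50% occupancy, height 40": (4000, 200, 40),
}

for name, (size, w, h) in examples.items():
    occupancy = size / (w * h)          # part (1)
    short_side = min(w, h)              # part (2)
    niter = dilation_iterations(size, w, h)
    # Same quantity written as 2 * sqrt(size / max(w, h))
    simplified = int(2 * math.sqrt(size / max(w, h)))
    print(f"{name}: occupancy={occupancy:.2f}, min(w,h)={short_side}, "
          f"niter={niter}, simplified={simplified}")
```

With these numbers, the denser single-character component gets a larger niter than the word of the same height, and doubling the box height at fixed occupancy increases niter roughly by a factor of sqrt(2) rather than 2.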

This part is an open question, and any suggestions regarding post-processing will be helpful. :)

Boatsure commented 3 years ago

@YoungminBaek Thanks a lot for your answer, I get the idea behind this post-processing now! But silly me, I still can't understand what sx, sy, ex, and ey are and why they are necessary. Also, in kernel = cv2.getStructuringElement(cv2.MORPH_RECT,(1 + niter, 1 + niter)), niter determines the size of the kernel. Why is that? Thank you!