clovaai / CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)
MIT License

How to get rectified polygons from polygon points? #8

Closed: ziudeso closed this issue 5 years ago

ziudeso commented 5 years ago

In the paper you state: "Moreover, with our polygon representation, the curved images can be rectified into straight text images, which are also shown in Fig. 11. We believe this ability for rectification can further be of use for recognition tasks." My question is: given a set of polygon points, how can I reconstruct the rectified image? Could you kindly point me in the right direction? Many thanks in advance!

Godricly commented 5 years ago

One possible solution is to follow TextSnake to find the ending points. You can then warp the region into a rectangle. However, the paper does not specify the size of the rectangle.
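The sketch below assumes the polygon is already ordered the way the warping function later in this thread expects: the top boundary runs left to right, then the bottom boundary right to left. With that ordering, finding the ending points is plain indexing, and TextSnake's end-finding step is only needed when the ordering is unknown. Since the paper leaves the rectangle size open, the sketch uses a heuristic of its own. The width is the mean of the top and bottom boundary lengths, and the height is the mean of the two ending-edge lengths. The helper names `split_ordered_polygon`, `polyline_length` and `target_rect_size` are hypothetical.

```python
import numpy as np


def split_ordered_polygon(poly):
    """Split a 2*m-point polygon (top boundary left-to-right, then bottom
    boundary right-to-left; an assumed ordering) into two m-point boundaries."""
    poly = np.asarray(poly, dtype=np.float64)
    if poly.ndim != 2 or poly.shape[1] != 2 or len(poly) < 4 or len(poly) % 2:
        raise ValueError("expected an even number (>= 4) of (x, y) points")
    m = len(poly) // 2
    top = poly[:m]
    bottom = poly[m:][::-1]  # reversed so that bottom[i] lies under top[i]
    return top, bottom


def polyline_length(pts):
    """Sum of the lengths of consecutive segments of a polyline."""
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())


def target_rect_size(poly):
    """Heuristic (width, height) of the rectified rectangle (an assumption,
    not taken from the paper)."""
    top, bottom = split_ordered_polygon(poly)
    width = 0.5 * (polyline_length(top) + polyline_length(bottom))
    # The two ending edges join the first and the last top/bottom point pairs.
    height = 0.5 * (np.linalg.norm(top[0] - bottom[0])
                    + np.linalg.norm(top[-1] - bottom[-1]))
    return max(1, int(round(width))), max(1, int(round(height)))
```

For a straight 20×5 box with a midpoint on each long side, `target_rect_size([[0, 0], [10, 0], [20, 0], [20, 5], [10, 5], [0, 5]])` gives `(20, 5)`.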

ziudeso commented 5 years ago

Hi Godricly! Thanks for the hint! Any ideas on how to perform such a rectification from a polygon? For example, which OpenCV functions would you use? Thanks a ton!

Godricly commented 5 years ago

@ziudeso FYI, here is a reference implementation that I tried before: TextSnake

ziudeso commented 5 years ago

@Godricly Thanks a lot, you are a lifesaver =) Did you also find a way to get the polygonal output rather than the rectangular one? That'd be awesome!

Godricly commented 5 years ago

Nope. I'm still trying to implement the weakly supervised learning part.

YoungminBaek commented 5 years ago

Sorry for the late reply.

We apply an affine transform to the upper and lower triangles formed by each pair of consecutive control points. The output height is fixed. The width of each segment can be chosen in several ways, and we use the average of the lengths of its top and bottom edges. Finally, since discontinuities can occur at the triangle boundaries, we remove them with simple masking.
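To make the per-triangle step concrete, here is a NumPy-only sketch of the linear system that `cv2.getAffineTransform` solves for three point pairs. It also shows the source and destination triangles for one segment, laid out as in the function in this comment. The helpers `affine_from_triangles` and `segment_triangles` are hypothetical names. The sketch assumes the same polygon ordering as that function (top points left to right, then bottom points right to left).

```python
import numpy as np


def affine_from_triangles(src, dst):
    """Solve for the 2x3 matrix M with M @ [x, y, 1] = [u, v] for three
    (x, y) -> (u, v) pairs, i.e. the system cv2.getAffineTransform solves."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    A = np.hstack([src, np.ones((3, 1))])  # rows [x, y, 1]
    return np.linalg.solve(A, dst).T       # A @ M.T = dst


def segment_triangles(poly, k, x0, w, height):
    """Source/destination triangles for segment k, whose quad is
    [TL, TR, BR, BL] = poly[k], poly[k+1], poly[-k-2], poly[-k-1]."""
    poly = np.asarray(poly, dtype=np.float64)
    tl, tr, br, bl = poly[k], poly[k + 1], poly[-k - 2], poly[-k - 1]
    # Upper triangle TL, TR, BR -> top-left, top-right, bottom-right of its slot
    top = (np.array([tl, tr, br]),
           np.array([[x0, 0], [x0 + w - 1, 0], [x0 + w - 1, height - 1]]))
    # Lower triangle TL, BR, BL -> top-left, bottom-right, bottom-left
    bottom = (np.array([tl, br, bl]),
              np.array([[x0, 0], [x0 + w - 1, height - 1], [x0, height - 1]]))
    return top, bottom
```

Both triangles map TL and BR to the same two corners, so their affine maps agree along the shared TL-BR diagonal. For a perfectly rectangular segment the two matrices are identical. For curved segments they differ, which is why a mask is needed to decide which triangle writes which output pixels.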

The following is the warping function we actually use:

import cv2
import numpy as np


def warpPerspectivePoly(img, poly):
    # Rectify a curved text region into a straight strip using piecewise affine warps.
    # poly: 2*(n+1) control points, top boundary left-to-right, then bottom boundary
    # right-to-left. img: 3-channel (e.g. BGR) uint8 image.
    poly = np.asarray(poly, dtype=np.float32)
    n = len(poly) // 2 - 1  # number of quadrilateral segments

    # Segment k is the quad [TL, TR, BR, BL] = poly[k], poly[k+1], poly[-k-2], poly[-k-1]
    boxes = [np.float32([poly[k], poly[k + 1], poly[-k - 2], poly[-k - 1]]) for k in range(n)]

    # Segment width: average of the top and bottom edge lengths
    widths = [int((np.linalg.norm(b[0] - b[1]) + np.linalg.norm(b[2] - b[3])) / 2) for b in boxes]
    width = sum(widths)
    # Fixed output height: mean right-edge height over all segments
    height = int(sum(np.linalg.norm(b[1] - b[2]) for b in boxes) / n)

    output_img = np.zeros((height, width, 3), dtype=np.uint8)
    width_step = 0
    for box, w in zip(boxes, widths):
        # Top triangle: TL, TR, BR -> top-left, top-right, bottom-right of this segment's slot
        pts1 = box[:3]
        pts2 = np.float32([[width_step, 0], [width_step + w - 1, 0], [width_step + w - 1, height - 1]])
        M = cv2.getAffineTransform(pts1, pts2)
        warped_img = cv2.warpAffine(img, M, (width, height), borderMode=cv2.BORDER_REPLICATE)
        warped_mask = np.zeros((height, width, 3), dtype=np.uint8)
        cv2.fillConvexPoly(warped_mask, np.int32(pts2), (1, 1, 1))
        output_img[warped_mask == 1] = warped_img[warped_mask == 1]

        # Bottom triangle: TL, BR, BL -> top-left, bottom-right, bottom-left of the slot
        pts1 = np.vstack((box[0], box[2:]))
        pts2 = np.float32([[width_step, 0], [width_step + w - 1, height - 1], [width_step, height - 1]])
        M = cv2.getAffineTransform(pts1, pts2)
        warped_img = cv2.warpAffine(img, M, (width, height), borderMode=cv2.BORDER_REPLICATE)
        warped_mask = np.zeros((height, width, 3), dtype=np.uint8)
        cv2.fillConvexPoly(warped_mask, np.int32(pts2), (1, 1, 1))
        # Clear the shared diagonal so the pixels written by the top triangle are kept
        cv2.line(warped_mask, (width_step, 0), (width_step + w - 1, height - 1), (0, 0, 0), 1)
        output_img[warped_mask == 1] = warped_img[warped_mask == 1]

        width_step += w
    return output_img