ziudeso closed this issue 5 years ago
One possible solution is to follow TextSnake to find the end points, then warp the region into a rectangle. However, the size of the rectangle is not mentioned in the paper.
Hi Godricly! Thanks for the hint! Any ideas on how to perform such a rectification from a polygon? For example, which OpenCV functions would you use? Thanks a ton!
@ziudeso FYI, here is a reference implementation that I tried before: TextSnake
@Godricly Thanks a lot, you are a lifesaver =) Did you also find a way to get the polygonal output rather than the rectangular one? That'd be awesome!
Nope. I was still trying to implement the weakly supervised learning part.
Sorry for the late reply.
We apply an affine transform to the upper and lower triangles formed by each pair of consecutive control points. The output height is fixed. The width of each segment can be chosen in several ways; we use the average of the top and bottom edge lengths between the two consecutive control points. Finally, since discontinuities can occur along the triangle boundaries, we remove them with simple masking.
The following is the warping function that we actually use.
import cv2
import numpy as np

def warpPerspectivePoly(img, poly):
    # Rectify a curved text polygon with piecewise affine transforms.
    # poly lists the top control points left to right, then the bottom
    # control points right to left, so poly[k] sits above poly[-k-1].
    n = int(len(poly) / 2) - 1  # number of quadrilateral segments
    width = 0
    height = 0
    for k in range(n):
        # box = [top-left, top-right, bottom-right, bottom-left]
        box = np.float32([poly[k], poly[k+1], poly[-k-2], poly[-k-1]])
        # Segment width: average of the top and bottom edge lengths
        width += int((np.linalg.norm(box[0] - box[1]) + np.linalg.norm(box[2] - box[3])) / 2)
        height += np.linalg.norm(box[1] - box[2])
    width = int(width)
    height = int(height / n)  # fixed output height: mean right-edge length

    output_img = np.zeros((height, width, 3), dtype=np.uint8)
    width_step = 0  # x offset of the current segment in the output
    for k in range(n):
        box = np.float32([poly[k], poly[k+1], poly[-k-2], poly[-k-1]])
        w = int((np.linalg.norm(box[0] - box[1]) + np.linalg.norm(box[2] - box[3])) / 2)

        # Top triangle: top-left, top-right, bottom-right
        pts1 = box[:3]
        pts2 = np.float32([[width_step, 0], [width_step + w - 1, 0], [width_step + w - 1, height - 1]])
        M = cv2.getAffineTransform(pts1, pts2)
        warped_img = cv2.warpAffine(img, M, (width, height), borderMode=cv2.BORDER_REPLICATE)
        warped_mask = np.zeros((height, width, 3), dtype=np.uint8)
        warped_mask = cv2.fillConvexPoly(warped_mask, np.int32(pts2), (1, 1, 1))
        output_img[warped_mask == 1] = warped_img[warped_mask == 1]

        # Bottom triangle: top-left, bottom-right, bottom-left
        pts1 = np.vstack((box[0], box[2:]))
        pts2 = np.float32([[width_step, 0], [width_step + w - 1, height - 1], [width_step, height - 1]])
        M = cv2.getAffineTransform(pts1, pts2)
        warped_img = cv2.warpAffine(img, M, (width, height), borderMode=cv2.BORDER_REPLICATE)
        warped_mask = np.zeros((height, width, 3), dtype=np.uint8)
        warped_mask = cv2.fillConvexPoly(warped_mask, np.int32(pts2), (1, 1, 1))
        # Clear the shared diagonal so it keeps the top triangle's pixels
        cv2.line(warped_mask, (width_step, 0), (width_step + w - 1, height - 1), (0, 0, 0), 1)
        output_img[warped_mask == 1] = warped_img[warped_mask == 1]

        width_step += w
    return output_img
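To make the output-size step concrete, here is a small pure-Python sketch of just that part. It is not taken from the repository: the helper name `rectified_size` is hypothetical, and it assumes the polygon lists the top control points left to right followed by the bottom control points right to left, and that each segment's width is the mean of its top and bottom edge lengths, as the description above states.

```python
import math

def rectified_size(poly):
    """Return (width, height) of the rectified image for a text polygon.

    Hypothetical helper for illustration only. Assumes poly holds the
    top control points left to right, then the bottom ones right to left.
    """
    n = len(poly) // 2 - 1  # number of quadrilateral segments
    if n < 1:
        raise ValueError("need at least 4 polygon points")
    total_width = 0
    total_height = 0.0
    for k in range(n):
        top_left, top_right = poly[k], poly[k + 1]
        bottom_right, bottom_left = poly[-k - 2], poly[-k - 1]
        top = math.dist(top_left, top_right)
        bottom = math.dist(bottom_right, bottom_left)
        # Segment width: mean of the top and bottom edge lengths
        total_width += int((top + bottom) / 2)
        # Height sample: length of the segment's right edge
        total_height += math.dist(top_right, bottom_right)
    # Height is the mean right-edge length over all segments
    return total_width, int(total_height / n)

# A straight 20x5 strip described by three control-point pairs
poly = [(0, 0), (10, 0), (20, 0), (20, 5), (10, 5), (0, 5)]
print(rectified_size(poly))  # (20, 5)
```

Each segment's width is truncated to an integer before summing, so the total can come out a few pixels shorter than the true arc length of a long curved line. `math.dist` requires Python 3.8 or later.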
In the paper you state: "Moreover, with our polygon representation, the curved images can be rectified into straight text images, which are also shown in Fig. 11. We believe this ability for rectification can further be of use for recognition tasks." My question is: given a set of polygon points, how can I reconstruct the rectified image? Could you kindly point me in the right direction? Many thanks in advance.