Open HXACA opened 5 years ago
Q: Because the ground-truth (gt) area is not pure text, I get many wrong regions when I randomly crop the resized image. Are there any tricks for this step?
No, we don't apply any tricks when cropping. But you may need to pay attention to some details of cropping images and generating pyramid labels.
The steps of cropping images and generating pyramid labels are as follows:
To keep training fast, we keep the mask as a list of points, not an H*W image, until the samples are forwarded to the mask branch.
1. Crop the original text mask:
   cropped_text_mask = crop_region ∩ origin_text_mask
                     = Polygon[cropped_points_num, {x,y}]
   Note: cropped_points_num may vary from 3 to 8.
2. Get the bounding box by wrapping the cropped mask with a new bounding box, rather than by cropping the original bounding box. As illustrated in the above image, the cropped original bounding box may be larger than the correct cropped bounding box.
3. Generate the pyramid label for the corresponding predicted bounding box. In our setting, the generation step of the pyramid label has been deferred to the stage of calculating the mask loss.
   predicted_bounding_box = (left, top, bottom, right)
   mask_label = cropped_text_mask ∩ predicted_bounding_box
              = Tensor[Channel=1, H=28, W=28] # pyramid label or binary label
   Note: although points_num of cropped_text_mask varies from 3 to 8, the pyramid label can still handle this variance.
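The cropping and box-wrapping steps above can be sketched as follows. This is only an illustration, not the project's code: it assumes a convex text polygon and clips it with the Sutherland-Hodgman algorithm, and the names `clip_polygon_to_rect` and `wrap_bounding_box` are hypothetical.

```python
import numpy as np


def clip_polygon_to_rect(points, left, top, right, bottom):
    """Clip a convex polygon to an axis-aligned crop region (Sutherland-Hodgman).

    points: sequence of (x, y) vertices in order.
    Returns a float32 array of shape [M, 2]; M is 0 if nothing is left.
    """
    def clip_edge(poly, inside, intersect):
        out = []
        for i in range(len(poly)):
            cur, prev = poly[i], poly[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the region
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # leaving the region
        return out

    def cut_x(x0):
        # Intersection of segment a-b with the vertical line x = x0.
        return lambda a, b: (x0, a[1] + (b[1] - a[1]) * (x0 - a[0]) / (b[0] - a[0]))

    def cut_y(y0):
        # Intersection of segment a-b with the horizontal line y = y0.
        return lambda a, b: (a[0] + (b[0] - a[0]) * (y0 - a[1]) / (b[1] - a[1]), y0)

    poly = [tuple(map(float, p)) for p in points]
    poly = clip_edge(poly, lambda p: p[0] >= left, cut_x(left))
    poly = clip_edge(poly, lambda p: p[0] <= right, cut_x(right))
    poly = clip_edge(poly, lambda p: p[1] >= top, cut_y(top))
    poly = clip_edge(poly, lambda p: p[1] <= bottom, cut_y(bottom))
    return np.array(poly, dtype=np.float32).reshape(-1, 2)


def wrap_bounding_box(points):
    """Tight axis-aligned box (left, top, right, bottom) around the vertices."""
    left, top = points.min(axis=0)
    right, bottom = points.max(axis=0)
    return float(left), float(top), float(right), float(bottom)


# A diamond-shaped text region whose original box is (0, 0, 100, 100).
diamond = [(50, 0), (100, 50), (50, 100), (0, 50)]
cropped = clip_polygon_to_rect(diamond, left=60, top=0, right=200, bottom=200)
print(len(cropped))                # 3 vertices remain
print(wrap_bounding_box(cropped))  # (60.0, 10.0, 100.0, 90.0)
```

In this example, cropping the original bounding box would give (60, 0, 100, 100), which is taller than the box wrapped around the clipped polygon. Clipping the same diamond to the region (10, 10, 90, 90) cuts off all four corners and yields 8 vertices, the upper end of the 3 to 8 range mentioned above.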
@JingChaoLiu Thanks for your response
Hi, actually my questions are about pyramid label generation, not cropping, but I'll quote from this issue :)
- To keep training fast, we keep the mask as a list of points, not an H*W image, until the samples are forwarded to the mask branch.
You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?
- generate the pyramid label for the corresponding predicted bounding box. In our setting, the generation step of pyramid label has been deferred to the stage of calculating the mask loss.
So they're calculated on a 28x28 grid? Something like:
for p in grid_28x28:
    for v in vertices:
        [alpha, beta] = A^-1*b
        if alpha>=0 and beta>=0:
            score(p) = max(1-(alpha+beta),0)
- You mean you keep them in the form of vertices, not interior points, right? So in terms of maskrcnn_benchmark, they are PolygonInstances?
Yes
- So they're calculated on 28x28 grid? Something like: ...
Denote the ground-truth mask point list as P = Tensor[points_num, {x,y}] and the predicted bounding box as pred_box = {pred_top, pred_bottom, pred_left, pred_right}. Furthermore, define pred_h = pred_bottom - pred_top and pred_w = pred_right - pred_left. We have tried two schemas:
1. Generate a mask_label within {pred_top, pred_bottom, pred_left, pred_right} based on P, then resize this mask_label from the scale of [pred_h, pred_w] to the scale of [28, 28].
2. Map pred_box from {pred_top, pred_bottom, pred_left, pred_right} to {0, 28, 0, 28} and apply the same map to the point list P, i.e. resized_P = (P - (pred_left, pred_top)) * (28/pred_w, 28/pred_h), then generate a mask_label within {0, 28, 0, 28} based on resized_P.
The schema you mentioned may be schema 2. In our experiments, schema 2 scores 0.3% lower in F-measure than schema 1. But schema 2 is very efficient in both memory and computation: its training time is two-thirds that of schema 1.
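As a rough illustration of the coordinate mapping in the second schema, the sketch below moves the point list into a 28x28 frame. The function name `map_points_to_mask_grid` and the box ordering (left, top, right, bottom) are assumptions made for this example; since P stores {x, y}, x is scaled by 28/pred_w and y by 28/pred_h.

```python
import numpy as np


def map_points_to_mask_grid(P, pred_box, size=28):
    """Map polygon vertices from image coordinates into a size x size frame.

    P: array-like [points_num, 2] of (x, y) vertices.
    pred_box: (pred_left, pred_top, pred_right, pred_bottom), an assumed ordering.
    """
    pred_left, pred_top, pred_right, pred_bottom = pred_box
    pred_w = pred_right - pred_left
    pred_h = pred_bottom - pred_top
    offset = np.array([pred_left, pred_top], dtype=np.float32)
    # x is scaled by the box width, y by the box height.
    scale = np.array([size / pred_w, size / pred_h], dtype=np.float32)
    return (np.asarray(P, dtype=np.float32) - offset) * scale


P = [[10, 20], [66, 20], [66, 48], [10, 48]]
resized_P = map_points_to_mask_grid(P, pred_box=(10, 20, 66, 48))
print(resized_P)  # corners of the 28x28 frame: (0,0), (28,0), (28,28), (0,28)
```

The resulting `resized_P` could then be passed to a label generator that works directly at 28x28, such as the `generate_pyramid_label(28, 28, resized_P)` function posted later in this thread, so no full-resolution mask ever needs to be rasterized or resized.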
Thank you @JingChaoLiu for your valuable analysis. It seems the current maskrcnn-benchmark approach is closer to schema 2, because there are basically three steps:
I don't get why they're not using RoIAlign here for efficiency.
By the way, is matrix inversion really necessary for calculating the target? I mean, this pyramid function seems very "regular" and I'm surprised there's no "analytic" formula. If not, maybe some other form such as a "stepwise" pyramid would be better for training efficiency? Actually, I guess the polygon approach is a more refined version of the idea from EAST, where the gt "mass" was uniformly concentrated in the center.
Regards,
Could you share the code of generating Pyramid label?
Here is a simplified version. Adjust the code as you need. @donglin8506
import cv2
import numpy as np


def generate_pyramid_label(H, W, corner_points):
    """
    :param int H: image_H
    :param int W: image_W
    :param np.ndarray corner_points: dtype=np.float32, shape=[point_num, {x,y}] 3 <= point_num <= 8
    :return: np.ndarray ans: dtype=np.float32, shape=[H, W]
    generate a pyramid label from corner_points
    within the bounding box {box_top=0, box_bottom=H, box_left=0, box_right=W}
    """
    point_num = len(corner_points)
    center = corner_points.mean(axis=0)
    vectors = corner_points - center  # vectors from the center to each corner
    # One 2x2 inverse per triangle (center, corner i, corner i+1).
    matrices = np.empty((point_num, 2, 2), dtype=np.float32)
    for i in range(point_num):
        m = vectors[[i, (i + 1) % point_num]].T
        matrices[i] = np.linalg.pinv(m)
    # Pixel coordinates relative to the center.
    points = np.empty((H, W, 2), dtype=np.float32)  # H, W, {x, y}
    points[:, :, 0] = np.arange(W)
    points[:, :, 1] = np.arange(H)[..., None]
    points -= center
    # Coefficients (alpha, beta) of every pixel in every triangle's basis.
    ans: np.ndarray = np.matmul(matrices[:, None, None, ...], points[..., None])
    ans = ans.squeeze(-1)  # [point_num, H, W, 2]; drop only the trailing axis
    # Keep alpha + beta where the pixel lies in the triangle's cone, else 0.
    ans = (ans >= 0).all(axis=-1) * ans.sum(axis=-1)
    ans = np.max(ans, axis=0)
    ans = np.maximum(1 - ans, 0)
    return ans


def main():
    H, W = 150, 224
    corner_points = np.array([
        187, 0,
        224, 80,
        30, 150,
        0, 65
    ], dtype=np.float32).reshape(-1, 2)
    ans = generate_pyramid_label(H, W, corner_points)
    cv2.imshow('image', ans)
    cv2.waitKey(0)


if __name__ == '__main__':
    main()
@JingChaoLiu Thank you very much, this will give a lot of help, you're welcome! Best regards!
@JingChaoLiu Thank you for your great work. I have a question about generating pyramid labels: I generated the pyramid mask your way, but it contains a few white dots, as shown in the figure. Does this affect model training? Thanks for your help.
@insightcs It's OK. This won't hurt model training. The phenomenon is caused by the numerical instability of the matrix inversion in matrices[i] = np.linalg.pinv(m).
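For readers who would rather avoid the pseudo-inverse, here is a hedged sketch of the same pyramid label computed with the closed-form solution of each 2x2 system (Cramer's rule). It is not the project's code: the function name `pyramid_label_closed_form` and the tolerance `eps` are assumptions, and it presumes a convex polygon with consistently ordered vertices. Unlike the simplified version above, a pixel that passes no triangle test gets 0 rather than 1.

```python
import numpy as np


def pyramid_label_closed_form(H, W, corner_points, eps=1e-6):
    """Pyramid label in [0, 1]: 1 at the polygon center, 0 on and outside the border.

    corner_points: array-like [point_num, 2] of (x, y) vertices in order.
    eps: hypothetical tolerance for pixels on the seam between two triangles.
    """
    corner_points = np.asarray(corner_points, dtype=np.float64)
    center = corner_points.mean(axis=0)
    v1 = corner_points - center      # [N, 2] center -> corner i
    v2 = np.roll(v1, -1, axis=0)     # [N, 2] center -> corner i+1
    det = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]  # [N] cross(v1, v2)

    ys, xs = np.mgrid[0:H, 0:W]
    px = (xs - center[0])[None]      # [1, H, W]
    py = (ys - center[1])[None]
    v1x, v1y = v1[:, 0, None, None], v1[:, 1, None, None]
    v2x, v2y = v2[:, 0, None, None], v2[:, 1, None, None]
    d = det[:, None, None]

    # Solve p = alpha * v1 + beta * v2 for every triangle and pixel.
    alpha = (px * v2y - py * v2x) / d
    beta = (py * v1x - px * v1y) / d

    # A pixel belongs to a triangle's cone when both coefficients are non-negative.
    inside = (alpha >= -eps) & (beta >= -eps)
    height = np.where(inside, alpha + beta, np.inf).min(axis=0)
    return np.maximum(1.0 - height, 0.0).astype(np.float32)


label = pyramid_label_closed_form(9, 9, [[0, 0], [8, 0], [8, 8], [0, 8]])
print(label[4, 4], label[2, 4], label[0, 0])  # 1.0 0.5 0.0
```

Whether this removes the white dots in a given setup has not been verified here; the tolerance is only meant to keep seam pixels from being rejected by both neighboring triangles.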
@insightcs Hi, if I want to use this soft mask label, do I need to add this code to the project myself? I can't find anything about the soft mask label in the project.