aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet
3.36k stars 643 forks source link

ABCNet V1&V2 : bad performance on dense & long text detection #443

Open haoran1062 opened 2 years ago

haoran1062 commented 2 years ago

After training ABCNet v1 and v2, I found that ABCNet is good at short text detection but performs badly on dense and long text: it has a high miss rate and cannot regress long text polygons well. Even after I raised the max detected instances to 200 and doubled PRE/POST_NMS_TOPK_TRAIN/TEST, it did not work much better. Is this because the model is anchor-free, or is it something else? Please help. PS: in my case, I have 100+ instances per image, and 30% of them are long text in dense areas.
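For reference, the knobs mentioned above can be set through AdelaiDet's config system roughly as follows (a sketch only: the config path and exact key names should be checked against your local config files):

```python
from adet.config import get_cfg  # AdelaiDet's config factory

cfg = get_cfg()
# Example BAText config; substitute the one you actually train with.
cfg.merge_from_file("configs/BAText/CTW1500/attn_R_50.yaml")

# Raise the candidate / kept detection budgets for dense pages.
cfg.MODEL.FCOS.PRE_NMS_TOPK_TRAIN = 2000
cfg.MODEL.FCOS.PRE_NMS_TOPK_TEST = 2000
cfg.MODEL.FCOS.POST_NMS_TOPK_TRAIN = 200
cfg.MODEL.FCOS.POST_NMS_TOPK_TEST = 200
cfg.TEST.DETECTIONS_PER_IMAGE = 200
```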

Yuliang-Liu commented 2 years ago

@haoran1062 Can you kindly show one or two examples of what you are testing on? Have you tested the demo on CTW1500, which is also based on long text lines?

haoran1062 commented 2 years ago

I have tested on CTW1500. Some cases perform quite well, but a few perform badly. Here are some examples from my data: Screenshot from 2021-09-24 12-36-52 Screenshot from 2021-09-24 12-34-47 Screenshot from 2021-09-24 12-34-34 Screenshot from 2021-09-24 12-33-16

Screenshot from 2021-09-24 12-33-48

Yuliang-Liu commented 2 years ago

@haoran1062 Thanks for sharing the results! It seems you are trying to train and test on a new (document-like) dataset; in such a case, you need to train your own model on these data. In my opinion, the results you provided suggest that your model did not converge well. Below are some results for Chinese text (the ReCTS model we provided): 2

1

and for ancient words (“彝文”): image

haoran1062 commented 2 years ago

Thank you! I have tried hard to train the model on my dataset. I trained for 5 days on 8 x RTX 3090s, but it still does not work well. I suspect that document-like data is much denser than scene text data, and that an anchor-free model cannot handle that well. What do you think?

Yuliang-Liu commented 2 years ago

@haoran1062 What you mentioned could be the problem. Since I haven't tried data of much greater density, I am not sure whether that is the reason. I noticed that the results you gave did not include any recognition content, and all the bounding boxes are axis-aligned. Can you show some examples of the ground-truth bounding boxes as well as the recognition ground truths of your training data? Also, how much data was used for the 5 days of training on 8 GPUs?

haoran1062 commented 2 years ago

Not all the bboxes are axis-aligned. My dataset has about 12,000 images, and I modified some code to show the polygons without the recognition results, so it is not a recognition error. Here are some data samples with ground truth: 17 25 346

haoran1062 commented 2 years ago

The GT label has 14 points in clockwise order plus a string label: 7 points along the top and the other 7 along the bottom.
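To make the annotation format concrete, here is a hypothetical example of one instance's label in plain Python (field names are illustrative, not the repo's actual schema):

```python
# One instance: a 14-point clockwise polygon (7 top points left-to-right,
# 7 bottom points right-to-left) plus the transcription string.
label = {
    "transcription": "EXAMPLE LONG TEXT LINE",
    "points": [
        # top edge, left to right
        (10, 10), (60, 12), (110, 14), (160, 15), (210, 14), (260, 12), (310, 10),
        # bottom edge, right to left (closing the polygon clockwise)
        (310, 40), (260, 42), (210, 44), (160, 45), (110, 44), (60, 42), (10, 40),
    ],
}

top, bottom = label["points"][:7], label["points"][7:]
assert len(top) == len(bottom) == 7
```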

Yuliang-Liu commented 2 years ago

@haoran1062 Thanks for providing the GT; it looks correct. I see what you mean now, and I guess it is a limitation of the current method. Detection of dense & short text seems okay, but for dense & long text with extreme width-to-height aspect ratios, the results are mostly unsatisfactory (imprecise).

Did you try testing the model on the training data?

Again, thank you so much for pointing out this limitation of our method! I will keep this issue open until there is an improvement.

Yuliang-Liu commented 2 years ago

@tianzhi0549 @stanstarks Do you have any suggestions for this issue? It seems the problem is mainly related to our detector.

haoran1062 commented 2 years ago

> Did you try testing the model on the training data?
>
>   • If a similar issue occurs, I guess the dense & long text instances might somehow disturb each other's training. In that case, it might take a while to research how to solve the problem.
>   • If the model performs well on the training data, please let me know.

Yes, I tested on my training data. Most instances are detected well, but the detector performs badly on dense & long text polygons. Do you need some sample training data (with label files) from my side so that you can take a closer look? No problem. Thank you for following up on this issue; I'm glad to have the chance to contribute.

Yuliang-Liu commented 2 years ago

@haoran1062 The problem might be caused by the label assignment during the BezierAlign stage.

You can try modifying the assignment strategies. For example:

in adet/modeling/poolers.py, add:

def _bezier_long_size(beziers):
    # beziers.tensor is N x 16: 8 control points as (x, y) pairs,
    # the first 4 for the top curve and the last 4 for the bottom curve.
    # (assumes `torch` is already imported at the top of poolers.py)
    beziers = beziers.tensor
    # top-curve control points
    p1 = beziers[:, :2]
    p2 = beziers[:, 2:4]
    p3 = beziers[:, 4:6]
    p4 = beziers[:, 6:8]
    # bottom-curve control points
    p5 = beziers[:, 8:10]
    p6 = beziers[:, 10:12]
    p7 = beziers[:, 12:14]
    p8 = beziers[:, 14:]
    # approximate each curve's length by the length of its control polygon
    max_size_up = ((p1 - p2) ** 2).sum(dim=1).sqrt() \
        + ((p2 - p3) ** 2).sum(dim=1).sqrt() \
        + ((p3 - p4) ** 2).sum(dim=1).sqrt()
    max_size_bottom = ((p5 - p6) ** 2).sum(dim=1).sqrt() \
        + ((p6 - p7) ** 2).sum(dim=1).sqrt() \
        + ((p7 - p8) ** 2).sum(dim=1).sqrt()
    # longest side of each instance
    max_size = torch.max(max_size_up, max_size_bottom)
    return max_size

below the _bezier_height function. Then, in assign_boxes_to_levels_bezier (line 79), change _bezier_height to _bezier_long_size.

Since I do not have many devices or that kind of data, I may not be able to test whether this works. It would be much appreciated if you could validate whether it eases the problem.
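For context, the assignment this changes follows the detectron2-style FPN heuristic: an instance is routed to a pyramid level based on a per-instance scale metric. A plain-Python sketch (constants are illustrative defaults, not necessarily the repo's exact values) of how swapping the metric from height to longest side moves long lines to coarser levels:

```python
import math

def assign_to_level(size, min_level=2, max_level=5,
                    canonical_size=224, canonical_level=4, eps=1e-8):
    # detectron2-style: level = floor(k0 + log2(size / s0)), clamped to range
    level = math.floor(canonical_level + math.log2(size / canonical_size + eps))
    return max(min_level, min(max_level, level))

# A long, thin text line: judged by height (~20 px) it lands on the finest
# level, but judged by its longest side (~900 px) it lands on the coarsest.
print(assign_to_level(20), assign_to_level(900))  # → 2 5
```

With _bezier_height, a very long but short line is pooled from a fine, high-resolution level whose BezierAlign grid may be too small to cover it; a longest-side metric routes it to a coarser level instead.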

haoran1062 commented 2 years ago

Thank you! I'll try your suggestion and let you know.

Yuliang-Liu commented 2 years ago

@haoran1062 Thanks! I have modified the _bezier_long_size function to make it more reasonable. Please try the following:

def _bezier_long_size(beziers):
    beziers = beziers.tensor

    def bezier_to_polygon(bezier):
        # sample 20 points on each of the two cubic Bezier curves
        u = torch.linspace(0, 1, 20).to(bezier.device)  # .to() also works on CPU
        # (16,) -> (2 curves, 4 points, 2 coords) -> (4, 4): rows are
        # [top_x, top_y, bottom_x, bottom_y], columns index the control points
        bezier = bezier.reshape(2, 4, 2).transpose(1, 2).reshape(4, 4)
        # evaluate the cubic Bernstein basis at the sampled parameters
        points = torch.outer((1 - u) ** 3, bezier[:, 0]) \
            + torch.outer(3 * u * ((1 - u) ** 2), bezier[:, 1]) \
            + torch.outer(3 * (u ** 2) * (1 - u), bezier[:, 2]) \
            + torch.outer(u ** 3, bezier[:, 3])
        # stack the two sampled curves into a 40 x 2 polygon
        points = torch.cat((points[:, :2], points[:, 2:]), 0)
        return points

    max_size = []
    for i in range(beziers.shape[0]):
        pts = bezier_to_polygon(beziers[i])
        # chord-sum the arc lengths of the top (pts[:20]) and bottom (pts[20:]) curves
        up_arc = ((pts[1:20, :] - pts[:19, :]) ** 2).sum(dim=1).sqrt().sum(dim=0)
        bot_arc = ((pts[21:, :] - pts[20:-1, :]) ** 2).sum(dim=1).sqrt().sum(dim=0)
        max_size.append(torch.max(up_arc, bot_arc))
    # stack instead of torch.tensor() to keep device and dtype
    return torch.stack(max_size)
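
The chord-summing idea above can be sanity-checked in plain Python, without torch. The helper below is a hypothetical stand-in that samples a single cubic Bezier and sums chord lengths; for four evenly spaced collinear control points the curve is an exact straight segment, so the approximation should recover its length:

```python
import math

def cubic_bezier_arc_length(ctrl, n=20):
    """Approximate a cubic Bezier's arc length by chord-summing n samples.

    ctrl: four (x, y) control points.
    """
    def point(t):
        b = ((1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3)
        return (sum(w * p[0] for w, p in zip(b, ctrl)),
                sum(w * p[1] for w, p in zip(b, ctrl)))

    pts = [point(i / (n - 1)) for i in range(n)]
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(n - 1))

# Evenly spaced collinear control points trace the straight segment (0,0)-(30,0).
straight = [(0, 0), (10, 0), (20, 0), (30, 0)]
print(cubic_bezier_arc_length(straight))  # → 30.0 (up to float rounding)
```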

haoran1062 commented 2 years ago

Sorry for the late reply. The performance has improved, but it is still not ideal. Here are some bad cases: Screenshot from 2021-10-25 16-38-22 Screenshot from 2021-10-25 16-38-48 Screenshot from 2021-10-25 16-39-06

TyrionChou commented 1 year ago

Have you solved this problem?