Shank2358 / GGHL

This is the implementation of GGHL (A General Gaussian Heatmap Label Assignment for Arbitrary-Oriented Object Detection)
GNU General Public License v3.0
616 stars · 111 forks

demo #1

Closed trungpham2606 closed 2 years ago

trungpham2606 commented 2 years ago

Thank you @Shank2358 for sharing this great work. I am trying to visualize the detections with your code, but when I ran test.py it raised an error about e2cnn. Can you show me the reference for e2cnn? Thanks in advance.

Shank2358 commented 2 years ago

Thank you @Shank2358 for sharing this great work. I am trying to visualize the detections with your code, but when I ran test.py it raised an error about e2cnn. Can you show me the reference for e2cnn? Thanks in advance.

Thanks. You only need to comment out all the E2CNN parts. I have updated GGHL.py; you can try it with the new version.

In addition, the E2CNN library (https://github.com/csuhan/e2cnn) is used by ReDet's backbone. I added it because I have recently been trying other backbones. If there are other problems, I am happy to answer them.

trungpham2606 commented 2 years ago

Thank you @Shank2358 for your quick response. According to your paper, you also tested the model on the SKU dataset, and the results were extremely good. Can you provide the pretrained weights for the SKU dataset as well?

Shank2358 commented 2 years ago

for the SKU dataset

Of course. The weights for the SKU dataset are already available. You can download them from Baidu Disk (password: c3jv) or Google Drive.

The weights for the SSDD+ dataset will also be available soon.

Thank you.

trungpham2606 commented 2 years ago

@Shank2358 Thanks to your weights, I was able to visualize the detections on images from DOTA and SKU. image image

I really want to train on my own dataset. The VOC format is not a problem, but can you tell me how you define the rotation angle?

Shank2358 commented 2 years ago

@Shank2358 Thanks to your weights, I was able to visualize the detections on images from DOTA and SKU. image image

I really want to train on my own dataset. The VOC format is not a problem, but can you tell me how you define the rotation angle?

Congratulations!

For oriented bounding boxes, maybe you can use cv2.polylines(img, [points]) to draw them; the results in our paper are drawn this way. (I see that your visualized results are horizontal bounding boxes, so maybe try this.)

The format of the training dataset is like this: image_path xmin,ymin,xmax,ymax,class_id,x1,y1,x2,y2,x3,y3,x4,y4,area_ratio,angle[0,-90)

We use the OpenCV definition of the angle, i.e., the range of the angle is [0, -90). A more specific explanation is as follows: image

You can use the cv2.minAreaRect(points) function to calculate the angle. For more specific calculation methods and explanations, you can refer to the official OpenCV documentation: https://docs.opencv.org/3.4/de/d62/tutorial_bounding_rotated_ellipses.html

[2021-11-15-16:23] I have updated the description of the data set format in readme.md.
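A label line in the format given above might be parsed like this; the helper name and the sample line are purely illustrative, only the field layout comes from the format description.

```python
def parse_gghl_line(line):
    """Split one annotation line: image_path, then comma-separated boxes.

    Field layout (from the format above): xmin,ymin,xmax,ymax,class_id,
    x1,y1,x2,y2,x3,y3,x4,y4,area_ratio,angle.
    """
    path, *boxes = line.strip().split(' ')
    parsed = []
    for b in boxes:
        f = b.split(',')
        parsed.append({
            'hbb': [float(v) for v in f[0:4]],     # horizontal box
            'class_id': int(f[4]),
            'obb': [float(v) for v in f[5:13]],    # 4 oriented-box vertices
            'area_ratio': float(f[13]),
            'angle': float(f[14]),
        })
    return path, parsed

# Hypothetical one-object example line.
line = "img1.png 10,20,110,90,0,10,30,100,20,110,80,20,90,0.85,-30"
path, anns = parse_gghl_line(line)
print(path, anns[0]['angle'])  # img1.png -30.0
```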

trungpham2606 commented 2 years ago

Hello @Shank2358, how can I calculate the area_ratio?

Shank2358 commented 2 years ago

Hello @Shank2358, how can I calculate the area_ratio?

Hi. I have added a script for generating datasets in ./datasets_tools/DOTA2Train.py; maybe you can try it. Line 72 of DOTA2Train.py calculates the area_ratio, which is the ratio of the OBB's area to the HBB's area. Thank you.
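The ratio described above (OBB area over HBB area) can be sketched with the shoelace formula; this is an illustrative numpy helper, not the repository's exact code from DOTA2Train.py.

```python
import numpy as np

def area_ratio(quad):
    """Ratio of the OBB (quadrilateral) area to its axis-aligned HBB area.

    quad: (4, 2) array of vertices x1,y1 .. x4,y4 in order.
    """
    x, y = quad[:, 0], quad[:, 1]
    # Shoelace formula for the quadrilateral's area.
    obb_area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    # Axis-aligned (horizontal) bounding-box area.
    hbb_area = (x.max() - x.min()) * (y.max() - y.min())
    return obb_area / hbb_area

# A 45-degree square inscribed in its HBB covers exactly half of it.
quad = np.array([[1.0, 0.0], [2.0, 1.0], [1.0, 2.0], [0.0, 1.0]])
print(area_ratio(quad))  # 0.5
```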

trungpham2606 commented 2 years ago

@Shank2358 I see, I will try it.

trungpham2606 commented 2 years ago

@Shank2358 I see that in the script the angle is calculated directly from cv2, but you don't change its value as stated in the paper? I mean: if angle in [pi/2, pi] -> angle = angle - pi/2?

Shank2358 commented 2 years ago

@Shank2358 I see that in the script the angle is calculated directly from cv2, but you don't change its value as stated in the paper? I mean: if angle in [pi/2, pi] -> angle = angle - pi/2?

The angle transformation in the paper is already done by the OpenCV function, so the output is in (-pi/2, 0] directly.

trungpham2606 commented 2 years ago

@Shank2358 After converting my custom dataset to the format GGHL needs, I can train but hit this issue: image

Shank2358 commented 2 years ago

@Shank2358 After converting my custom dataset to the format GGHL needs, I can train but hit this issue: image

Hi. I guess the following potential problems may lead to NaN. 1) You'd better check whether the converted data are correct; it would be a good idea to visualize them. The correct results should look like this figure from the paper. image

2) The model initialization parameters may need to be reset. Maybe you can try our pre-trained weights (trained on ImageNet), which will make convergence more stable. The links are as follows: Baidu_Disk (password: 0blv) Google_Drive

3) If the pre-trained weights are not used, the parameter-initialization method may need to be adjusted. Our default is normal initialization with mean 0 and variance 0.01; maybe you can try Xavier or Kaiming initialization instead.

4) Maybe you can check whether a denominator or the argument of a log is 0, which will also lead to NaN.

5) Training hyper-parameters such as the learning rate may also need to be readjusted when you train on your own dataset.
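Point 4) above can be guarded with a small epsilon. This is a generic sketch of the idea, not GGHL's actual loss code:

```python
import numpy as np

EPS = 1e-9  # small constant keeping log arguments and denominators positive

def safe_log(x, eps=EPS):
    # Clamp the argument away from zero before taking the log.
    return np.log(np.clip(x, eps, None))

def safe_div(num, den, eps=EPS):
    # Keep the denominator away from zero to avoid inf/NaN.
    return num / np.clip(den, eps, None)

p = np.array([0.0, 0.5, 1.0])
print(safe_log(p))       # finite values, no -inf at p == 0
print(safe_div(1.0, p))  # finite values, no inf at p == 0
```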

As the only information available to me is this screenshot, I can only infer from experience that the NaN may be caused by the reasons above. Data conversion and the pre-trained weights are the most likely, so try them first. If you still have problems, please leave me a message or e-mail me; I will try my best to help you solve this problem.

Thank you.

trungpham2606 commented 2 years ago

@Shank2358 1. How can I draw the heatmap image as in your paper? 2. I downloaded the ImageNet pre-trained weights, and now it can train (without NaN loss).

  3. I will try other approaches if the NaN loss still exists.
  4. Thank you!
Shank2358 commented 2 years ago

@Shank2358 1. How can I draw the heatmap image as in your paper? 2. I downloaded the ImageNet pre-trained weights, and now it can train (without NaN loss). 3. I will try other approaches if the NaN loss still exists. 4. Thank you!

  1. Just use Matplotlib to display (label_sbbox, label_mbbox, label_lbbox). For example, add the following code after Line 34 in datasets_obb.py:

        import matplotlib.pyplot as plt

        # img is CHW in [0, 1]; convert to HWC uint8 for display
        img = np.uint8(np.transpose(img, (1, 2, 0)) * 255)
        plt.figure("img")
        plt.imshow(img)

        # channels 16: hold the per-class Gaussian heatmaps; take the max
        # over classes to get a 2-D mask (imshow cannot display (H, W, 1))
        mask_s = np.max(label_sbbox[:, :, 16:], axis=-1)
        plt.figure("mask_s")
        plt.imshow(mask_s, cmap='jet')

        mask_m = np.max(label_mbbox[:, :, 16:], axis=-1)
        plt.figure("mask_m")
        plt.imshow(mask_m, cmap='jet')

        mask_l = np.max(label_lbbox[:, :, 16:], axis=-1)
        plt.figure("mask_l")
        plt.imshow(mask_l, cmap='jet')

        plt.show()

    By the way, datasets_obb.py can be run independently (I wrote a main function), so you can run it to check the data and the visualization.

  2. Congratulations! 🎉🎉🎉

  3. Please train a few more epochs to see whether the NaN appears again.

  4. You are welcome.

trungpham2606 commented 2 years ago

@Shank2358 Do you think my heatmaps look normal? image It's pretty weird that mask_s looks empty.

Shank2358 commented 2 years ago

@Shank2358 Do you think my heatmaps look normal? image It's pretty weird that mask_s looks empty.

It seems OK. When there are no small objects, mask_s is indeed empty. The paper explains that objects of different sizes are assigned to different layers; when a layer is assigned no objects, its heatmap is empty. image The scale hyper-parameter tau can be adjusted according to your dataset.

trungpham2606 commented 2 years ago

@Shank2358 So everything seems correct now. I will train on my custom dataset and come back with the results. In the current training progress (after 30 epochs), the classification loss is not stable; it's pretty large (sometimes > 100). I will train more to see whether the issue persists. Thank you so much!

Shank2358 commented 2 years ago

@Shank2358 So everything seems correct now. I will train on my custom dataset and come back with the results. In the current training progress (after 30 epochs), the classification loss is not stable; it's pretty large (sometimes > 100). I will train more to see whether the issue persists. Thank you so much!

image

The loss_cls is not added to the total loss; we use loss_pos and loss_neg instead. Maybe you can add it for stable training in the early stage. I have modified loss_jol.py; please update it.

trungpham2606 commented 2 years ago

@Shank2358 I think one problem when training on a custom dataset is that some rotated bounding boxes' coordinates fall outside the image boundary (my custom dataset has masks, which I convert to the GGHL format). We have to either pad the image or simply ignore those boxes; I will try ignoring them first.

I have double-checked the data and the annotations are correct, but the model doesn't converge; the mAP is always zero :O

Shank2358 commented 2 years ago

@Shank2358 I think one problem when training on a custom dataset is that some rotated bounding boxes' coordinates fall outside the image boundary (my custom dataset has masks, which I convert to the GGHL format). We have to either pad the image or simply ignore those boxes; I will try ignoring them first.

I have double-checked the data and the annotations are correct, but the model doesn't converge; the mAP is always zero :O

1) Have you checked the order of the vertices? The order p1-p2-p3-p4 is as in the paper; I think this problem is the most likely. 😥😥 Recently I have been rewriting the label-conversion code. The vertices in the labels used here are sorted, so maybe your order is inconsistent with our definition. I will write a version that sorts the vertices automatically and update it in two days. 🤖
2) Is the loss updated? I updated it yesterday.
3) Does the assigned Gaussian heatmap correspond to the original image?
4) Do all losses fail to converge, or only some of them?
5) I will run this code again on other datasets and then give you feedback.
6) We start calculating the mAP only after epoch 70, so it is displayed as 0 before that. You can modify train.py to change this.

Shank2358 commented 2 years ago

@Shank2358 I think 1 problem when training with custom dataset is that some rotate bounding boxes' coordinates will be outside of the image's dimension (my custom dataset has mask and I convert to GGHL format). We have to add padding to image or just simply ignore them. I will try ignore them first.

I have double checked the data, the annotation was correct, but the model doesnt converge. the mAP is always zero :O

I cloned this code and trained GGHL for 30 epochs on the HRSC2016 dataset; the model converges. I have uploaded the training log to the log folder. image

The following is a visual Gaussian heatmap. image

I also tested the mAP and the visual detection results; everything is normal. Although the mAP is not very high because training is not complete, it shows that the code works. I will continue the training and update the final results; it may take some time. image

Therefore, I think there is no problem with the code; a label-conversion problem is more likely (see the previous reply for details). I will continue to check the code and help you solve this problem.

Thank you.

trungpham2606 commented 2 years ago

@Shank2358 I have a small dataset (in COCO format, which I converted to the GGHL format). I think you could test on it; training will be fast and we can see the results sooner. If you want to have a look, please give me your email and I will send it to you. Thank you!

Shank2358 commented 2 years ago

@Shank2358 I have a small dataset (in COCO format, which I converted to the GGHL format). I think you could test on it; training will be fast and we can see the results sooner. If you want to have a look, please give me your email and I will send it to you. Thank you!

Of course. It's my pleasure. My email is zhanchao.h@outlook.com

Shank2358 commented 2 years ago

@Shank2358 I have a small dataset (in COCO format, which I converted to the GGHL format). I think you could test on it; training will be fast and we can see the results sooner. If you want to have a look, please give me your email and I will send it to you. Thank you!

These are the complete test results on the HRSC2016 dataset. image

trungpham2606 commented 2 years ago

@Shank2358 Can you show some of the detections as well? ^^ By the way, I sent you my dataset.

Shank2358 commented 2 years ago

@Shank2358 Can you show some of the detections as well? ^^ By the way, I sent you my dataset.

This is the result with confidence threshold = 0.3. image image

trungpham2606 commented 2 years ago

@Shank2358 look great :O !!!!!

Shank2358 commented 2 years ago

@Shank2358 look great :O !!!!!

Sure enough, the problem is caused by the wrong order of the points in the label. The following figure visualizes the label, with the starting point in red. In GGHL, we define the vertex on the top edge of the HBB as the starting point, and the order is clockwise. Your problem seems to be here. image

In addition, some quadrilaterals seem to be non-convex, such as the quadrilateral with a part cut off above.

Maybe we need some time to rewrite the label-conversion script.
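The ordering rule described above (start at the vertex on the HBB's top edge, then go clockwise) might be sketched like this. This is an illustrative helper under that stated rule, not the repository's updated conversion script:

```python
import numpy as np

def order_quad(quad):
    """Order 4 vertices clockwise, starting from the vertex nearest the
    HBB's top edge (smallest y; image coordinates, y grows downward)."""
    quad = np.asarray(quad, dtype=np.float64)
    center = quad.mean(axis=0)
    # With y pointing down, ascending atan2(dy, dx) around the center
    # traverses the vertices clockwise as displayed on screen.
    ang = np.arctan2(quad[:, 1] - center[1], quad[:, 0] - center[0])
    quad = quad[np.argsort(ang)]
    # Rotate the cycle so the topmost vertex (smallest y) comes first.
    start = int(np.argmin(quad[:, 1]))
    return np.roll(quad, -start, axis=0)

# A diamond given in scrambled order comes out top, right, bottom, left.
print(order_quad([[0, 2], [2, 4], [2, 0], [4, 2]]))
```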

trungpham2606 commented 2 years ago

@Shank2358

  1. For the non-convex problem, I think it comes from your augmentation? Because I did not label objects that are not fully visible.
  2. For the point order, it seems I have to re-label the images, right?
Shank2358 commented 2 years ago

@Shank2358

  1. For the non-convex problem, I think it comes from your augmentation? Because I did not label objects that are not fully visible.
  2. For the point order, it seems I have to re-label the images, right?
  1. You're right. Because of the data augmentation.
  2. Maybe just modify the script again. I'll try to modify it tomorrow.
trungpham2606 commented 2 years ago

@Shank2358

  1. For the augmentation, we should ignore objects whose box IoU < 80% after augmentation.
  2. I see. Part of my script was taken from yours, so I'm quite surprised the order is not correct.
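The 80% rule above could be implemented by comparing each box's area before and after clipping to the image; a rough sketch (hypothetical helper, not GGHL's augmentation code):

```python
def keep_after_crop(hbb, img_w, img_h, thresh=0.8):
    """Keep a box only if >= thresh of its area survives clipping.

    hbb: (xmin, ymin, xmax, ymax) of the possibly out-of-bounds box.
    """
    xmin, ymin, xmax, ymax = hbb
    full = (xmax - xmin) * (ymax - ymin)
    # Clip the box to the image boundary.
    cx0, cy0 = max(xmin, 0), max(ymin, 0)
    cx1, cy1 = min(xmax, img_w), min(ymax, img_h)
    clipped = max(cx1 - cx0, 0) * max(cy1 - cy0, 0)
    return full > 0 and clipped / full >= thresh

print(keep_after_crop((-10, 0, 90, 50), 100, 100))  # 90% visible -> True
print(keep_after_crop((-60, 0, 40, 50), 100, 100))  # 40% visible -> False
```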
Shank2358 commented 2 years ago

@Shank2358

  1. For the augmentation, we should ignore objects whose box IoU < 80% after augmentation.
  2. I see. Part of my script was taken from yours, so I'm quite surprised the order is not correct.

This time the label seems to be correct. I modified your label-conversion script and sent you the corrected code via email; please check it. image

The following is the result after training for 50 epochs, which seems correct. Your dataset has too few samples (only 11 images) for training GGHL, so the result doesn't look great; I also did not fine-tune the parameters. The rest of the work is left to you. Have fun. image

The label conversion script in the GitHub repository has also been updated.

Thank you.

trungpham2606 commented 2 years ago

Thank you @Shank2358

Shank2358 commented 2 years ago

@Shank2358 I have a small dataset (in COCO format, which I converted to the GGHL format). I think you could test on it; training will be fast and we can see the results sooner. If you want to have a look, please give me your email and I will send it to you. Thank you!

Of course. It's my pleasure. My email is zhanchao.h@outlook.com

I cannot contact you at this email. Can you provide another?

zhanchao.huang@ieee.org Please try it. Thank you.

Fly-dream12 commented 2 years ago

Sorry, it doesn't work.

trungpham2606 commented 2 years ago

@Shank2358 Can I see your loss from training on my dataset?

Shank2358 commented 2 years ago

@Shank2358 Can I see your loss from training on my dataset?

Sorry, the training log was deleted. This is some data from a screenshot at the time.

Epoch:[ 21/51] Batch:[ 0/1] Img_size:[800] Loss:58.6973 Loss_fg:4.1182 | Loss_bg:3.2572 | Loss_pos:20.9150 | Loss_neg:4.6449 | Loss_iou:3.8027 | Loss_cls:14.4501 | Loss_s:1.7436 | Loss_r:3.1436 | Loss_l:2.6220 | LR:0.000134864

[2021-11-20 02:07:32,153]-[train_GGHL.py line:203]: Epoch:[ 46/51] Batch:[ 0/1] Img_size:[800] Loss:44.5443 Loss_fg:3.3079 | Loss_bg:1.6776 | Loss_pos:15.9689 | Loss_neg:3.5604 | Loss_iou:3.8235 | Loss_cls:9.9970 | Loss_s:1.8301 | Loss_r:2.0814 | Loss_l:2.2976 | LR:6.06895e-06

I think your dataset is too small to support the parameter updates of such a large model. Maybe you can replace the backbone with a lighter one, such as ResNet18.

Shank2358 commented 2 years ago

sorry it doesn't work

Sorry, both of my mailboxes work normally; I have just sent a few work emails from them. Or maybe you can describe your problem here. Thanks.

trungpham2606 commented 2 years ago

@Shank2358 Can I see your loss from training on my dataset?

Sorry, the training log was deleted. This is some data from a screenshot at the time.

Epoch:[ 21/51] Batch:[ 0/1] Img_size:[800] Loss:58.6973 Loss_fg:4.1182 | Loss_bg:3.2572 | Loss_pos:20.9150 | Loss_neg:4.6449 | Loss_iou:3.8027 | Loss_cls:14.4501 | Loss_s:1.7436 | Loss_r:3.1436 | Loss_l:2.6220 | LR:0.000134864

[2021-11-20 02:07:32,153]-[train_GGHL.py line:203]: Epoch:[ 46/51] Batch:[ 0/1] Img_size:[800] Loss:44.5443 Loss_fg:3.3079 | Loss_bg:1.6776 | Loss_pos:15.9689 | Loss_neg:3.5604 | Loss_iou:3.8235 | Loss_cls:9.9970 | Loss_s:1.8301 | Loss_r:2.0814 | Loss_l:2.2976 | LR:6.06895e-06

I think your dataset is too small to support the parameter updates of such a large model. Maybe you can replace the backbone with a lighter one, such as ResNet18.

  1. Thank you for the info.
  2. I am training a new one, more challenging than the toy set I sent you.
  3. Here are some of them (images + labels): image image
Shank2358 commented 2 years ago

@Shank2358 Can I see your loss from training on my dataset?

Sorry, the training log was deleted. This is some data from a screenshot at the time. Epoch:[ 21/51] Batch:[ 0/1] Img_size:[800] Loss:58.6973 Loss_fg:4.1182 | Loss_bg:3.2572 | Loss_pos:20.9150 | Loss_neg:4.6449 | Loss_iou:3.8027 | Loss_cls:14.4501 | Loss_s:1.7436 | Loss_r:3.1436 | Loss_l:2.6220 | LR:0.000134864 [2021-11-20 02:07:32,153]-[train_GGHL.py line:203]: Epoch:[ 46/51] Batch:[ 0/1] Img_size:[800] Loss:44.5443 Loss_fg:3.3079 | Loss_bg:1.6776 | Loss_pos:15.9689 | Loss_neg:3.5604 | Loss_iou:3.8235 | Loss_cls:9.9970 | Loss_s:1.8301 | Loss_r:2.0814 | Loss_l:2.2976 | LR:6.06895e-06 I think your dataset is too small to support the parameter updates of such a large model. Maybe you can replace the backbone with a lighter one, such as ResNet18.

  1. Thank you for the info.
  2. I am training a new one, more challenging than the toy set I sent you.

Cool! Thanks for sharing.

Fly-dream12 commented 2 years ago

@Shank2358 I have a small dataset (in COCO format, which I converted to the GGHL format). I think you could test on it; training will be fast and we can see the results sooner. If you want to have a look, please give me your email and I will send it to you. Thank you!

Of course. It's my pleasure. My email is zhanchao.h@outlook.com

I cannot contact you at this email. Can you provide another?

zhanchao.huang@ieee.org Please try it. Thank you.

Can you give me yet another email address? Thanks.

trungpham2606 commented 2 years ago

@Shank2358 After training for many epochs, it can't detect anything at test time. Maybe the approach is not suitable for this kind of dataset.

Shank2358 commented 2 years ago

@Shank2358 After training for many epochs, it can't detect anything at test time. Maybe the approach is not suitable for this kind of dataset.

How many training samples do you have?

@Shank2358 I have 185 images.

Can you show me the loss, please? I didn't write a drawing function in evaluator.py; the detections need to be drawn from the files in predictionR/voc. Could this be the reason nothing seems detected? The data I ran the day before yesterday detected objects. In addition, has the object-category list in config been modified?

trungpham2606 commented 2 years ago

@Shank2358 I have 185 images.

Shank2358 commented 2 years ago

Can you show me the loss, please? I didn't write a drawing function in evaluator.py; the detections need to be drawn from the files in predictionR/voc. Could this be the reason nothing seems detected? The data I ran the day before yesterday detected objects. In addition, has the object-category list in config been modified?


This is the result-visualization code; I commented it out. Did you uncomment it? image

trungpham2606 commented 2 years ago

@Shank2358 Everything was checked. If I set the confidence score to 0.1 (small), NMS takes far too long (because there are so many ROIs); if I set a higher score, it detects nothing. I didn't uncomment it. Here is the loss: [2021-11-21 13:10:57,104]-[train_GGHL.py line:169]: Epoch:[243/501] Batch:[ 75/92] Img_size:[736] Loss:12.4321 Loss_fg:0.8209 | Loss_bg:0.7159 | Loss_pos:6.2062 | Loss_neg:0.0000 | Loss_iou:1.2030 | Loss_cls:77.7122 | Loss_s:0.7767 | Loss_r:0.7240 | Loss_l:1.9852 | LR:5.33808e-05

Shank2358 commented 2 years ago

@Shank2358 Everything was checked. If I set the confidence score to 0.1 (small), NMS takes far too long (because there are so many ROIs); if I set a higher score, it detects nothing. I didn't uncomment it. Here is the loss: [2021-11-21 13:10:57,104]-[train_GGHL.py line:169]: Epoch:[243/501] Batch:[ 75/92] Img_size:[736] Loss:12.4321 Loss_fg:0.8209 | Loss_bg:0.7159 | Loss_pos:6.2062 | Loss_neg:0.0000 | Loss_iou:1.2030 | Loss_cls:77.7122 | Loss_s:0.7767 | Loss_r:0.7240 | Loss_l:1.9852 | LR:5.33808e-05

It seems to be a classification problem: loss_cls is very high, and scores = confidence * class_scores. The bbox regression seems to have converged. Try removing the classification score in the evaluator and taking only the confidence score, i.e. scores = pred_conf, and see whether anything is drawn. image

If there are results that way, then it is a classification problem and we can continue to track down the bug. Maybe there is a problem with the class loss, or the category ids are not correct?

There are two bugs in the loss function; I fixed them the day before yesterday. Have you updated it? image image

Shank2358 commented 2 years ago

@Shank2358 Everything was checked. If I set the confidence score to 0.1 (small), NMS takes far too long (because there are so many ROIs); if I set a higher score, it detects nothing. I didn't uncomment it. Here is the loss: [2021-11-21 13:10:57,104]-[train_GGHL.py line:169]: Epoch:[243/501] Batch:[ 75/92] Img_size:[736] Loss:12.4321 Loss_fg:0.8209 | Loss_bg:0.7159 | Loss_pos:6.2062 | Loss_neg:0.0000 | Loss_iou:1.2030 | Loss_cls:77.7122 | Loss_s:0.7767 | Loss_r:0.7240 | Loss_l:1.9852 | LR:5.33808e-05

In addition, 500 epochs seems too many; the model may be over-fitting.

trungpham2606 commented 2 years ago

@Shank2358 Actually it was trained for 243/500 epochs, not 500 (I stopped it). I am ignoring the classes as you suggested, and it outputs some results. I will draw the rotated boxes to see the results more clearly.

trungpham2606 commented 2 years ago

Oh, I didn't update it :((. I will retrain and see.