SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
MIT License

mmdetection on COCO2017 not converge #94

Closed: jamesben6688 closed this issue 12 months ago

jamesben6688 commented 1 year ago

Hi Ali, I tried your code on COCO2017 using mmdetection, but the training does not converge. I tried both cascade_mask_rcnn and mask_rcnn, but neither of them converges.

My environment:

python: 3.8
pytorch: 1.11.0+cu113
mmcv-full: 1.4.8
mmdet: 2.19.0
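
For completeness, this is the quick check I run to confirm those versions (a minimal sketch; it assumes mmcv-full and mmdet are importable as mmcv and mmdet):

```python
# Minimal version check (a sketch; only prints the installed __version__ strings).
import torch
import mmcv
import mmdet

print("torch:", torch.__version__)            # expected: 1.11.0+cu113
print("torch built for CUDA:", torch.version.cuda)
print("mmcv-full:", mmcv.__version__)         # expected: 1.4.8
print("mmdet:", mmdet.__version__)            # expected: 2.19.0
```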

COCO directory:

+ annotations  
    + captions_train2017.json  
    + instances_train2017.json  
    + person_keypoints_train2017.json
    + captions_val2017.json    
    + instances_val2017.json   
    + person_keypoints_val2017.json
+ train2017  
+ val2017

All raw files were downloaded from the COCO official website. The loss remains at a high level, and all the average precision (AP) values are zero.
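
For reference, this layout is usually wired into an mmdetection 2.x config roughly as below (a sketch with assumed paths; the data_root value and the repo's own config may differ):

```python
# Sketch of the dataset paths in an mmdetection 2.x config, assuming
# data_root points at the COCO directory shown above.
data_root = 'data/coco/'
data = dict(
    train=dict(
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/'),
    val=dict(
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/'),
    test=dict(
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/'))
```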

Below are the first few logged iterations:

2023-11-09 23:15:30,812 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-11-09 23:15:44,534 - mmdet - INFO - Epoch [1][50/7330]     lr: 9.890e-06, eta: 4 days, 10:26:14, time: 1.452, data_time: 1.204, memory: 10086, loss_rpn_cls: 0.5645, loss_rpn_bbox: 0.2489, loss_cls: 4.5210, acc: 0.0808, loss_bbox: 0.0397, loss_mask: 0.7171, loss: 6.0913
2023-11-09 23:15:57,922 - mmdet - INFO - Epoch [1][100/7330]    lr: 1.988e-05, eta: 2 days, 15:01:05, time: 0.268, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2420, loss_cls: 4.5210, acc: 0.0857, loss_bbox: 0.0413, loss_mask: 0.7138, loss: 6.0784
2023-11-09 23:16:11,390 - mmdet - INFO - Epoch [1][150/7330]    lr: 2.987e-05, eta: 2 days, 0:34:55, time: 0.269, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5621, loss_rpn_bbox: 0.2455, loss_cls: 4.5187, acc: 0.0850, loss_bbox: 0.0388, loss_mask: 0.7166, loss: 6.0818
2023-11-09 23:16:24,826 - mmdet - INFO - Epoch [1][200/7330]    lr: 3.986e-05, eta: 1 day, 17:20:59, time: 0.269, data_time: 0.028, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2423, loss_cls: 4.5181, acc: 0.0747, loss_bbox: 0.0390, loss_mask: 0.7185, loss: 6.0783
2023-11-09 23:16:38,230 - mmdet - INFO - Epoch [1][250/7330]    lr: 4.985e-05, eta: 1 day, 12:59:59, time: 0.268, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5616, loss_rpn_bbox: 0.2448, loss_cls: 4.5196, acc: 0.0811, loss_bbox: 0.0388, loss_mask: 0.7180, loss: 6.0826
2023-11-09 23:16:51,530 - mmdet - INFO - Epoch [1][300/7330]    lr: 5.984e-05, eta: 1 day, 10:04:23, time: 0.266, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5652, loss_rpn_bbox: 0.2486, loss_cls: 4.5217, acc: 0.0913, loss_bbox: 0.0417, loss_mask: 0.7145, loss: 6.0917
2023-11-09 23:17:05,057 - mmdet - INFO - Epoch [1][350/7330]    lr: 6.983e-05, eta: 1 day, 8:01:45, time: 0.271, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5610, loss_rpn_bbox: 0.2403, loss_cls: 4.5226, acc: 0.0725, loss_bbox: 0.0397, loss_mask: 0.7176, loss: 6.0812
2023-11-09 23:17:18,510 - mmdet - INFO - Epoch [1][400/7330]    lr: 7.982e-05, eta: 1 day, 6:28:54, time: 0.269, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5635, loss_rpn_bbox: 0.2402, loss_cls: 4.5204, acc: 0.0942, loss_bbox: 0.0403, loss_mask: 0.7168, loss: 6.0812
2023-11-09 23:17:32,048 - mmdet - INFO - Epoch [1][450/7330]    lr: 8.981e-05, eta: 1 day, 5:17:28, time: 0.271, data_time: 0.032, memory: 10086, loss_rpn_cls: 0.5633, loss_rpn_bbox: 0.2524, loss_cls: 4.5189, acc: 0.0957, loss_bbox: 0.0401, loss_mask: 0.7142, loss: 6.0889
2023-11-09 23:17:45,381 - mmdet - INFO - Epoch [1][500/7330]    lr: 9.980e-05, eta: 1 day, 4:18:28, time: 0.267, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5636, loss_rpn_bbox: 0.2419, loss_cls: 4.5199, acc: 0.0798, loss_bbox: 0.0374, loss_mask: 0.7165, loss: 6.0792
2023-11-09 23:17:58,847 - mmdet - INFO - Epoch [1][550/7330]    lr: 1.000e-04, eta: 1 day, 3:31:13, time: 0.269, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5648, loss_rpn_bbox: 0.2537, loss_cls: 4.5214, acc: 0.0837, loss_bbox: 0.0398, loss_mask: 0.7180, loss: 6.0977
2023-11-09 23:18:12,282 - mmdet - INFO - Epoch [1][600/7330]    lr: 1.000e-04, eta: 1 day, 2:51:35, time: 0.269, data_time: 0.026, memory: 10086, loss_rpn_cls: 0.5626, loss_rpn_bbox: 0.2425, loss_cls: 4.5211, acc: 0.0916, loss_bbox: 0.0390, loss_mask: 0.7159, loss: 6.0809
2023-11-09 23:18:26,248 - mmdet - INFO - Epoch [1][650/7330]    lr: 1.000e-04, eta: 1 day, 2:21:36, time: 0.279, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5682, loss_rpn_bbox: 0.2603, loss_cls: 4.5202, acc: 0.0918, loss_bbox: 0.0398, loss_mask: 0.7182, loss: 6.1067
2023-11-09 23:18:39,892 - mmdet - INFO - Epoch [1][700/7330]    lr: 1.000e-04, eta: 1 day, 1:53:50, time: 0.273, data_time: 0.026, memory: 10086, loss_rpn_cls: 0.5629, loss_rpn_bbox: 0.2412, loss_cls: 4.5175, acc: 0.0818, loss_bbox: 0.0388, loss_mask: 0.7152, loss: 6.0755
2023-11-09 23:18:53,903 - mmdet - INFO - Epoch [1][750/7330]    lr: 1.000e-04, eta: 1 day, 1:31:54, time: 0.280, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5668, loss_rpn_bbox: 0.2636, loss_cls: 4.5208, acc: 0.0779, loss_bbox: 0.0391, loss_mask: 0.7168, loss: 6.1071
2023-11-09 23:19:07,780 - mmdet - INFO - Epoch [1][800/7330]    lr: 1.000e-04, eta: 1 day, 1:11:56, time: 0.278, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5662, loss_rpn_bbox: 0.2581, loss_cls: 4.5178, acc: 0.0918, loss_bbox: 0.0401, loss_mask: 0.7177, loss: 6.0998
2023-11-09 23:19:21,543 - mmdet - INFO - Epoch [1][850/7330]    lr: 1.000e-04, eta: 1 day, 0:53:43, time: 0.275, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2394, loss_cls: 4.5216, acc: 0.0830, loss_bbox: 0.0397, loss_mask: 0.7149, loss: 6.0760
2023-11-09 23:19:35,609 - mmdet - INFO - Epoch [1][900/7330]    lr: 1.000e-04, eta: 1 day, 0:38:57, time: 0.281, data_time: 0.028, memory: 10086, loss_rpn_cls: 0.5678, loss_rpn_bbox: 0.2609, loss_cls: 4.5206, acc: 0.0815, loss_bbox: 0.0400, loss_mask: 0.7168, loss: 6.1061

This looks abnormal. Did you perform any preprocessing on the COCO dataset?

jamesben6688 commented 12 months ago

This seems to have been caused by a mismatch between the GPU card, the CUDA version, and the PyTorch version.
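
For anyone hitting the same thing, a minimal sanity check of the PyTorch/CUDA/GPU combination might look like this (a sketch; a mismatch here can silently break custom CUDA extensions such as the neighborhood attention kernels):

```python
# Quick check that the installed PyTorch build actually runs kernels on this GPU
# (a sketch; non-finite outputs or a crash here points at a CUDA/PyTorch mismatch).
import torch

print("torch:", torch.__version__)
print("torch built for CUDA:", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))

x = torch.randn(8, 8, device="cuda")
y = x @ x                        # force a kernel launch on the GPU
assert torch.isfinite(y).all(), "non-finite output: CUDA/driver/PyTorch mismatch?"
```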