jamesben6688 closed this issue 12 months ago.
Hi Ali, I tried your code on COCO2017 using mmdetection, but the training does not converge. I tried both cascade_mask_rcnn and mask_rcnn, but neither of them converges.
My environment:
- Python 3.8
- PyTorch 1.11.0+cu113
- mmcv-full 1.4.8
- mmdet 2.19.0
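A minimal sketch for collecting the package versions above into a bug report, using only the standard library (`importlib.metadata`, available from Python 3.8). The distribution names `torch`, `mmcv-full` and `mmdet` are assumptions about how the packages were installed with pip, and `collect_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed pip distribution names; mmcv 1.x full builds install as "mmcv-full".
PACKAGES = ("torch", "mmcv-full", "mmdet")


def collect_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```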
COCO directory:
- annotations/
  - captions_train2017.json
  - instances_train2017.json
  - person_keypoints_train2017.json
  - captions_val2017.json
  - instances_val2017.json
  - person_keypoints_val2017.json
- train2017/
- val2017/
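A quick sanity check of the layout above, assuming the dataset sits under `data/coco/`, which is the `data_root` used by MMDetection 2.x's stock COCO dataset configs. The helper names are hypothetical; the sketch only checks that the instance annotation files and image folders exist and counts the `.jpg` files, it does not validate their contents.

```python
from pathlib import Path
from typing import List

# Entries from the layout above that the instance-segmentation configs read.
EXPECTED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_paths(root: str = "data/coco") -> List[str]:
    """Return the expected entries that do not exist under the COCO root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


def count_images(root: str = "data/coco", split: str = "train2017") -> int:
    """Count .jpg files in an image folder (0 if the folder is missing)."""
    folder = Path(root) / split
    return sum(1 for _ in folder.glob("*.jpg")) if folder.is_dir() else 0


if __name__ == "__main__":
    print("missing:", missing_coco_paths() or "none")
    print("train2017 images:", count_images())
    print("val2017 images:", count_images(split="val2017"))
```

For reference, the official COCO 2017 release has 118,287 images in train2017 and 5,000 in val2017, so a noticeably different count points at an incomplete download or extraction.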
All raw files were downloaded from the COCO official website. The loss remains at a high level, and all the average precision (AP) values are zero.
Here are the first few iterations:
2023-11-09 23:15:30,812 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-11-09 23:15:44,534 - mmdet - INFO - Epoch [1][50/7330] lr: 9.890e-06, eta: 4 days, 10:26:14, time: 1.452, data_time: 1.204, memory: 10086, loss_rpn_cls: 0.5645, loss_rpn_bbox: 0.2489, loss_cls: 4.5210, acc: 0.0808, loss_bbox: 0.0397, loss_mask: 0.7171, loss: 6.0913
2023-11-09 23:15:57,922 - mmdet - INFO - Epoch [1][100/7330] lr: 1.988e-05, eta: 2 days, 15:01:05, time: 0.268, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2420, loss_cls: 4.5210, acc: 0.0857, loss_bbox: 0.0413, loss_mask: 0.7138, loss: 6.0784
2023-11-09 23:16:11,390 - mmdet - INFO - Epoch [1][150/7330] lr: 2.987e-05, eta: 2 days, 0:34:55, time: 0.269, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5621, loss_rpn_bbox: 0.2455, loss_cls: 4.5187, acc: 0.0850, loss_bbox: 0.0388, loss_mask: 0.7166, loss: 6.0818
2023-11-09 23:16:24,826 - mmdet - INFO - Epoch [1][200/7330] lr: 3.986e-05, eta: 1 day, 17:20:59, time: 0.269, data_time: 0.028, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2423, loss_cls: 4.5181, acc: 0.0747, loss_bbox: 0.0390, loss_mask: 0.7185, loss: 6.0783
2023-11-09 23:16:38,230 - mmdet - INFO - Epoch [1][250/7330] lr: 4.985e-05, eta: 1 day, 12:59:59, time: 0.268, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5616, loss_rpn_bbox: 0.2448, loss_cls: 4.5196, acc: 0.0811, loss_bbox: 0.0388, loss_mask: 0.7180, loss: 6.0826
2023-11-09 23:16:51,530 - mmdet - INFO - Epoch [1][300/7330] lr: 5.984e-05, eta: 1 day, 10:04:23, time: 0.266, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5652, loss_rpn_bbox: 0.2486, loss_cls: 4.5217, acc: 0.0913, loss_bbox: 0.0417, loss_mask: 0.7145, loss: 6.0917
2023-11-09 23:17:05,057 - mmdet - INFO - Epoch [1][350/7330] lr: 6.983e-05, eta: 1 day, 8:01:45, time: 0.271, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5610, loss_rpn_bbox: 0.2403, loss_cls: 4.5226, acc: 0.0725, loss_bbox: 0.0397, loss_mask: 0.7176, loss: 6.0812
2023-11-09 23:17:18,510 - mmdet - INFO - Epoch [1][400/7330] lr: 7.982e-05, eta: 1 day, 6:28:54, time: 0.269, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5635, loss_rpn_bbox: 0.2402, loss_cls: 4.5204, acc: 0.0942, loss_bbox: 0.0403, loss_mask: 0.7168, loss: 6.0812
2023-11-09 23:17:32,048 - mmdet - INFO - Epoch [1][450/7330] lr: 8.981e-05, eta: 1 day, 5:17:28, time: 0.271, data_time: 0.032, memory: 10086, loss_rpn_cls: 0.5633, loss_rpn_bbox: 0.2524, loss_cls: 4.5189, acc: 0.0957, loss_bbox: 0.0401, loss_mask: 0.7142, loss: 6.0889
2023-11-09 23:17:45,381 - mmdet - INFO - Epoch [1][500/7330] lr: 9.980e-05, eta: 1 day, 4:18:28, time: 0.267, data_time: 0.029, memory: 10086, loss_rpn_cls: 0.5636, loss_rpn_bbox: 0.2419, loss_cls: 4.5199, acc: 0.0798, loss_bbox: 0.0374, loss_mask: 0.7165, loss: 6.0792
2023-11-09 23:17:58,847 - mmdet - INFO - Epoch [1][550/7330] lr: 1.000e-04, eta: 1 day, 3:31:13, time: 0.269, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5648, loss_rpn_bbox: 0.2537, loss_cls: 4.5214, acc: 0.0837, loss_bbox: 0.0398, loss_mask: 0.7180, loss: 6.0977
2023-11-09 23:18:12,282 - mmdet - INFO - Epoch [1][600/7330] lr: 1.000e-04, eta: 1 day, 2:51:35, time: 0.269, data_time: 0.026, memory: 10086, loss_rpn_cls: 0.5626, loss_rpn_bbox: 0.2425, loss_cls: 4.5211, acc: 0.0916, loss_bbox: 0.0390, loss_mask: 0.7159, loss: 6.0809
2023-11-09 23:18:26,248 - mmdet - INFO - Epoch [1][650/7330] lr: 1.000e-04, eta: 1 day, 2:21:36, time: 0.279, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5682, loss_rpn_bbox: 0.2603, loss_cls: 4.5202, acc: 0.0918, loss_bbox: 0.0398, loss_mask: 0.7182, loss: 6.1067
2023-11-09 23:18:39,892 - mmdet - INFO - Epoch [1][700/7330] lr: 1.000e-04, eta: 1 day, 1:53:50, time: 0.273, data_time: 0.026, memory: 10086, loss_rpn_cls: 0.5629, loss_rpn_bbox: 0.2412, loss_cls: 4.5175, acc: 0.0818, loss_bbox: 0.0388, loss_mask: 0.7152, loss: 6.0755
2023-11-09 23:18:53,903 - mmdet - INFO - Epoch [1][750/7330] lr: 1.000e-04, eta: 1 day, 1:31:54, time: 0.280, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5668, loss_rpn_bbox: 0.2636, loss_cls: 4.5208, acc: 0.0779, loss_bbox: 0.0391, loss_mask: 0.7168, loss: 6.1071
2023-11-09 23:19:07,780 - mmdet - INFO - Epoch [1][800/7330] lr: 1.000e-04, eta: 1 day, 1:11:56, time: 0.278, data_time: 0.030, memory: 10086, loss_rpn_cls: 0.5662, loss_rpn_bbox: 0.2581, loss_cls: 4.5178, acc: 0.0918, loss_bbox: 0.0401, loss_mask: 0.7177, loss: 6.0998
2023-11-09 23:19:21,543 - mmdet - INFO - Epoch [1][850/7330] lr: 1.000e-04, eta: 1 day, 0:53:43, time: 0.275, data_time: 0.031, memory: 10086, loss_rpn_cls: 0.5604, loss_rpn_bbox: 0.2394, loss_cls: 4.5216, acc: 0.0830, loss_bbox: 0.0397, loss_mask: 0.7149, loss: 6.0760
2023-11-09 23:19:35,609 - mmdet - INFO - Epoch [1][900/7330] lr: 1.000e-04, eta: 1 day, 0:38:57, time: 0.281, data_time: 0.028, memory: 10086, loss_rpn_cls: 0.5678, loss_rpn_bbox: 0.2609, loss_cls: 4.5206, acc: 0.0815, loss_bbox: 0.0400, loss_mask: 0.7168, loss: 6.1061
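The lr column in these entries is consistent with a linear warmup. The sketch below parses log entries like the ones above and compares each logged lr with the linear-warmup formula used by mmcv 1.x's `LrUpdaterHook`, assuming a base lr of 1e-4, 500 warmup iterations and a warmup ratio of 0.001. Those three values are inferred from the log, not read from the config, and all function names are hypothetical. As a worked check, logged iteration 50 corresponds to 0-based iteration 49, so k = (1 - 49/500)(1 - 0.001) = 0.901098 and lr = 1e-4 × (1 - k) ≈ 9.890e-06, matching the first entry.

```python
import math
import re
from typing import Dict, List, NamedTuple

# One mmdet 2.x training entry, e.g.
# "Epoch [1][50/7330] lr: 9.890e-06, ... loss_cls: 4.5210, ... loss: 6.0913"
ENTRY_RE = re.compile(
    r"Epoch \[(\d+)\]\[(\d+)/\d+\]\s+lr: ([0-9.e+-]+),"
    r".*?loss_cls: ([0-9.]+),.*?, loss: ([0-9.]+)"
)


class Entry(NamedTuple):
    epoch: int
    iteration: int  # 1-based iteration as printed in the log
    lr: float
    loss_cls: float
    loss: float


def parse_log(text: str) -> List[Entry]:
    """Extract every training entry found in the log text (one entry per line)."""
    return [
        Entry(int(e), int(i), float(lr), float(cls), float(total))
        for e, i, lr, cls, total in ENTRY_RE.findall(text)
    ]


def linear_warmup_lr(cur_iter: int, base_lr: float = 1e-4,
                     warmup_iters: int = 500, warmup_ratio: float = 1e-3) -> float:
    """Linear warmup in the style of mmcv 1.x (cur_iter is 0-based).

    The default base_lr, warmup_iters and warmup_ratio are inferred from the log.
    """
    if cur_iter >= warmup_iters:
        return base_lr
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


def summarize(entries: List[Entry]) -> Dict[str, float]:
    """Compare logged lr with the assumed warmup and report the total-loss change."""
    # The log prints 4 significant digits, so allow a 0.1% relative tolerance.
    lr_matches = all(
        math.isclose(e.lr, linear_warmup_lr(e.iteration - 1), rel_tol=1e-3)
        for e in entries
    )
    return {
        "lr_matches_warmup": lr_matches,
        "first_loss": entries[0].loss,
        "last_loss": entries[-1].loss,
        "loss_drop": entries[0].loss - entries[-1].loss,
    }
```

Applied to the entries above, the logged lr follows this schedule while `loss_cls` stays near 4.52 and the total loss stays near 6.08 through iteration 900, which suggests the warmup itself behaves as configured and the plateau has some other cause.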
This looks abnormal. Did you perform any preprocessing on the COCO dataset?
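One way to answer the preprocessing question is to check that the annotation files are intact. The sketch below (hypothetical helper names) counts images, annotations and categories in a COCO `instances_*.json` file and flags annotations that reference an image or category id not present in the file. An unmodified COCO 2017 instances file should report 80 categories and no orphan annotations.

```python
import json
from typing import Any, Dict


def summarize_coco(ann: Dict[str, Any]) -> Dict[str, int]:
    """Count images, annotations and categories in a loaded COCO instances dict."""
    image_ids = {img["id"] for img in ann["images"]}
    cat_ids = {cat["id"] for cat in ann["categories"]}
    anns = ann["annotations"]
    return {
        "images": len(image_ids),
        "annotations": len(anns),
        "categories": len(cat_ids),
        # Annotations pointing at an image or category missing from the file.
        "orphan_annotations": sum(
            1 for a in anns
            if a["image_id"] not in image_ids or a["category_id"] not in cat_ids
        ),
        # Images that carry at least one annotation.
        "images_with_annotations": len({a["image_id"] for a in anns} & image_ids),
    }


def summarize_coco_file(path: str) -> Dict[str, int]:
    """Load an instances_*.json file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_coco(json.load(f))


if __name__ == "__main__":
    print(summarize_coco_file("data/coco/annotations/instances_val2017.json"))
```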
This seems to be due to differences in the GPU model and the CUDA and PyTorch versions.
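To compare two setups along these lines, a sketch that reports what PyTorch itself sees: its version, the CUDA version it was built against, the cuDNN version, and the first GPU's name and compute capability. The helper name `cuda_report` is hypothetical, and it returns `{"torch": None}` when PyTorch cannot be imported.

```python
from typing import Dict, Optional


def cuda_report() -> Dict[str, Optional[str]]:
    """Summarize the GPU / CUDA / cuDNN setup as seen by PyTorch."""
    try:
        import torch
    except ImportError:
        return {"torch": None}
    report: Dict[str, Optional[str]] = {
        "torch": torch.__version__,
        # CUDA toolkit version PyTorch was compiled against (None for CPU builds).
        "cuda_build": torch.version.cuda,
        "cudnn": str(torch.backends.cudnn.version()),
        "cuda_available": str(torch.cuda.is_available()),
    }
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        report["capability"] = f"{major}.{minor}"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```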