Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
MIT License

When I switch the dataset to VOC, training fails with a file-not-found error; switching back to COCO gives the same error. I am running on a Slurm cluster; the full error is below #144

Open NewBeeMrz opened 2 months ago

NewBeeMrz commented 2 months ago

2024-06-10 17:07:33,221 - mmdet - INFO - workflow: [('train', 1)], max: 200 epochs
2024-06-10 17:07:33,221 - mmdet - INFO - Checkpoints will be saved to /public/home/2022050834/Co-DETR-main/work_dir by HardDiskBackend.
/public/home/2022050834/Co-DETR-main/mmdet/models/utils/positional_encoding.py:81: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). dim_t = self.temperature**(2 * (dim_t // 2) / self.num_feats)
/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1646756402876/work/aten/src/ATen/native/TensorShape.cpp:2228.) return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/public/home/2022050834/Co-DETR-main/projects/models/transformer.py:185: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
2024-06-10 17:07:37,693 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2024-06-10 17:08:31,786 - mmdet - INFO - Epoch [1][50/3083] lr: 2.000e-04, eta: 8 days, 8:34:33, time: 1.171, data_time: 0.061, memory: 21537, enc_loss_cls: 1.8714, enc_loss_bbox: 1.6752, enc_loss_iou: 1.0689, loss_cls: 2.0328, loss_bbox: 2.2435, loss_iou: 1.4618, d0.loss_cls: 1.7326, d0.loss_bbox: 2.3419, d0.loss_iou: 1.5049, d1.loss_cls: 1.8378, d1.loss_bbox: 2.2975, d1.loss_iou: 1.4795, d2.loss_cls: 1.8999, d2.loss_bbox: 2.2758, d2.loss_iou: 1.4697, d3.loss_cls: 1.9900, d3.loss_bbox: 2.2633, d3.loss_iou: 1.4659, d4.loss_cls: 2.0331, d4.loss_bbox: 2.2532, d4.loss_iou: 1.4622, loss_rpn_cls: 2.3185, loss_rpn_bbox: 0.1394, loss_cls0: 2.6935, acc0: 96.0039, loss_bbox0: 0.8234, loss_cls1: 11.8590, loss_bbox1: 12.4002, loss_centerness1: 7.5802, loss_cls_aux0: 1.4213, loss_bbox_aux0: 1.5956, loss_iou_aux0: 0.7012, d0.loss_cls_aux0: 1.1383, d0.loss_bbox_aux0: 1.5688, d0.loss_iou_aux0: 0.6490, d1.loss_cls_aux0: 1.1607, d1.loss_bbox_aux0: 1.5725, d1.loss_iou_aux0: 0.6641, d2.loss_cls_aux0: 1.2270, d2.loss_bbox_aux0: 1.5780, d2.loss_iou_aux0: 0.6771, d3.loss_cls_aux0: 1.3087, d3.loss_bbox_aux0: 1.5836, d3.loss_iou_aux0: 0.6863, d4.loss_cls_aux0: 1.4003, d4.loss_bbox_aux0: 1.5899, d4.loss_iou_aux0: 0.6947, loss_cls_aux1: 1.4534, loss_bbox_aux1: 1.5838, loss_iou_aux1: 1.0461, d0.loss_cls_aux1: 1.1912, d0.loss_bbox_aux1: 1.5624, d0.loss_iou_aux1: 1.0404, d1.loss_cls_aux1: 1.2195, d1.loss_bbox_aux1: 1.5624, d1.loss_iou_aux1: 1.0385, d2.loss_cls_aux1: 1.2785, d2.loss_bbox_aux1: 1.5666, d2.loss_iou_aux1: 1.0393, d3.loss_cls_aux1: 1.3580, d3.loss_bbox_aux1: 1.5718, d3.loss_iou_aux1: 1.0411, d4.loss_cls_aux1: 1.4396, d4.loss_bbox_aux1: 1.5780, d4.loss_iou_aux1: 1.0437, loss: 121.3068, grad_norm: 109.1845
2024-06-10 17:09:28,144 - mmdet - INFO - Epoch [1][100/3083] lr: 2.000e-04, eta: 8 days, 4:47:34, time: 1.127, data_time: 0.008, memory: 21605, enc_loss_cls: 1.6616, enc_loss_bbox: 1.1342, enc_loss_iou: 0.9087, loss_cls: 1.7328, loss_bbox: 2.2028, loss_iou: 1.2879, d0.loss_cls: 1.5783, d0.loss_bbox: 2.3775, d0.loss_iou: 1.2864, d1.loss_cls: 1.6770, d1.loss_bbox: 2.2770, d1.loss_iou: 1.2747, d2.loss_cls: 1.6984, d2.loss_bbox: 2.2375, d2.loss_iou: 1.2752, d3.loss_cls: 1.7063, d3.loss_bbox: 2.2068, d3.loss_iou: 1.2825, d4.loss_cls: 1.7099, d4.loss_bbox: 2.2027, d4.loss_iou: 1.2869, loss_rpn_cls: 0.9682, loss_rpn_bbox: 0.1277, loss_cls0: 2.5987, acc0: 96.5200, loss_bbox0: 1.5747, loss_cls1: 10.2182, loss_bbox1: 12.1025, loss_centerness1: 7.5403, loss_cls_aux0: 1.0521, loss_bbox_aux0: 1.4871, loss_iou_aux0: 0.7284, d0.loss_cls_aux0: 0.9023, d0.loss_bbox_aux0: 1.4935, d0.loss_iou_aux0: 0.7160, d1.loss_cls_aux0: 0.9116, d1.loss_bbox_aux0: 1.4914, d1.loss_iou_aux0: 0.7172, d2.loss_cls_aux0: 0.9228, d2.loss_bbox_aux0: 1.4903, d2.loss_iou_aux0: 0.7194, d3.loss_cls_aux0: 0.9510, d3.loss_bbox_aux0: 1.4891, d3.loss_iou_aux0: 0.7232, d4.loss_cls_aux0: 1.0100, d4.loss_bbox_aux0: 1.4869, d4.loss_iou_aux0: 0.7262, loss_cls_aux1: 1.0762, loss_bbox_aux1: 1.4314, loss_iou_aux1: 1.0411, d0.loss_cls_aux1: 0.8944, d0.loss_bbox_aux1: 1.4426, d0.loss_iou_aux1: 1.0466, d1.loss_cls_aux1: 0.9088, d1.loss_bbox_aux1: 1.4383, d1.loss_iou_aux1: 1.0447, d2.loss_cls_aux1: 0.9231, d2.loss_bbox_aux1: 1.4369, d2.loss_iou_aux1: 1.0431, d3.loss_cls_aux1: 0.9531, d3.loss_bbox_aux1: 1.4337, d3.loss_iou_aux1: 1.0415, d4.loss_cls_aux1: 1.0155, d4.loss_bbox_aux1: 1.4318, d4.loss_iou_aux1: 1.0409, loss: 109.7976, grad_norm: 79.5242

Traceback (most recent call last):
  File "/public/home/2022050834/Co-DETR-main/tools/train.py", line 253, in <module>
    main()
  File "/public/home/2022050834/Co-DETR-main/tools/train.py", line 242, in main
    train_detector(
  File "/public/home/2022050834/Co-DETR-main/mmdet/apis/train.py", line 245, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 49, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1207, in _next_data
    idx, data = self._get_data()
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1173, in _get_data
    success, data = self._try_get_data()
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1011, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 295, in rebuild_storage_fd
    fd = df.detach()
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/multiprocessing/resource_sharer.py", line 86, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/public/home/2022050834/.conda/envs/pytorch1_11/lib/python3.9/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
srun: error: gpu03: task 0: Exited with exit code 1

The error says a file cannot be found. Most of the material I looked up says this is an inter-process communication problem, but I have not been able to solve it.
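The traceback bottoms out in torch.multiprocessing's file-descriptor sharing path (rebuild_storage_fd → resource_sharer → SocketClient), which usually means a DataLoader worker process died and its Unix socket disappeared before the main process could fetch the batch. Two commonly suggested mitigations are sketched below; both use standard PyTorch/mmdet options, but whether either cures this particular cluster is an assumption, not something verifiable from the log:

```python
# Workaround sketch (an assumption, not verified on this cluster).

# 1) Switch PyTorch's tensor-sharing strategy from 'file_descriptor' to
#    'file_system' before training starts (e.g. at the top of tools/train.py).
#    This avoids passing file descriptors over per-worker Unix sockets.
import torch.multiprocessing as mp
mp.set_sharing_strategy('file_system')

# 2) Or load data in the main process by disabling worker subprocesses in
#    the mmdet config, trading loading speed for stability:
data = dict(
    samples_per_gpu=2,   # batch size per GPU, unchanged
    workers_per_gpu=0,   # 0 = no worker subprocesses, no IPC involved
)
```

If the run survives with `workers_per_gpu=0`, the dataset itself is fine and the problem really is worker IPC (often a worker killed by the scheduler for exceeding memory limits).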

NewBeeMrz commented 2 months ago

The voc0712 config is as follows: [screenshot] [screenshot]
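Since the screenshots are not reproduced here, this is roughly what the stock mmdetection `configs/_base_/datasets/voc0712.py` dataset section looks like; the user's actual paths and pipeline may differ, so treat every path below as an assumption:

```python
# Sketch of mmdetection's standard VOC0712 dataset config (paths are the
# library defaults, not necessarily what the reporter used).
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',   # repeat VOC three times per epoch
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                data_root + 'VOC2007/ImageSets/Main/trainval.txt',
                data_root + 'VOC2012/ImageSets/Main/trainval.txt',
            ],
            img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'])),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/'),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/'))
```

Note that a genuinely missing annotation file would normally fail inside the dataset code with the offending path in the message; here the FileNotFoundError comes from `multiprocessing/connection.py`, which points away from the dataset paths.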

TempleX98 commented 2 months ago

This does look like a communication problem; with only the information available so far, I cannot resolve it either.
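For context on why a "communication problem" surfaces as FileNotFoundError: the parent process tries to connect to the Unix-domain socket of a worker that has already exited, and the socket file is gone. A minimal stdlib illustration (the socket path below is hypothetical, not taken from the training run):

```python
# Connecting to a Unix-domain socket that no longer exists raises the same
# FileNotFoundError [Errno 2] seen at the bottom of the traceback.
from multiprocessing.connection import Client

try:
    # Hypothetical stale address; a crashed DataLoader worker leaves
    # resource_sharer holding an address just like this.
    Client('/tmp/nonexistent-worker-socket')
except FileNotFoundError as exc:
    print('FileNotFoundError, errno =', exc.errno)
```

So the fix is not to hunt for a missing dataset file but to find out why the worker died; on Slurm, checking the job's memory limits and the node's kernel log for OOM kills is a reasonable first step.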