Open QY1994-0919 opened 1 year ago
I made a few changes to the DiffusionDet model, which now requires an image input size of (224, 224), and I changed dataset_mapper.py accordingly. However, the following error is reported after training for about an epoch.
```
[04/14 03:54:18] d2.utils.events INFO: eta: 1 day, 14:23:45 iter: 7299 total_loss: 18.96 loss_ce: 1.49 loss_bbox: 0.2864 loss_giou: 1.294 loss_ce_0: 1.533 loss_bbox_0: 0.2933 loss_giou_0: 1.294 loss_ce_1: 1.476 loss_bbox_1: 0.3044 loss_giou_1: 1.324 loss_ce_2: 1.473 loss_bbox_2: 0.2846 loss_giou_2: 1.302 loss_ce_3: 1.496 loss_bbox_3: 0.313 loss_giou_3: 1.304 loss_ce_4: 1.502 loss_bbox_4: 0.2977 loss_giou_4: 1.292 time: 0.3130 data_time: 0.0045 lr: 2.5e-05 max_mem: 5393M
[04/14 03:54:24] d2.utils.events INFO: eta: 1 day, 14:20:12 iter: 7319 total_loss: 19.09 loss_ce: 1.541 loss_bbox: 0.2998 loss_giou: 1.275 loss_ce_0: 1.574 loss_bbox_0: 0.2834 loss_giou_0: 1.326 loss_ce_1: 1.505 loss_bbox_1: 0.2918 loss_giou_1: 1.295 loss_ce_2: 1.536 loss_bbox_2: 0.2774 loss_giou_2: 1.32 loss_ce_3: 1.564 loss_bbox_3: 0.2937 loss_giou_3: 1.327 loss_ce_4: 1.519 loss_bbox_4: 0.3095 loss_giou_4: 1.311 time: 0.3130 data_time: 0.0053 lr: 2.5e-05 max_mem: 5393M
[04/14 03:54:31] d2.data.datasets.coco INFO: Loading datasets/coco/annotations/instances_val2017.json takes 3.23 seconds.
[04/14 03:54:31] d2.data.datasets.coco INFO: Loaded 5000 images in COCO format from datasets/coco/annotations/instances_val2017.json
[04/14 03:54:31] d2.data.build INFO: Distribution of instances among all 80 categories:
...
| hair drier   | 11    | toothbrush | 57 |            |    |
| total        | 36335 |            |    |            |    |
[04/14 04:34:48] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[04/14 04:34:48] d2.data.common INFO: Serializing 5000 elements to byte tensors and concatenating them all ...
[04/14 04:34:48] d2.data.common INFO: Serialized dataset takes 19.10 MiB
[04/14 04:34:48] d2.evaluation.coco_evaluation WARNING: COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[04/14 04:34:49] d2.evaluation.evaluator INFO: Start inference on 2500 batches
[04/14 04:34:54] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 150, in train
    self.after_step()
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 180, in after_step
    h.after_step()
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 552, in after_step
    self._do_eval()
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 525, in _do_eval
    results = self._func()
  File "/opt/data/private/Models_ours/maediff_pfpn/train_net.py", line 241, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 608, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/detectron2/evaluation/evaluator.py", line 158, in inference_on_dataset
    outputs = model(inputs)
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/data/private/Models_ours/maediff_pfpn/diffusiondet/detector.py", line 305, in forward
    src = self.backbone(images.tensor)
  File "/root/anaconda3/envs/torch19/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/data/private/Models_ours/maediff_pfpn/diffusiondet/maepfpn.py", line 353, in forward
    latent, mask, ids_restore = self.forward_encoder(imgs, mask_ratio)
  File "/opt/data/private/Models_ours/maediff_pfpn/diffusiondet/maepfpn.py", line 280, in forward_encoder
    x = x + self.pos_embed[:, 1:, :]
RuntimeError: The size of tensor a (3800) must match the size of tensor b (196) at non-singleton dimension 1
[04/14 04:34:54] d2.engine.hooks INFO: Overall training speed: 7327 iterations in 0:38:19 (0.3139 s / it)
[04/14 04:34:54] d2.engine.hooks INFO: Total training time: 0:38:44 (0:00:24 on hooks)
[04/14 04:34:54] d2.utils.events INFO: eta: 1 day, 14:05:56 iter: 7329 total_loss: 18.51 loss_ce: 1.54 loss_bbox: 0.276 loss_giou: 1.274 loss_ce_0: 1.601 loss_bbox_0: 0.2701 loss_giou_0: 1.259 loss_ce_1: 1.544 loss_bbox_1: 0.2806 loss_giou_1: 1.204 loss_ce_2: 1.562 loss_bbox_2: 0.2677 loss_giou_2: 1.207 loss_ce_3: 1.571 loss_bbox_3: 0.2608 loss_giou_3: 1.226 loss_ce_4: 1.564 loss_bbox_4: 0.2737 loss_giou_4: 1.262 time: 0.3138 data_time: 0.0050 lr: 2.5e-05 max_mem: 5393M
```
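Note that the mismatch in the traceback is exactly the patch-count difference between the two resize policies. Assuming the MAE-style encoder uses a 16×16 patch size (implied by `pos_embed` holding 196 = 14×14 tokens for a 224×224 input), the log shows inference still uses `ResizeShortestEdge(800, 1333)`; a validation image resized to, e.g., 800×1216 produces 50×76 = 3800 patch tokens, which is "tensor a" in the error:

```python
# Patch-count arithmetic behind the size mismatch (patch size 16 assumed,
# as implied by pos_embed holding 196 = 14*14 tokens for a 224x224 input).
def num_patches(height, width, patch_size=16):
    """Number of ViT patch tokens for an image of the given size."""
    return (height // patch_size) * (width // patch_size)

train_tokens = num_patches(224, 224)   # what pos_embed was built for
eval_tokens = num_patches(800, 1216)   # e.g. a val image after ResizeShortestEdge(800, 1333)

print(train_tokens)  # 196  -> "tensor b" in the error
print(eval_tokens)   # 3800 -> "tensor a" in the error
```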
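If the intent is to always feed the backbone 224×224 images, the inference branch of the mapper needs the same fixed resize as training, since the log shows the eval-time `DatasetMapper` still builds `ResizeShortestEdge(800, 1333)`. A minimal sketch using detectron2's transform API (the function name and wiring below are illustrative, not DiffusionDet's actual code):

```python
from detectron2.data import transforms as T

def build_fixed_size_augmentation(cfg, is_train):
    # Illustrative: in dataset_mapper.py, replace ResizeShortestEdge(800, 1333)
    # in the is_train=False branch with the same fixed 224x224 resize used for
    # training, so the ViT encoder always sees 14*14 = 196 patch tokens.
    return [T.Resize((224, 224))]
```

An alternative, if variable input sizes are desired at test time, would be to interpolate `pos_embed` to the eval-time patch grid instead of fixing the resize.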