IDEA-Research / detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
https://detrex.readthedocs.io/en/latest/
Apache License 2.0

Where can I check the prediction classes? #217

Open wangzhaoyang-508 opened 1 year ago

wangzhaoyang-508 commented 1 year ago

I use my custom dataset for training, but when an evaluation finishes (not the first eval, but around the 7th) an AssertionError occurs: AssertionError: A prediction has class=6, but the dataset only has 5 classes and predicted class id should be in [0, 4].

I checked my train.json and test.json and they only have 5 classes. Why are there six categories in the predictions? How can I fix it?

The log is:

eta: 21:01:51 iter: 13949 total_loss: 12.1 loss_class: 0.1947 loss_bbox: 0.04829 loss_giou: 0.8478 loss_class_0: 0.2712 loss_bbox_0: 0.05169 loss_giou_0: 0.8324 loss_class_1: 0.2247 loss_bbox_1: 0.05147 loss_giou_1: 0.8432 loss_class_2: 0.2089 loss_bbox_2: 0.05045 loss_giou_2: 0.8478 loss_class_3: 0.1921 loss_bbox_3: 0.04827 loss_giou_3: 0.8465 loss_class_4: 0.1926 loss_bbox_4: 0.04828 loss_giou_4: 0.8471 loss_class_enc: 0.2915 loss_bbox_enc: 0.05811 loss_giou_enc: 0.9053 loss_class_dn: 0.01086 loss_bbox_dn: 0.03106 loss_giou_dn: 0.6482 loss_class_dn_0: 0.0476 loss_bbox_dn_0: 0.04128 loss_giou_dn_0: 0.7782 loss_class_dn_1: 0.01995 loss_bbox_dn_1: 0.03258 loss_giou_dn_1: 0.6654 loss_class_dn_2: 0.01299 loss_bbox_dn_2: 0.03104 loss_giou_dn_2: 0.6437 loss_class_dn_3: 0.01152 loss_bbox_dn_3: 0.03102 loss_giou_dn_3: 0.6449 loss_class_dn_4: 0.01137 loss_bbox_dn_4: 0.03105 loss_giou_dn_4: 0.6464 time: 0.7148 data_time: 0.0094 lr: 0.0001 max_mem: 24711M

[02/26 23:01:59 detectron2]: Run evaluation without EMA.
WARNING [02/26 23:01:59 d2.data.datasets.coco]: Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.
[02/26 23:01:59 d2.data.datasets.coco]: Loaded 1653 images in COCO format from /data0/wangzhaoyang/data/smy/COCO/annotations/instances_test2017.json
[02/26 23:01:59 d2.data.common]: Serializing 1653 elements to byte tensors and concatenating them all ...
[02/26 23:01:59 d2.data.common]: Serialized dataset takes 0.36 MiB
[02/26 23:01:59 d2.evaluation.evaluator]: Start inference on 414 batches
[02/26 23:02:09 d2.evaluation.evaluator]: Inference done 11/414. Dataloading: 0.0008 s/iter. Inference: 0.0722 s/iter. Eval: 0.0005 s/iter. Total: 0.0735 s/iter. ETA=0:00:29
[02/26 23:02:14 d2.evaluation.evaluator]: Inference done 82/414. Dataloading: 0.0011 s/iter. Inference: 0.0691 s/iter. Eval: 0.0005 s/iter. Total: 0.0708 s/iter. ETA=0:00:23
[02/26 23:02:19 d2.evaluation.evaluator]: Inference done 155/414. Dataloading: 0.0011 s/iter. Inference: 0.0685 s/iter. Eval: 0.0005 s/iter. Total: 0.0701 s/iter. ETA=0:00:18
[02/26 23:02:24 d2.evaluation.evaluator]: Inference done 225/414. Dataloading: 0.0011 s/iter. Inference: 0.0691 s/iter. Eval: 0.0005 s/iter. Total: 0.0707 s/iter. ETA=0:00:13
[02/26 23:02:29 d2.evaluation.evaluator]: Inference done 297/414. Dataloading: 0.0011 s/iter. Inference: 0.0688 s/iter. Eval: 0.0005 s/iter. Total: 0.0704 s/iter. ETA=0:00:08
[02/26 23:02:34 d2.evaluation.evaluator]: Inference done 366/414. Dataloading: 0.0011 s/iter. Inference: 0.0688 s/iter. Eval: 0.0010 s/iter. Total: 0.0709 s/iter. ETA=0:00:03
[02/26 23:02:38 d2.evaluation.evaluator]: Total inference time: 0:00:29.381724 (0.071838 s / iter per device, on 4 devices)
[02/26 23:02:38 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:28 (0.068535 s / iter per device, on 4 devices)
[02/26 23:02:42 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
ERROR [02/26 23:02:42 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/train_loop.py", line 150, in train
    self.after_step()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/train_loop.py", line 180, in after_step
    h.after_step()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/hooks.py", line 555, in after_step
    self._do_eval()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/hooks.py", line 528, in _do_eval
    results = self._func()
  File "/data0/wangzhaoyang/detr/detrex/tools/train_net.py", line 258, in <lambda>
    hooks.EvalHook(cfg.train.eval_period, lambda: do_test(cfg, model)),
  File "/data0/wangzhaoyang/detr/detrex/tools/train_net.py", line 167, in do_test
    ret = inference_on_dataset(
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/evaluation/evaluator.py", line 204, in inference_on_dataset
    results = evaluator.evaluate()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/evaluation/coco_evaluation.py", line 206, in evaluate
    self._eval_predictions(predictions, img_ids=img_ids)
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/evaluation/coco_evaluation.py", line 240, in _eval_predictions
    assert category_id < num_classes, (
AssertionError: A prediction has class=6, but the dataset only has 5 classes and predicted class id should be in [0, 4].

[02/26 23:02:42 d2.engine.hooks]: Overall training speed: 13997 iterations in 2:46:46 (0.7149 s / it)
[02/26 23:02:42 d2.engine.hooks]: Total training time: 2:52:35 (0:05:49 on hooks)
[02/26 23:02:42 d2.utils.events]: eta: 20:58:55 iter: 13999 total_loss: 11.72 loss_class: 0.1678 loss_bbox: 0.04619 loss_giou: 0.8291 loss_class_0: 0.2467 loss_bbox_0: 0.04601 loss_giou_0: 0.8328 loss_class_1: 0.201 loss_bbox_1: 0.04803 loss_giou_1: 0.8129 loss_class_2: 0.1722 loss_bbox_2: 0.04691 loss_giou_2: 0.818 loss_class_3: 0.1798 loss_bbox_3: 0.0458 loss_giou_3: 0.8111 loss_class_4: 0.1709 loss_bbox_4: 0.04619 loss_giou_4: 0.8276 loss_class_enc: 0.2543 loss_bbox_enc: 0.05304 loss_giou_enc: 0.9121 loss_class_dn: 0.01214 loss_bbox_dn: 0.02816 loss_giou_dn: 0.6193 loss_class_dn_0: 0.04826 loss_bbox_dn_0: 0.0378 loss_giou_dn_0: 0.7667 loss_class_dn_1: 0.01807 loss_bbox_dn_1: 0.02934 loss_giou_dn_1: 0.6113 loss_class_dn_2: 0.01376 loss_bbox_dn_2: 0.02801 loss_giou_dn_2: 0.607 loss_class_dn_3: 0.01261 loss_bbox_dn_3: 0.02806 loss_giou_dn_3: 0.6095 loss_class_dn_4: 0.01219 loss_bbox_dn_4: 0.02811 loss_giou_dn_4: 0.6143 time: 0.7148 data_time: 0.0090 lr: 0.0001 max_mem: 24711M
wandb: Waiting for W&B process to finish... (success).
wandb: Network error (ConnectTimeout), entering retry loop.
wandb: Run history: (trend sparklines for bbox/AP, the per-class AP metrics, data_time, eta_seconds, the per-layer loss terms, lr, time and total_loss)
wandb:
wandb: Run summary:
wandb:   bbox/AP 13.45007   bbox/AP-bengbian 12.02601   bbox/AP-duanshan 8.42912   bbox/AP-loujiang 19.63638   bbox/AP-yinxu 0.85431   bbox/AP-zangpian 26.30452
wandb:   bbox/AP50 43.96051   bbox/AP75 3.76077   bbox/APl 44.22862   bbox/APm 18.40223   bbox/APs 10.8132
wandb:   data_time 0.00908   eta_seconds 75535.43819   lr 0.0001   time 0.69452   total_loss 11.71719
wandb:   loss_bbox 0.04619   loss_bbox_0 0.04601   loss_bbox_1 0.04803   loss_bbox_2 0.04691   loss_bbox_3 0.0458   loss_bbox_4 0.04619   loss_bbox_enc 0.05304
wandb:   loss_bbox_dn 0.02816   loss_bbox_dn_0 0.0378   loss_bbox_dn_1 0.02934   loss_bbox_dn_2 0.02801   loss_bbox_dn_3 0.02806   loss_bbox_dn_4 0.02811
wandb:   loss_class 0.1678   loss_class_0 0.24672   loss_class_1 0.20097   loss_class_2 0.17215   loss_class_3 0.17976   loss_class_4 0.17093   loss_class_enc 0.25431
wandb:   loss_class_dn 0.01214   loss_class_dn_0 0.04826   loss_class_dn_1 0.01807   loss_class_dn_2 0.01376   loss_class_dn_3 0.01261   loss_class_dn_4 0.01219
wandb:   loss_giou 0.82908   loss_giou_0 0.8328   loss_giou_1 0.81291   loss_giou_2 0.81798   loss_giou_3 0.81106   loss_giou_4 0.82758   loss_giou_enc 0.91206
wandb:   loss_giou_dn 0.61925   loss_giou_dn_0 0.76666   loss_giou_dn_1 0.61127   loss_giou_dn_2 0.60695   loss_giou_dn_3 0.60949   loss_giou_dn_4 0.61434
wandb:
wandb: Synced detrex_experiment1: https://wandb.ai/wangzhaoyang/detrex/runs/3evjnbxb
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb_output/wandb/run-20230226_200920-3evjnbxb/logs
Traceback (most recent call last):
  File "tools/train_net.py", line 307, in <module>
    launch(
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/launch.py", line 67, in launch
    mp.spawn(
  File "/home/amax/anaconda3/envs/wangzydetrex/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 30, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/amax/anaconda3/envs/wangzydetrex/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 88, in start_processes
    while not context.join():
  File "/home/amax/anaconda3/envs/wangzydetrex/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 50, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/amax/anaconda3/envs/wangzydetrex/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 9, in _wrap
    fn(i, *args)
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/launch.py", line 126, in _distributed_worker
    main_func(*args)
  File "/data0/wangzhaoyang/detr/detrex/tools/train_net.py", line 302, in main
    do_train(args, cfg)
  File "/data0/wangzhaoyang/detr/detrex/tools/train_net.py", line 275, in do_train
    trainer.train(start_iter, cfg.train.max_iter)
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/train_loop.py", line 150, in train
    self.after_step()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/train_loop.py", line 180, in after_step
    h.after_step()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/hooks.py", line 555, in after_step
    self._do_eval()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/engine/hooks.py", line 528, in _do_eval
    results = self._func()
  File "/data0/wangzhaoyang/detr/detrex/tools/train_net.py", line 258, in <lambda>
    hooks.EvalHook(cfg.train.eval_period, lambda: do_test(cfg, model)),
  File "/data0/wangzhaoyang/detr/detrex/tools/train_net.py", line 167, in do_test
    ret = inference_on_dataset(
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/evaluation/evaluator.py", line 204, in inference_on_dataset
    results = evaluator.evaluate()
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/evaluation/coco_evaluation.py", line 206, in evaluate
    self._eval_predictions(predictions, img_ids=img_ids)
  File "/data0/wangzhaoyang/detr/detrex/detectron2/detectron2/evaluation/coco_evaluation.py", line 240, in _eval_predictions
    assert category_id < num_classes, (
AssertionError: A prediction has class=6, but the dataset only has 5 classes and predicted class id should be in [0, 4].
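
To see where the class count used by the evaluator comes from, it helps to compare the categories in the evaluation JSON with what detectron2 has registered; a minimal check, with a placeholder path and dataset name:

import json

from detectron2.data import MetadataCatalog

# placeholder path/name; use your own instances_test JSON and registered dataset name
with open("/path/to/annotations/instances_test2017.json") as f:
    categories = json.load(f)["categories"]
print(len(categories), [c["name"] for c in categories])      # classes actually present in the JSON

print(MetadataCatalog.get("my_dataset_test").thing_classes)  # classes detectron2 registered for evaluation

If the model's num_classes in the config is larger than the number of classes found here, the model can output class ids outside the dataset's range, which is what triggers this assertion.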

rentainhe commented 1 year ago

Could you provide your training configs for us? We need the model and training config, and could you tell us which loss function you used?

wangzhaoyang-508 commented 1 year ago

Could you provide your training configs for us? We need the model and training config, and could you tell us which loss function you used?

I checked “/data0/wangzhaoyang/detr/detrex/projects/dino/configs/models/dino_50.py” and found that the model's num_classes may be wrong. It works well now.
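
For reference, a minimal sketch of that fix from the detrex repo root (the config path below is the standard DINO config in the repo; adjust it to the config you actually train with):

from detectron2.config import LazyConfig

cfg = LazyConfig.load("projects/dino/configs/dino_r50_4scale_12ep.py")
cfg.model.num_classes = 5  # must match the number of categories in the train/test JSON
# In the stock model config the criterion reads "${..num_classes}" via interpolation,
# so this single assignment is usually enough; older configs may need
# cfg.model.criterion.num_classes set explicitly as well.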

By the way, if I train from scratch, should I change “bias=False” to “bias=True” in “detrex\detrex\modeling\backbone\resnet.py”? I ask because the model printed in the log shows, e.g., “128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False”.

rentainhe commented 1 year ago

Could you provide your training configs for us? We need the model and training config, and could you tell us which loss function you used?

I checked “/data0/wangzhaoyang/detr/detrex/projects/dino/configs/models/dino_50.py” and found that the model's num_classes may be wrong. It works well now.

By the way, if I train from scratch, should I change “bias=False” to “bias=True” in “detrex\detrex\modeling\backbone\resnet.py”? I ask because the model printed in the log shows, e.g., “128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False”.

Does “training from scratch” mean you're training with a randomly initialized backbone, or with an ImageNet-pretrained backbone?~

I suggest not updating the backbone configuration here.

rentainhe commented 1 year ago

Could you provide your training configs for us? We need the model and training config, and could you tell us which loss function you used?

I checked “/data0/wangzhaoyang/detr/detrex/projects/dino/configs/models/dino_50.py” and found that the model's num_classes may be wrong. It works well now.

By the way, if I train from scratch, should I change “bias=False” to “bias=True” in “detrex\detrex\modeling\backbone\resnet.py”? I ask because the model printed in the log shows, e.g., “128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False”.

BTW, we actually use the detectron2 ResNet in our config; we do not use the ResNet that was re-implemented in detrex.

The reason we re-implemented the ResNet model in detrex is that, in the original detectron2 implementation, it is not easy to set dilation=2 in the last stage (it is hard to build a ResNet-DC5 model).

wangzhaoyang-508 commented 1 year ago

Thank you so much. Since our custom dataset is totally different from ImageNet or COCO, we want to train from scratch to see if it can get better performance. So, how should I modify the config to make the backbone update during training?

Does the “128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False” in the log mean the backbone is not updated?

rentainhe commented 1 year ago

Thank you so much. Since our custom dataset is totally different from ImageNet or COCO, we want to train from scratch to see if it can get better performance. So, how should I modify the config to make the backbone update during training?

Does the “128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False” in the log mean the backbone is not updated?

Yes, I think it's not updated~ Maybe you can update your config like this to see if it works:

model = L(DINO)(
    backbone=L(ResNet)(
        stem=L(BasicStem)(in_channels=3, out_channels=64, norm="FrozenBN"),
        stages=L(ResNet.make_default_stages)(
            depth=50,
            stride_in_1x1=False,
            norm="FrozenBN",
            bias=True  # add this one
        ),
        out_features=["res3", "res4", "res5"],
        freeze_at=1,
    ),
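
For the underlying question of making the backbone update during training, the detectron2 ResNet used here also exposes freeze_at and norm; a sketch of the same backbone entry with nothing frozen (an illustration of those two knobs, not the suggestion above):

backbone=L(ResNet)(
    stem=L(BasicStem)(in_channels=3, out_channels=64, norm="BN"),  # "BN" instead of "FrozenBN"
    stages=L(ResNet.make_default_stages)(
        depth=50,
        stride_in_1x1=False,
        norm="BN",
    ),
    out_features=["res3", "res4", "res5"],
    freeze_at=0,  # 0 freezes nothing; 1 freezes only the stem; 2 also freezes res2, and so on
),
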
wangzhaoyang-508 commented 1 year ago

Thank you so much, your answer is really helpful.

But modifying the config by adding “bias=True” does not work, since the upper wrapper does not have the arg “bias”. I think changing “bias=False” to “bias=True” in “detrex\detrex\modeling\backbone\resnet.py” or “detectron2\modeling\backbone\resnet.py” may be the only way to change it.

By the way, I have learned that it is useless to set “bias=True” before BN, since BN removes the effect of the bias. But if I use GN or LN, “bias=True” may be useful, right?

rentainhe commented 1 year ago

Thank you so much, your answer is really helpful.

But modifying the config by adding “bias=True” does not work, since the upper wrapper does not have the arg “bias”. I think changing “bias=False” to “bias=True” in “detrex\detrex\modeling\backbone\resnet.py” or “detectron2\modeling\backbone\resnet.py” may be the only way to change it.

By the way, I have learned that it is useless to set “bias=True” before BN, since BN removes the effect of the bias. But if I use GN or LN, “bias=True” may be useful, right?

Yes, it may be useful; it's better to run some experiments on these modifications~
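
A quick toy check of the BN point, independent of detrex: a constant per-channel bias added before BatchNorm is cancelled by the per-channel mean subtraction, so the output does not change:

import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(8, 16, 32, 32)

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1, bias=True)
bn = nn.BatchNorm2d(32)  # training mode, so batch statistics are used

with torch.no_grad():
    y_with_bias = bn(conv(x))
    conv.bias.zero_()            # drop the convolution bias
    y_without_bias = bn(conv(x))

print(torch.allclose(y_with_bias, y_without_bias, atol=1e-5))  # True: BN removes the bias

GN normalizes over groups of channels per sample and LN over channels, so a per-channel bias is not simply subtracted out there, which is why it can still matter with those norms.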

wangzhaoyang-508 commented 1 year ago

Excuse me, if I only want to detect a subset of categories (say, maybe 10 classes of COCO), how should I modify the config?

rentainhe commented 1 year ago

Excuse me, if I only want to detect a subset of categories (say, maybe 10 classes of COCO), how should I modify the config?

Just modifying the num_classes is OK~

wangzhaoyang-508 commented 1 year ago

Excuse me, if I only want to detect a subset of categories (say, maybe 10 classes of COCO), how should I modify the config?

Just modifying the num_classes is OK~

Then, how can I choose the class names or class ids (the names or ids of only the classes I'm interested in)? I only want to detect a few classes in my custom dataset.

rentainhe commented 1 year ago

Excuse me, if I only want to detect a subset of categories (say, maybe 10 classes of COCO), how should I modify the config?

Just modifying the num_classes is OK~

Then, how can I choose the class names or class ids (the names or ids of only the classes I'm interested in)? I only want to detect a few classes in my custom dataset.

I think maybe you should first convert your dataset into COCO format; then d2 will help you handle the other things~
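
For completeness, the COCO detection format that d2 expects is just three lists in one JSON file; a minimal sketch with illustrative values (bbox is [x, y, width, height] in pixels):

coco_skeleton = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100, 120, 50, 80], "area": 4000, "iscrowd": 0},
    ],
    "categories": [{"id": 1, "name": "duanshan"}, {"id": 2, "name": "heiban"}],
}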

rentainhe commented 1 year ago

Excuse me, if I only want to detect a subset of categories (say, maybe 10 classes of COCO), how should I modify the config?

Just modifying the num_classes is OK~

Then, how can I choose the class names or class ids (the names or ids of only the classes I'm interested in)? I only want to detect a few classes in my custom dataset.

I think maybe you should first convert your dataset into COCO format; then d2 will help you handle the other things~

You can refer to this config and just register your own dataset in two lines: https://github.com/IDEA-Research/detrex/blob/main/configs/common/data/custom.py
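
The registration itself typically boils down to two register_coco_instances calls (as in the custom.py pasted later in this thread); a minimal sketch with placeholder names and paths:

from detectron2.data.datasets import register_coco_instances

# placeholder dataset names and paths; point them at your COCO-format JSON and image folder
register_coco_instances("my_dataset_train", {}, "path/to/instances_train.json", "path/to/train_images")
register_coco_instances("my_dataset_test", {}, "path/to/instances_test.json", "path/to/test_images")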

wangzhaoyang-508 commented 1 year ago

Thank you so much, your reply is really quick ^_^

I successfully registered a custom dataset with 8 classes of objects, and it trains well.

Now, 4 of the 8 classes in my custom dataset no longer need to be detected.

I don't want to change the JSON file, so how do I re-register the dataset so that the model only focuses on the four useful classes during training?

In detrex I tried to use “MetadataCatalog.get”, but it does not help. The error is: AssertionError: Attribute 'thing_classes' in the metadata of 'my_eldataset_train' cannot be set to a different value! ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan'] != ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan', 'yinlie', 'crush', 'finger', 'star']

custom.py code:

import itertools

from omegaconf import OmegaConf

import detectron2.data.transforms as T
from detectron2.config import LazyCall as L
from detectron2.data import (
    build_detection_test_loader,
    build_detection_train_loader,
    get_detection_dataset_dicts,
    MetadataCatalog,
)
from detectron2.data.datasets import register_coco_instances
from detectron2.evaluation import COCOEvaluator

from detrex.data import DetrDatasetMapper

dataloader = OmegaConf.create()

register_coco_instances("my_eldataset_train", {}, '/data1/wzydatasets/yuanle/coco/annotations/instances_train2017.json', '/data1/wzydatasets/yuanle/coco/train2017/')
register_coco_instances("my_eldataset_test", {}, '/data1/wzydatasets/yuanle/coco/annotations/instances_test2017.json', '/data1/wzydatasets/yuanle/coco/test2017/')

MetadataCatalog.get("my_eldataset_train").thing_classes = ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan']
MetadataCatalog.get("my_eldataset_test").thing_classes = ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan']

dataloader.train = L(build_detection_train_loader)(
    dataset=L(get_detection_dataset_dicts)(names="my_eldataset_train"),
    mapper=L(DetrDatasetMapper)(
        augmentation=[
            L(T.ResizeShortestEdge)(
                short_edge_length=600,
                max_size=600,
            ),
            L(T.RandomFlip)(),
            L(T.ResizeShortestEdge)(
                short_edge_length=(320, 480, 512, 544, 576, 608),
                # max_size=1333,
                max_size=640,
                sample_style="choice",
            ),
        ],
        augmentation_with_crop=[
            L(T.RandomFlip)(),
            L(T.ResizeShortestEdge)(
                short_edge_length=600,
                max_size=600,
            ),
            L(T.RandomCrop)(
                crop_type="absolute_range",
                crop_size=(300, 400),
            ),
            L(T.ResizeShortestEdge)(
                short_edge_length=(320, 480, 512, 544, 576, 608),
                max_size=640,
                sample_style="choice",
            ),
        ],
        is_train=True,
        mask_on=False,
        img_format="RGB",
    ),
    total_batch_size=16,
    num_workers=4,
)

dataloader.test = L(build_detection_test_loader)(
    dataset=L(get_detection_dataset_dicts)(names="my_eldataset_test", filter_empty=False),
    mapper=L(DetrDatasetMapper)(
        augmentation=[
            L(T.ResizeShortestEdge)(
                short_edge_length=600,
                max_size=640,
            ),
        ],
        augmentation_with_crop=None,
        is_train=False,
        mask_on=False,
        img_format="RGB",
    ),
    num_workers=4,
)

dataloader.evaluator = L(COCOEvaluator)(
    dataset_name="${..test.dataset.names}",
)

wangzhaoyang-508 commented 1 year ago

MetadataCatalog.get("my_eldataset_train").thing_classes = ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan']
MetadataCatalog.get("my_eldataset_test").thing_classes = ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan']

I just added these two lines of code, since those are the only 4 classes I'm interested in now, but it does not work.

rentainhe commented 1 year ago

We will check this issue later~

rentainhe commented 1 year ago

MetadataCatalog.get("my_eldataset_train").thing_classes = ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan']
MetadataCatalog.get("my_eldataset_test").thing_classes = ['duanshan', 'heiban', 'xianzhuangquexian', 'xuhan']

I just added these two lines of code, since those are the only 4 classes I'm interested in now, but it does not work.

Seems like it's not suitable to directly change the attribute through the get function; detectron2's MetadataCatalog does not allow overwriting an existing attribute with a different value.
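
One direction that avoids mutating existing metadata (an illustrative sketch, not an official detrex recipe; it reuses the 8-class my_eldataset_train registered above and invents a new dataset name) is to register a filtered 4-class view under a new name, then point the dataloader at that name and set num_classes=4:

from detectron2.data import DatasetCatalog, MetadataCatalog

KEEP = ["duanshan", "heiban", "xianzhuangquexian", "xuhan"]

def load_4cls_subset():
    # dicts from the 8-class dataset; category_id here is already contiguous in [0, 7]
    dicts = DatasetCatalog.get("my_eldataset_train")
    full = MetadataCatalog.get("my_eldataset_train").thing_classes
    remap = {full.index(name): new_id for new_id, name in enumerate(KEEP)}
    out = []
    for record in dicts:
        record = dict(record)
        anns = [dict(a) for a in record.get("annotations", []) if a["category_id"] in remap]
        for a in anns:
            a["category_id"] = remap[a["category_id"]]
        record["annotations"] = anns
        out.append(record)
    return out

DatasetCatalog.register("my_eldataset_train_4cls", load_4cls_subset)
MetadataCatalog.get("my_eldataset_train_4cls").thing_classes = KEEP

The dataloader config would then use names="my_eldataset_train_4cls", and the model's num_classes would be set to 4.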

rentainhe commented 1 year ago

Have you solved the problem now~ @wangzhaoyang-508

Kim-yhao commented 11 months ago

Where is this parameter modified?