CanPeng123 / Faster-ILOD


How to reproduce the results of the coco dataset #7

Open chenfangchenf opened 2 years ago

chenfangchenf commented 2 years ago

How can I reproduce the results on the COCO dataset? When I first trained on 70 classes, I used this "e2e_faster_rcnn_R_50_C4_1x.yaml":

`
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  BACKBONE:
    CONV_BODY: "R-50-C4"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 1024
  RPN:
    USE_FPN: False
    ANCHOR_STRIDE: (16,)
    PRE_NMS_TOP_N_TRAIN: 12000
    PRE_NMS_TOP_N_TEST: 6000
    POST_NMS_TOP_N_TRAIN: 2000
    POST_NMS_TOP_N_TEST: 1000
  ROI_HEADS:
    USE_FPN: False  # whether to use FPN
  ROI_BOX_HEAD:
    POOLER_RESOLUTION: 7
    POOLER_SCALES: (0.0625,)
    POOLER_SAMPLING_RATIO: 2
    FEATURE_EXTRACTOR: "ResNet50Conv5ROIFeatureExtractor"
    PREDICTOR: "FastRCNNPredictor"
    NUM_CLASSES: 71  # total classes
DATASETS:
  TRAIN: ("coco_2014_train", "coco_2014_valminusminival")  # 80k + 35k
  TEST: ("coco_2014_minival",)  # 5k
DATALOADER:
  SIZE_DIVISIBILITY: 0
SOLVER:
  BASE_LR: 0.001  # start learning rate
  WEIGHT_DECAY: 0.0001
  GAMMA: 0.1  # learning rate decay
  STEPS: (40000,)
  MAX_ITER: 80000  # number of iterations
  CHECKPOINT_PERIOD: 2500  # number of iterations between checkpoints
  IMS_PER_BATCH: 1  # number of images per batch
  MOMENTUM: 0.9
TEST:  # testing strategy
  IMS_PER_BATCH: 1  # number of images per batch
OUTPUT_DIR: "/home/maskrcnn-benchmark/incremental_learning_ResNet50_C4/coco/first_70_1"  # path to store the result
TENSORBOARD_DIR: "/home/maskrcnn-benchmark/incremental_learning_ResNet50_C4/coco/first_70_1/tensorboard"  # path to store tensorboard info
`

The following error occurs:

number of images used for training: 9485
2022-06-06 09:38:46,145 maskrcnn_benchmark.trainer INFO: Start training
/home/maskrcnn-benchmark/maskrcnn_benchmark/structures/segmentation_mask.py:422: UserWarning: This overload of nonzero is deprecated:
    nonzero()
Consider using one of the following signatures instead:
    nonzero(*, bool as_tuple) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  item = item.nonzero()
/home/anaconda3/envs/maskrcnn_copy/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "/home/maskrcnn-benchmark/tools/train_first_step.py", line 241, in <module>
    main()
  File "/home/maskrcnn-benchmark/tools/train_first_step.py", line 232, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "/home/maskrcnn-benchmark/tools/train_first_step.py", line 107, in train
    arguments,
  File "/home/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 70, in do_train
    loss_dict = model(images, targets)
  File "/home/anaconda3/envs/maskrcnn_copy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/anaconda3/envs/maskrcnn_copy/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 73, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/home/anaconda3/envs/maskrcnn_copy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 29, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "/home/anaconda3/envs/maskrcnn_copy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 65, in forward
    loss_classifier, loss_box_reg = self.loss_evaluator([class_logits], [box_regression])
  File "/home/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py", line 173, in __call__
    sampled_pos_inds_subset = torch.nonzero(labels > 0).squeeze(1)
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

What is the reason for this error, and how should I set NUM_CLASSES, NAME_OLD_CLASSES, NAME_NEW_CLASSES, and NAME_EXCLUDED_CLASSES? I would be grateful if you could help me. @CanPeng123 @KunUQ
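
The one thing the log states directly is the failed assertion `t >= 0 && t < n_classes`: some target label `t` passed to the classification loss was outside the range the classifier can index. Below is a minimal sketch of a label-range check, not code from this repository. The helper name `find_out_of_range_labels` and the sample label list are hypothetical, and it assumes that NUM_CLASSES counts the background class as index 0, so that `NUM_CLASSES: 71` allows labels 0 through 70.

```python
def find_out_of_range_labels(labels, num_classes):
    """Return (position, label) pairs that violate 0 <= label < num_classes.

    Hypothetical helper for illustration; it mirrors the condition in the
    failed assertion `t >= 0 && t < n_classes`.
    """
    return [(i, t) for i, t in enumerate(labels) if not 0 <= t < num_classes]


# Assumed example labels; with num_classes=71 only 0..70 are valid targets.
labels = [0, 3, 70, 71, 80]
bad = find_out_of_range_labels(labels, num_classes=71)
print(bad)  # [(3, 71), (4, 80)]
```

Running a check like this on the labels the data loader produces, before they reach the loss, would show whether the class lists in the config leave any label above `NUM_CLASSES - 1`. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` may also help, since it makes CUDA errors surface at the call that caused them rather than at a later synchronization point.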

zhongjian1999 commented 1 year ago

Hi, have you reproduced the results of the COCO dataset successfully?
