Closed ghost closed 4 years ago
If you need help with an unexpected issue, please include details following the issue template.
1. what changes you made (`git diff`) or what code you wrote
I am running the stock plain_train_net.py. The only changes are --num-gpus 2, NUM_WORKERS = 2, a batch size of 7, a learning rate of 0.02, cfg.TEST.EVAL_PERIOD = 4510, and MAX_ITER = 100000.
Here are the config arguments:
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("idd_train",)
#cfg.DATASETS.TEST = ()
cfg.DATASETS.TEST = ("idd_val",)
cfg.MODEL.WEIGHTS = "/idd_data_coco/model_final_480dd8.pkl" # Cascade
cfg.OUTPUT_DIR = '/idd_data_coco/models/'
cfg.MODEL.MASK_ON = False
cfg.DATALOADER.NUM_WORKERS = 2
cfg.SOLVER.MAX_ITER = 1900
cfg.SOLVER.CHECKPOINT_PERIOD = 120000
cfg.SOLVER.BASE_LR = 0.02 # pick a good LR
cfg.SOLVER.GAMMA = 0.3
cfg.SOLVER.STEPS = (15000, 20000,)
cfg.TEST.EVAL_PERIOD = 4510
cfg.SOLVER.IMS_PER_BATCH = 7
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 # default: 512
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 15
2. what exact command you run: python plain_train_net.py
3. what you observed (including the full logs):
[03/10 10:23:24] detectron2 INFO: Rank of current process: 0. World size: 2
[03/10 10:23:25] detectron2 INFO: Environment info:
sys.platform linux
Python 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
numpy 1.18.1
detectron2 0.1.1 @/home/username/detectron2_v1/detectron2
detectron2 compiler GCC 5.5
detectron2 CUDA compiler 10.2
detectron2 arch flags sm_75
DETECTRON2_ENV_MODULE
PyTorch built with:
[03/10 10:23:25] detectron2 INFO: Command line arguments: Namespace(config_file='', dist_url='tcp://127.0.0.1:51012', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False) [03/10 10:23:25] detectron2 INFO: Running with full config: CUDNN_BENCHMARK: False DATALOADER: ASPECT_RATIO_GROUPING: True FILTER_EMPTY_ANNOTATIONS: True NUM_WORKERS: 2 REPEAT_THRESHOLD: 0.0 SAMPLER_TRAIN: TrainingSampler DATASETS: PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000 PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000 PROPOSAL_FILES_TEST: () PROPOSAL_FILES_TRAIN: () TEST: ('idd_val',) TRAIN: ('idd_train',) GLOBAL: HACK: 1.0 INPUT: CROP: ENABLED: False SIZE: [0.9, 0.9] TYPE: relative_range FORMAT: BGR MASK_FORMAT: polygon MAX_SIZE_TEST: 1333 MAX_SIZE_TRAIN: 1333 MIN_SIZE_TEST: 800 MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800) MIN_SIZE_TRAIN_SAMPLING: choice MODEL: ANCHOR_GENERATOR: ANGLES: [[-90, 0, 90]] ASPECT_RATIOS: [[0.5, 1.0, 2.0]] NAME: DefaultAnchorGenerator OFFSET: 0.0 SIZES: [[32], [64], [128], [256], [512]] BACKBONE: FREEZE_AT: 2 NAME: build_resnet_fpn_backbone DEVICE: cuda FPN: FUSE_TYPE: sum IN_FEATURES: ['res2', 'res3', 'res4', 'res5'] NORM: OUT_CHANNELS: 256 KEYPOINT_ON: False LOAD_PROPOSALS: False MASK_ON: False META_ARCHITECTURE: GeneralizedRCNN PANOPTIC_FPN: COMBINE: ENABLED: True INSTANCES_CONFIDENCE_THRESH: 0.5 OVERLAP_THRESH: 0.5 STUFF_AREA_LIMIT: 4096 INSTANCE_LOSS_WEIGHT: 1.0 PIXEL_MEAN: [103.53, 116.28, 123.675] PIXEL_STD: [1.0, 1.0, 1.0] PROPOSAL_GENERATOR: MIN_SIZE: 0 NAME: RPN RESNETS: DEFORM_MODULATED: False DEFORM_NUM_GROUPS: 1 DEFORM_ON_PER_STAGE: [False, False, False, False] DEPTH: 50 NORM: FrozenBN NUM_GROUPS: 1 OUT_FEATURES: ['res2', 'res3', 'res4', 'res5'] RES2_OUT_CHANNELS: 256 RES5_DILATION: 1 STEM_OUT_CHANNELS: 64 STRIDE_IN_1X1: True WIDTH_PER_GROUP: 64 RETINANET: BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) FOCAL_LOSS_ALPHA: 0.25 FOCAL_LOSS_GAMMA: 2.0 IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.4, 0.5] 
NMS_THRESH_TEST: 0.5 NUM_CLASSES: 80 NUM_CONVS: 4 PRIOR_PROB: 0.01 SCORE_THRESH_TEST: 0.05 SMOOTH_L1_LOSS_BETA: 0.1 TOPK_CANDIDATES_TEST: 1000 ROI_BOX_CASCADE_HEAD: BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0)) IOUS: (0.5, 0.6, 0.7) ROI_BOX_HEAD: BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0) CLS_AGNOSTIC_BBOX_REG: True CONV_DIM: 256 FC_DIM: 1024 NAME: FastRCNNConvFCHead NORM: NUM_CONV: 0 NUM_FC: 2 POOLER_RESOLUTION: 7 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 SMOOTH_L1_BETA: 0.0 TRAIN_ON_PRED_BOXES: False ROI_HEADS: BATCH_SIZE_PER_IMAGE: 128 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] IOU_LABELS: [0, 1] IOU_THRESHOLDS: [0.5] NAME: CascadeROIHeads NMS_THRESH_TEST: 0.5 NUM_CLASSES: 15 POSITIVE_FRACTION: 0.25 PROPOSAL_APPEND_GT: True SCORE_THRESH_TEST: 0.5 ROI_KEYPOINT_HEAD: CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512) LOSS_WEIGHT: 1.0 MIN_KEYPOINTS_PER_IMAGE: 1 NAME: KRCNNConvDeconvUpsampleHead NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True NUM_KEYPOINTS: 17 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 ROI_MASK_HEAD: CLS_AGNOSTIC_MASK: False CONV_DIM: 256 NAME: MaskRCNNConvUpsampleHead NORM: NUM_CONV: 4 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 RPN: BATCH_SIZE_PER_IMAGE: 256 BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) BOUNDARY_THRESH: -1 HEAD_NAME: StandardRPNHead IN_FEATURES: ['p2', 'p3', 'p4', 'p5', 'p6'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.3, 0.7] LOSS_WEIGHT: 1.0 NMS_THRESH: 0.7 POSITIVE_FRACTION: 0.5 POST_NMS_TOPK_TEST: 1000 POST_NMS_TOPK_TRAIN: 2000 PRE_NMS_TOPK_TEST: 1000 PRE_NMS_TOPK_TRAIN: 2000 SMOOTH_L1_BETA: 0.0 SEM_SEG_HEAD: COMMON_STRIDE: 4 CONVS_DIM: 128 IGNORE_VALUE: 255 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] LOSS_WEIGHT: 1.0 NAME: SemSegFPNHead NORM: GN NUM_CLASSES: 54 WEIGHTS: /ssd_scratch/cvit/username/idd_data_coco/model_final_480dd8.pkl OUTPUT_DIR: /ssd_scratch/cvit/username/idd_data_coco/models/ SEED: -1 SOLVER: BASE_LR: 0.02 BIAS_LR_FACTOR: 1.0 
CHECKPOINT_PERIOD: 500000 GAMMA: 0.3 IMS_PER_BATCH: 14 LR_SCHEDULER_NAME: WarmupMultiStepLR MAX_ITER: 120000 MOMENTUM: 0.9 STEPS: (80000, 100000) WARMUP_FACTOR: 0.001 WARMUP_ITERS: 1000 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0001 WEIGHT_DECAY_BIAS: 0.0001 WEIGHT_DECAY_NORM: 0.0 TEST: AUG: ENABLED: False FLIP: True MAX_SIZE: 4000 MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200) DETECTIONS_PER_IMAGE: 100 EVAL_PERIOD: 4510 EXPECTED_RESULTS: [] KEYPOINT_OKS_SIGMAS: [] PRECISE_BN: ENABLED: False NUM_ITER: 200 VERSION: 2 VIS_PERIOD: 0 [03/10 10:23:25] detectron2 INFO: Full config saved to /ssd_scratch/cvit/username/idd_data_coco/models/config.yaml [03/10 10:23:25] d2.utils.env INFO: Using a generated random seed 25458272 [03/10 10:23:26] detectron2 INFO: Model: GeneralizedRCNN( (backbone): FPN( (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (top_block): LastLevelMaxPool() (bottom_up): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), 
padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), 
stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): 
Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, 
eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) ) (proposal_generator): RPN( (anchor_generator): DefaultAnchorGenerator( (cell_anchors): BufferList() ) (rpn_head): StandardRPNHead( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1)) ) ) (roi_heads): CascadeROIHeads( (box_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True) (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True) (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True) (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True) ) ) (box_head): ModuleList( (0): 
FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) (1): FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) (2): FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) ) (box_predictor): ModuleList( (0): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=16, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) (1): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=16, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) (2): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=16, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) ) ) ) [03/10 10:23:26] fvcore.common.checkpoint INFO: Loading checkpoint from /ssd_scratch/cvit/username/idd_data_coco/model_final_480dd8.pkl [03/10 10:23:27] fvcore.common.checkpoint INFO: Reading a file from 'Detectron2 Model Zoo' [03/10 10:23:27] fvcore.common.checkpoint WARNING: 'roi_heads.box_predictor.0.cls_score.weight' has shape (81, 1024) in the checkpoint but (16, 1024) in the model! Skipped. [03/10 10:23:27] fvcore.common.checkpoint WARNING: 'roi_heads.box_predictor.0.cls_score.bias' has shape (81,) in the checkpoint but (16,) in the model! Skipped. [03/10 10:23:27] fvcore.common.checkpoint WARNING: 'roi_heads.box_predictor.1.cls_score.weight' has shape (81, 1024) in the checkpoint but (16, 1024) in the model! Skipped. [03/10 10:23:27] fvcore.common.checkpoint WARNING: 'roi_heads.box_predictor.1.cls_score.bias' has shape (81,) in the checkpoint but (16,) in the model! Skipped. 
[03/10 10:23:27] fvcore.common.checkpoint WARNING: 'roi_heads.box_predictor.2.cls_score.weight' has shape (81, 1024) in the checkpoint but (16, 1024) in the model! Skipped.
[03/10 10:23:27] fvcore.common.checkpoint WARNING: 'roi_heads.box_predictor.2.cls_score.bias' has shape (81,) in the checkpoint but (16,) in the model! Skipped.
[03/10 10:23:27] fvcore.common.checkpoint INFO: Some model parameters are not in the checkpoint:
roi_heads.box_predictor.0.cls_score.{weight, bias}
roi_heads.box_predictor.1.cls_score.{weight, bias}
roi_heads.box_predictor.2.cls_score.{weight, bias}
[03/10 10:23:27] fvcore.common.checkpoint INFO: The checkpoint contains parameters not used by the model:
roi_heads.mask_head.mask_fcn1.{weight, bias}
roi_heads.mask_head.mask_fcn2.{weight, bias}
roi_heads.mask_head.mask_fcn3.{weight, bias}
roi_heads.mask_head.mask_fcn4.{weight, bias}
roi_heads.mask_head.deconv.{weight, bias}
roi_heads.mask_head.predictor.{weight, bias}
[03/10 10:23:30] d2.data.datasets.coco INFO: Loading /ssd_scratch/cvit/username/idd_data_coco/idd_train_annotation.json takes 2.07 seconds.
[03/10 10:23:30] d2.data.datasets.coco WARNING: Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.
[03/10 10:23:30] d2.data.datasets.coco INFO: Loaded 31569 images in COCO format from /ssd_scratch/cvit/username/idd_data_coco/idd_train_annotation.json
[03/10 10:23:31] d2.data.build INFO: Removed 0 images with no usable annotations. 31569 images left.
[03/10 10:23:32] d2.data.build INFO: Distribution of instances among all 15 categories:

| category | #instances | category | #instances | category | #instances |
|---|---|---|---|---|---|
| car | 65676 | bus | 13829 | autorickshaw | 24498 |
| vehicle fal.. | 14992 | truck | 20759 | motorcycle | 78119 |
| rider | 73108 | person | 70319 | bicycle | 2573 |
| animal | 4764 | traffic sign | 9916 | train | 47 |
| trailer | 11 | traffic light | 2780 | caravan | 125 |
| total | 381516 | | | | |

[03/10 10:23:32] d2.data.common INFO: Serializing 31569 elements to byte tensors and concatenating them all ...
[03/10 10:23:33] d2.data.common INFO: Serialized dataset takes 18.66 MiB
[03/10 10:23:33] d2.data.detection_utils INFO: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[03/10 10:23:33] d2.data.build INFO: Using training sampler TrainingSampler
[03/10 10:23:33] detectron2 INFO: Starting training from iteration 0
[03/10 11:34:35] d2.data.datasets.coco WARNING: Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you.
[03/10 11:34:35] d2.data.datasets.coco INFO: Loaded 10225 images in COCO format from /ssd_scratch/cvit/username/idd_data_coco/idd_val_annotation.json
[03/10 11:34:36] d2.data.build INFO: Distribution of instances among all 15 categories:

| category | #instances | category | #instances | category | #instances |
|---|---|---|---|---|---|
| truck | 7078 | person | 18078 | motorcycle | 25489 |
| bus | 4916 | autorickshaw | 7782 | rider | 24518 |
| car | 24844 | vehicle fal.. | 6089 | traffic sign | 4287 |
| bicycle | 569 | animal | 1460 | traffic light | 919 |
| trailer | 7 | caravan | 11 | train | 13 |
| total | 126060 | | | | |

[03/10 11:34:36] d2.data.common INFO: Serializing 10225 elements to byte tensors and concatenating them all ... [03/10 11:34:36] d2.data.common INFO: Serialized dataset takes 6.09 MiB [03/10 11:34:37] d2.evaluation.evaluator INFO: Start inference on 5113 images [03/10 11:34:41] d2.evaluation.evaluator INFO: Inference done 11/5113. 0.0850 s / img. ETA=0:07:22 [03/10 11:34:46] d2.evaluation.evaluator INFO: Inference done 70/5113. 0.0839 s / img. ETA=0:07:13 [03/10 11:34:51] d2.evaluation.evaluator INFO: Inference done 129/5113. 0.0837 s / img. ETA=0:07:07 [03/10 11:34:56] d2.evaluation.evaluator INFO: Inference done 188/5113. 0.0837 s / img. ETA=0:07:02 [03/10 11:35:01] d2.evaluation.evaluator INFO: Inference done 247/5113. 0.0837 s / img. ETA=0:06:57 [03/10 11:35:06] d2.evaluation.evaluator INFO: Inference done 306/5113. 0.0837 s / img. ETA=0:06:52 [03/10 11:35:11] d2.evaluation.evaluator INFO: Inference done 365/5113. 0.0837 s / img. ETA=0:06:47 [03/10 11:35:16] d2.evaluation.evaluator INFO: Inference done 424/5113. 0.0837 s / img. ETA=0:06:42 [03/10 11:35:21] d2.evaluation.evaluator INFO: Inference done 483/5113. 0.0838 s / img. ETA=0:06:37 [03/10 11:35:26] d2.evaluation.evaluator INFO: Inference done 542/5113. 0.0837 s / img. ETA=0:06:32 [03/10 11:35:32] d2.evaluation.evaluator INFO: Inference done 601/5113. 0.0838 s / img. ETA=0:06:27 [03/10 11:35:37] d2.evaluation.evaluator INFO: Inference done 660/5113. 0.0838 s / img. ETA=0:06:22 [03/10 11:35:42] d2.evaluation.evaluator INFO: Inference done 718/5113. 0.0838 s / img. ETA=0:06:17 [03/10 11:35:47] d2.evaluation.evaluator INFO: Inference done 777/5113. 0.0838 s / img. ETA=0:06:12 [03/10 11:35:52] d2.evaluation.evaluator INFO: Inference done 836/5113. 0.0838 s / img. ETA=0:06:07 [03/10 11:35:57] d2.evaluation.evaluator INFO: Inference done 895/5113. 0.0838 s / img. ETA=0:06:02 [03/10 11:36:02] d2.evaluation.evaluator INFO: Inference done 954/5113. 0.0838 s / img. 
ETA=0:05:57 [03/10 11:36:07] d2.evaluation.evaluator INFO: Inference done 1012/5113. 0.0839 s / img. ETA=0:05:52 [03/10 11:36:12] d2.evaluation.evaluator INFO: Inference done 1070/5113. 0.0839 s / img. ETA=0:05:47 [03/10 11:36:17] d2.evaluation.evaluator INFO: Inference done 1127/5113. 0.0840 s / img. ETA=0:05:43 [03/10 11:36:22] d2.evaluation.evaluator INFO: Inference done 1183/5113. 0.0842 s / img. ETA=0:05:39 [03/10 11:36:27] d2.evaluation.evaluator INFO: Inference done 1239/5113. 0.0843 s / img. ETA=0:05:34 [03/10 11:36:32] d2.evaluation.evaluator INFO: Inference done 1296/5113. 0.0844 s / img. ETA=0:05:30 [03/10 11:36:37] d2.evaluation.evaluator INFO: Inference done 1353/5113. 0.0845 s / img. ETA=0:05:25 [03/10 11:36:42] d2.evaluation.evaluator INFO: Inference done 1410/5113. 0.0846 s / img. ETA=0:05:21 [03/10 11:36:47] d2.evaluation.evaluator INFO: Inference done 1467/5113. 0.0846 s / img. ETA=0:05:16 [03/10 11:36:52] d2.evaluation.evaluator INFO: Inference done 1525/5113. 0.0846 s / img. ETA=0:05:11 [03/10 11:36:57] d2.evaluation.evaluator INFO: Inference done 1582/5113. 0.0847 s / img. ETA=0:05:06 [03/10 11:37:02] d2.evaluation.evaluator INFO: Inference done 1639/5113. 0.0848 s / img. ETA=0:05:01 [03/10 11:37:07] d2.evaluation.evaluator INFO: Inference done 1696/5113. 0.0848 s / img. ETA=0:04:57 [03/10 11:37:12] d2.evaluation.evaluator INFO: Inference done 1753/5113. 0.0848 s / img. ETA=0:04:52 [03/10 11:37:18] d2.evaluation.evaluator INFO: Inference done 1810/5113. 0.0849 s / img. ETA=0:04:47 [03/10 11:37:23] d2.evaluation.evaluator INFO: Inference done 1866/5113. 0.0850 s / img. ETA=0:04:42 [03/10 11:37:28] d2.evaluation.evaluator INFO: Inference done 1922/5113. 0.0850 s / img. ETA=0:04:38 [03/10 11:37:33] d2.evaluation.evaluator INFO: Inference done 1978/5113. 0.0851 s / img. ETA=0:04:33 [03/10 11:37:38] d2.evaluation.evaluator INFO: Inference done 2035/5113. 0.0852 s / img. 
ETA=0:04:28 [03/10 11:37:43] d2.evaluation.evaluator INFO: Inference done 2092/5113. 0.0852 s / img. ETA=0:04:24 [03/10 11:37:48] d2.evaluation.evaluator INFO: Inference done 2148/5113. 0.0853 s / img. ETA=0:04:19 [03/10 11:37:53] d2.evaluation.evaluator INFO: Inference done 2204/5113. 0.0854 s / img. ETA=0:04:14 [03/10 11:37:58] d2.evaluation.evaluator INFO: Inference done 2260/5113. 0.0854 s / img. ETA=0:04:09 [03/10 11:38:03] d2.evaluation.evaluator INFO: Inference done 2316/5113. 0.0855 s / img. ETA=0:04:05 [03/10 11:38:08] d2.evaluation.evaluator INFO: Inference done 2372/5113. 0.0855 s / img. ETA=0:04:00 [03/10 11:38:13] d2.evaluation.evaluator INFO: Inference done 2428/5113. 0.0856 s / img. ETA=0:03:55 [03/10 11:38:18] d2.evaluation.evaluator INFO: Inference done 2484/5113. 0.0857 s / img. ETA=0:03:50 [03/10 11:38:23] d2.evaluation.evaluator INFO: Inference done 2540/5113. 0.0857 s / img. ETA=0:03:46 [03/10 11:38:28] d2.evaluation.evaluator INFO: Inference done 2596/5113. 0.0857 s / img. ETA=0:03:41 [03/10 11:38:33] d2.evaluation.evaluator INFO: Inference done 2652/5113. 0.0858 s / img. ETA=0:03:36 [03/10 11:38:38] d2.evaluation.evaluator INFO: Inference done 2708/5113. 0.0858 s / img. ETA=0:03:31 [03/10 11:38:43] d2.evaluation.evaluator INFO: Inference done 2764/5113. 0.0858 s / img. ETA=0:03:26 [03/10 11:38:48] d2.evaluation.evaluator INFO: Inference done 2820/5113. 0.0859 s / img. ETA=0:03:21 [03/10 11:38:53] d2.evaluation.evaluator INFO: Inference done 2876/5113. 0.0859 s / img. ETA=0:03:17 [03/10 11:38:58] d2.evaluation.evaluator INFO: Inference done 2932/5113. 0.0859 s / img. ETA=0:03:12 [03/10 11:39:03] d2.evaluation.evaluator INFO: Inference done 2988/5113. 0.0860 s / img. ETA=0:03:07 [03/10 11:39:08] d2.evaluation.evaluator INFO: Inference done 3044/5113. 0.0860 s / img. ETA=0:03:02 [03/10 11:39:13] d2.evaluation.evaluator INFO: Inference done 3100/5113. 0.0860 s / img. 
ETA=0:02:57 [03/10 11:39:18] d2.evaluation.evaluator INFO: Inference done 3156/5113. 0.0861 s / img. ETA=0:02:52 [03/10 11:39:23] d2.evaluation.evaluator INFO: Inference done 3212/5113. 0.0861 s / img. ETA=0:02:47 [03/10 11:39:29] d2.evaluation.evaluator INFO: Inference done 3268/5113. 0.0861 s / img. ETA=0:02:42 [03/10 11:39:34] d2.evaluation.evaluator INFO: Inference done 3324/5113. 0.0862 s / img. ETA=0:02:38 [03/10 11:39:39] d2.evaluation.evaluator INFO: Inference done 3380/5113. 0.0862 s / img. ETA=0:02:33 [03/10 11:39:44] d2.evaluation.evaluator INFO: Inference done 3436/5113. 0.0862 s / img. ETA=0:02:28 [03/10 11:39:49] d2.evaluation.evaluator INFO: Inference done 3492/5113. 0.0862 s / img. ETA=0:02:23 [03/10 11:39:54] d2.evaluation.evaluator INFO: Inference done 3548/5113. 0.0863 s / img. ETA=0:02:18 [03/10 11:39:59] d2.evaluation.evaluator INFO: Inference done 3605/5113. 0.0863 s / img. ETA=0:02:13 [03/10 11:40:04] d2.evaluation.evaluator INFO: Inference done 3660/5113. 0.0863 s / img. ETA=0:02:08 [03/10 11:40:09] d2.evaluation.evaluator INFO: Inference done 3715/5113. 0.0864 s / img. ETA=0:02:03 [03/10 11:40:14] d2.evaluation.evaluator INFO: Inference done 3771/5113. 0.0864 s / img. ETA=0:01:58 [03/10 11:40:19] d2.evaluation.evaluator INFO: Inference done 3827/5113. 0.0864 s / img. ETA=0:01:53 [03/10 11:40:24] d2.evaluation.evaluator INFO: Inference done 3881/5113. 0.0865 s / img. ETA=0:01:49 [03/10 11:40:29] d2.evaluation.evaluator INFO: Inference done 3936/5113. 0.0865 s / img. ETA=0:01:44 [03/10 11:40:34] d2.evaluation.evaluator INFO: Inference done 3992/5113. 0.0865 s / img. ETA=0:01:39 [03/10 11:40:39] d2.evaluation.evaluator INFO: Inference done 4048/5113. 0.0865 s / img. ETA=0:01:34 [03/10 11:40:44] d2.evaluation.evaluator INFO: Inference done 4102/5113. 0.0866 s / img. ETA=0:01:29 [03/10 11:40:49] d2.evaluation.evaluator INFO: Inference done 4157/5113. 0.0866 s / img. 
ETA=0:01:24 [03/10 11:40:54] d2.evaluation.evaluator INFO: Inference done 4213/5113. 0.0866 s / img. ETA=0:01:19 [03/10 11:40:59] d2.evaluation.evaluator INFO: Inference done 4269/5113. 0.0867 s / img. ETA=0:01:14 [03/10 11:41:04] d2.evaluation.evaluator INFO: Inference done 4325/5113. 0.0867 s / img. ETA=0:01:10 [03/10 11:41:09] d2.evaluation.evaluator INFO: Inference done 4381/5113. 0.0867 s / img. ETA=0:01:05 [03/10 11:41:14] d2.evaluation.evaluator INFO: Inference done 4437/5113. 0.0867 s / img. ETA=0:01:00 [03/10 11:41:19] d2.evaluation.evaluator INFO: Inference done 4493/5113. 0.0867 s / img. ETA=0:00:55 [03/10 11:41:25] d2.evaluation.evaluator INFO: Inference done 4549/5113. 0.0867 s / img. ETA=0:00:50 [03/10 11:41:30] d2.evaluation.evaluator INFO: Inference done 4605/5113. 0.0868 s / img. ETA=0:00:45 [03/10 11:41:35] d2.evaluation.evaluator INFO: Inference done 4660/5113. 0.0868 s / img. ETA=0:00:40 [03/10 11:41:40] d2.evaluation.evaluator INFO: Inference done 4716/5113. 0.0868 s / img. ETA=0:00:35 [03/10 11:41:45] d2.evaluation.evaluator INFO: Inference done 4772/5113. 0.0868 s / img. ETA=0:00:30 [03/10 11:41:50] d2.evaluation.evaluator INFO: Inference done 4828/5113. 0.0868 s / img. ETA=0:00:25 [03/10 11:41:55] d2.evaluation.evaluator INFO: Inference done 4883/5113. 0.0869 s / img. ETA=0:00:20 [03/10 11:42:00] d2.evaluation.evaluator INFO: Inference done 4939/5113. 0.0869 s / img. ETA=0:00:15 [03/10 11:42:05] d2.evaluation.evaluator INFO: Inference done 4995/5113. 0.0869 s / img. ETA=0:00:10 [03/10 11:42:10] d2.evaluation.evaluator INFO: Inference done 5051/5113. 0.0869 s / img. ETA=0:00:05 [03/10 11:42:15] d2.evaluation.evaluator INFO: Inference done 5107/5113. 0.0869 s / img. ETA=0:00:00
4. please also simplify the steps as much as possible so they do not require additional resources to
run, such as a private dataset.
Running plain_train_net.py generates two log files, [log.txt](https://github.com/facebookresearch/detectron2/files/4311199/log.txt) and log.txt.rank1, which are identical. However, neither of them contains the COCO-style evaluation results, such as the AP scores for each category and IoUs. To reproduce, use any COCO-format dataset containing more than one category for evaluation.
## Expected behavior:
Get COCO-format based evaluation results such as AP scores for each category and IoUs during multi-gpu training.
## Environment:
Run `python -m detectron2.utils.collect_env` in the environment where you observed the issue, and paste the output.
Could you share full logs? The log you provide does not seem to be complete.
Those are actually the full logs. With a single GPU, the log file goes on to print the training loss and mAP scores after the point where the multi-GPU log.txt (above) ends.
I found the following workaround for this problem: in the file detectron2/utils/events.py, find line 307 and replace the function latest_with_smoothing_hint with this one:

    def latest_with_smoothing_hint(self, window_size=20):
        """
        Similar to :meth:`latest`, but the returned values
        are either the un-smoothed original latest value,
        or a median of the given window_size,
        depend on whether the smoothing_hint is True.
        This provides a default behavior that other writers can use.
        """
        result = {}
        # Original loop, which iterates over the latest scalars only:
        # for k, v in self._latest_scalars.items():
        #     result[k] = self._history[k].median(window_size) if self._smoothing_hints[k] else v
        # Replacement: iterate over the full history so every recorded key is reported.
        for k, v in self._history.items():
            result[k] = self._history[k].median(window_size) if self._smoothing_hints[k] else v.latest()
        return result

The problem is that evaluation metrics cannot be accessed via self._latest_scalars.items()
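To illustrate the point above, here is a minimal, self-contained sketch. `ToyStorage` is a hypothetical stand-in written for this comment, not detectron2's `EventStorage`; it assumes that the "latest scalars" dictionary is reset at every step while the per-name history is kept, which is the behavior the proposed change relies on.

```python
from collections import defaultdict


class ToyStorage:
    """Hypothetical, simplified event storage (not detectron2's class)."""

    def __init__(self):
        self._history = defaultdict(list)  # every value ever recorded, per name
        self._latest_scalars = {}          # only values recorded since the last step()

    def put_scalar(self, name, value):
        self._history[name].append(value)
        self._latest_scalars[name] = value

    def step(self):
        # Assumption: starting a new iteration forgets the "latest" scalars.
        self._latest_scalars = {}

    def latest(self):
        return dict(self._latest_scalars)

    def latest_from_history(self):
        # Last recorded value of every name, regardless of when it was written.
        return {name: values[-1] for name, values in self._history.items()}


storage = ToyStorage()
storage.put_scalar("bbox/AP", 41.5)    # written once, at evaluation time
storage.step()                         # the next training iteration begins
storage.put_scalar("total_loss", 0.8)  # written every iteration

print(storage.latest())               # contains only total_loss
print(storage.latest_from_history())  # contains both bbox/AP and total_loss
```

In this toy model, a writer that reads `latest()` after the step never sees the evaluation metric, while one that reads from the history does. Whether the real class behaves the same way depends on the detectron2 version in use.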
[03/10 10:23:33] detectron2 INFO: Starting training from iteration 0
[03/10 11:34:35] d2.data.datasets.coco WARNING:
Are you saying that there is one hour of nothing printed on the screen? I don't think that's what the script would do unless you made other modifications.
Sorry, I didn't know you were looking for that. At this line, plain_train_net.py#L179, I print losses every 4510 iterations (1 epoch) instead of every 20 iterations, so no values were printed in between. Also, this is the log-singlegpu.txt file, where I get the training loss at iteration 4510 and the evaluation results after that.
I cannot effectively investigate the issue since you seem to have written a lot of your own code and use your own dataset, neither of which I have access to.
You can verify that
python tools/plain_train_net.py --num-gpus 2 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 DATALOADER.NUM_WORKERS 2 SOLVER.MAX_ITER 1000 TEST.EVAL_PERIOD 100
with no modifications to code, does run the evaluation properly.
@ppwwyyxx, I cloned the repo at this tree. After that, I ran balloon_train_net_experiment.py with the balloon dataset. Although I get the evaluation results, the code breaks because the do_test(cfg, model) method returns an empty results dictionary. Please look for the ####### Changes ###### tag, which marks the parts of the code where I have made changes.
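One possible way to keep the best-model bookkeeping from crashing on an empty results dictionary is to check for the metrics before comparing them. The sketch below is self-contained and hypothetical: `is_new_best` and `best` are names introduced here for illustration, and the assumption that the empty dictionary comes from a non-main process is not confirmed in this thread.

```python
def is_new_best(val_dict, best):
    """Return True only if val_dict holds bbox metrics that all beat `best`."""
    bbox = val_dict.get("bbox") if isinstance(val_dict, dict) else None
    if not bbox:
        # Empty or missing results (assumed to happen on non-main ranks): nothing to compare.
        return False
    return all(bbox[key] > best[key] for key in ("AP", "AP50", "AP75"))


best = {"AP": 0.0, "AP50": 0.0, "AP75": 0.0}

print(is_new_best({}, best))
print(is_new_best({"bbox": {"AP": 30.0, "AP50": 55.0, "AP75": 28.0}}, best))
```

With a guard like this, the indexing `val_dict['bbox']['AP']` is only reached when the metrics exist, so a process that receives `{}` would simply skip the save.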
1. what changes you made (`git diff`) or what code you wrote
Run balloon_train_net_experiment.py (below) on the above repo using the balloon dataset.
"""
Detectron2 training script with a plain training loop.

This script reads a given config file and runs the training or evaluation.
It is an entry point that is able to train standard models in detectron2.

In order to let one script support training of many models,
this script contains logic that is specific to these built-in models and therefore
may not be suitable for your own project.
For example, your research project perhaps only needs a single "evaluator".

Therefore, we recommend you to use detectron2 as a library and take
this file as an example of how to use the library.
You may want to write your own script with your datasets and other customizations.

Compared to "train_net.py", this script supports fewer default features.
It also includes fewer abstractions, and is therefore easier to add custom logic to.
"""
import detectron2
import numpy as np
import cv2
import random
import json
from detectron2.structures import BoxMode
import logging
import os
from collections import OrderedDict
import torch
from torch.nn.parallel import DistributedDataParallel
from detectron2 import model_zoo
from detectron2.data.datasets import register_coco_instances
from detectron2.data import MetadataCatalog, DatasetCatalog
import detectron2.utils.comm as comm
from detectron2.checkpoint import DetectionCheckpointer, PeriodicCheckpointer
from detectron2.config import get_cfg
from detectron2.data import (
    MetadataCatalog,
    build_detection_test_loader,
    build_detection_train_loader,
)
from detectron2.engine import default_argument_parser, default_setup, launch
from detectron2.evaluation import (
    CityscapesEvaluator,
    COCOEvaluator,
    COCOPanopticEvaluator,
    DatasetEvaluators,
    LVISEvaluator,
    PascalVOCDetectionEvaluator,
    SemSegEvaluator,
    inference_on_dataset,
    print_csv_format,
)
from detectron2.modeling import build_model
from detectron2.solver import build_lr_scheduler, build_optimizer
from detectron2.utils.events import (
    CommonMetricPrinter,
    EventStorage,
    JSONWriter,
    TensorboardXWriter,
)
logger = logging.getLogger("detectron2")
def get_evaluator(cfg, dataset_name, output_folder=None):
    """
    Create evaluator(s) for a given dataset.
    This uses the special metadata "evaluator_type" associated with each builtin dataset.
    For your own dataset, you can simply create an evaluator manually in your
    script and do not have to worry about the hacky if-else logic here.
    """
    if output_folder is None:
        output_folder = os.path.join(cfg.OUTPUT_DIR, "inference")
    evaluator_list = []
    evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type
    if evaluator_type in ["sem_seg", "coco_panoptic_seg"]:
        evaluator_list.append(
            SemSegEvaluator(
                dataset_name,
                distributed=True,
                num_classes=cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES,
                ignore_label=cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,
                output_dir=output_folder,
            )
        )
    if evaluator_type in ["coco", "coco_panoptic_seg"]:
        evaluator_list.append(COCOEvaluator(dataset_name, cfg, True, output_folder))
    if evaluator_type == "coco_panoptic_seg":
        evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder))
    if evaluator_type == "cityscapes":
        assert (
            torch.cuda.device_count() >= comm.get_rank()
        ), "CityscapesEvaluator currently do not work with multiple machines."
        return CityscapesEvaluator(dataset_name)
    if evaluator_type == "pascal_voc":
        return PascalVOCDetectionEvaluator(dataset_name)
    if evaluator_type == "lvis":
        return LVISEvaluator(dataset_name, cfg, True, output_folder)
    if len(evaluator_list) == 0:
        raise NotImplementedError(
            "no Evaluator for the dataset {} with the type {}".format(dataset_name, evaluator_type)
        )
    if len(evaluator_list) == 1:
        return evaluator_list[0]
    return DatasetEvaluators(evaluator_list)
def do_test(cfg, model):
    results = OrderedDict()
    for dataset_name in cfg.DATASETS.TEST:
        data_loader = build_detection_test_loader(cfg, dataset_name)
        evaluator = get_evaluator(
            cfg, dataset_name, os.path.join(cfg.OUTPUT_DIR, "inference", dataset_name)
        )
        results_i = inference_on_dataset(model, data_loader, evaluator)
        results[dataset_name] = results_i
        ####### Changes (Print statements to debug) ########
        print("Before ", comm.get_rank())
        if comm.is_main_process():
            print("First ", comm.get_rank())
            logger.info("Evaluation results for {} in csv format:".format(dataset_name))
            print_csv_format(results_i)
        print("Second ", comm.get_rank())
    if len(results) == 1:
        results = list(results.values())[0]
    if results == {} or results_i == {}:
        print("Third ", comm.get_rank())
    return results
def do_train(cfg, model, resume=False):
    ####### Changes ########
    default_val_AP = 0
    default_val_AP50 = 0
    default_val_AP75 = 0
    best_model_dict = {}
    model.train()
    optimizer = build_optimizer(cfg, model)
    scheduler = build_lr_scheduler(cfg, optimizer)
    checkpointer = DetectionCheckpointer(
        model, cfg.OUTPUT_DIR, optimizer=optimizer, scheduler=scheduler
    )
    start_iter = (
        checkpointer.resume_or_load(cfg.MODEL.WEIGHTS, resume=resume).get("iteration", -1) + 1
    )
    max_iter = cfg.SOLVER.MAX_ITER
    periodic_checkpointer = PeriodicCheckpointer(
        checkpointer, cfg.SOLVER.CHECKPOINT_PERIOD, max_iter=max_iter
    )
    ####### Changes ########
    writers = (
        [
            JSONWriter(os.path.join(cfg.OUTPUT_DIR, "metrics.json")),
            TensorboardXWriter(cfg.OUTPUT_DIR),
        ]
        if comm.is_main_process()
        else []
    )
    ####### Changes ########
    terminal_writer = [CommonMetricPrinter(max_iter)] if comm.is_main_process() else []
    # compared to "train_net.py", we do not support accurate timing and
    # precise BN here, because they are not trivial to implement
    data_loader = build_detection_train_loader(cfg)
    logger.info("Starting training from iteration {}".format(start_iter))
    with EventStorage(start_iter) as storage:
        for data, iteration in zip(data_loader, range(start_iter, max_iter)):
            iteration = iteration + 1
            storage.step()
            loss_dict = model(data)
            losses = sum(loss for loss in loss_dict.values())
            assert torch.isfinite(losses).all(), loss_dict
            loss_dict_reduced = {k: v.item() for k, v in comm.reduce_dict(loss_dict).items()}
            losses_reduced = sum(loss for loss in loss_dict_reduced.values())
            if comm.is_main_process():
                storage.put_scalars(total_loss=losses_reduced, **loss_dict_reduced)
            optimizer.zero_grad()
            losses.backward()
            optimizer.step()
            storage.put_scalar("lr", optimizer.param_groups[0]["lr"], smoothing_hint=False)
            scheduler.step()
            if (
                cfg.TEST.EVAL_PERIOD > 0
                and iteration % cfg.TEST.EVAL_PERIOD == 0
                and iteration != max_iter
            ):
                val_dict = do_test(cfg, model)
                ####### Changes (save best model) ########
                print("Val dict value ", val_dict)
                if (
                    val_dict['bbox']['AP'] > default_val_AP
                    and val_dict['bbox']['AP50'] > default_val_AP50
                    and val_dict['bbox']['AP75'] > default_val_AP75
                ):
                    default_val_AP = val_dict['bbox']['AP']
                    default_val_AP50 = val_dict['bbox']['AP50']
                    default_val_AP75 = val_dict['bbox']['AP75']
                    best_model_dict = {}
                    best_model_dict["model"] = model.state_dict()
                    best_model_dict["optimizer"] = optimizer.state_dict()
                    best_model_dict["scheduler"] = scheduler.state_dict()
                    torch.save(best_model_dict, cfg.OUTPUT_DIR+'model_dict'+str(iteration)+'.pth')
                # Compared to "train_net.py", the test results are not dumped to EventStorage
                comm.synchronize()
            ####### Changes (log values to tensorboard and json every 400th iteration) ########
            if iteration - start_iter > 5 and (iteration % 400 == 0 or iteration == max_iter):
                for writer in writers:
                    writer.write()
            ####### Changes (print on terminal every 20th iteration) ########
            if iteration - start_iter > 5 and (iteration % 20 == 0 or iteration == max_iter):
                for wrt in terminal_writer:
                    wrt.write()
            periodic_checkpointer.step(iteration)
def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)
    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]
        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width
        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]
            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
                "iscrowd": 0
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts
def setup(args):
    """
    Create configs and perform basic setups.
    """
    for d in ["train", "val"]:
        DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
        MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
    cfg.DATASETS.TRAIN = ("balloon_train",)
    cfg.MODEL.MASK_ON = False
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set the testing threshold for this model
    cfg.DATASETS.TEST = ("balloon_val",)
    MetadataCatalog.get("balloon_val").evaluator_type = "coco"
    #cfg.DATASETS.TEST = ()
    cfg.DATALOADER.NUM_WORKERS = 1
    cfg.OUTPUT_DIR = '/balloon/models/'
    cfg.MODEL.WEIGHTS = "/balloon/model_final_480dd8.pkl"
    cfg.SOLVER.IMS_PER_BATCH = 14
    cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
    cfg.SOLVER.MAX_ITER = 1200
    cfg.TEST.EVAL_PERIOD = 400
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # faster, and good enough for this toy dataset (default: 512)
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon)
    os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
    cfg.merge_from_list(args.opts)
    cfg.freeze()
    default_setup(
        cfg, args
    )  # if you don't like any of the default setup, write your own setup code
    return cfg
def main(args):
    cfg = setup(args)
    model = build_model(cfg)
    logger.info("Model:\n{}".format(model))
    if args.eval_only:
        DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(
            cfg.MODEL.WEIGHTS, resume=args.resume
        )
        return do_test(cfg, model)
    distributed = comm.get_world_size() > 1
    if distributed:
        model = DistributedDataParallel(
            model, device_ids=[comm.get_local_rank()], broadcast_buffers=False
        )
    do_train(cfg, model)
    print("Done")
if __name__ == "__main__":
    args = default_argument_parser().parse_args()
    print("Command Line Args:", args)
    launch(
        main,
        args.num_gpus,
        num_machines=args.num_machines,
        machine_rank=args.machine_rank,
        dist_url=args.dist_url,
        args=(args,),
    )
2. what exact command you run: python \path\balloon_train_net_experiment.py --num-gpus=2 --dist-url="auto"
3. what you observed (including __full logs__):
Obtaining file:///home/username/detectron2_repo_trial Requirement already satisfied: termcolor>=1.1 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (1.1.0) Requirement already satisfied: Pillow in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (6.2.2) Requirement already satisfied: yacs>=0.1.6 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (0.1.6) Requirement already satisfied: tabulate in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (0.8.6) Requirement already satisfied: cloudpickle in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (1.2.2) Requirement already satisfied: matplotlib in ./miniconda3/envs/det_trial/lib/python3.7/site-packages/matplotlib-3.2.0rc1-py3.7-linux-x86_64.egg (from detectron2==0.1.1) (3.2.0rc1) Requirement already satisfied: tqdm>4.29.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (4.41.1) Requirement already satisfied: tensorboard in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (2.1.0) Requirement already satisfied: fvcore in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (0.1.dev200114) Requirement already satisfied: future in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (0.18.2) Requirement already satisfied: pydot in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from detectron2==0.1.1) (1.4.1) Requirement already satisfied: PyYAML in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from yacs>=0.1.6->detectron2==0.1.1) (5.1) Requirement already satisfied: cycler>=0.10 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages/cycler-0.10.0-py3.7.egg (from matplotlib->detectron2==0.1.1) (0.10.0) Requirement already satisfied: kiwisolver>=1.0.1 in 
./miniconda3/envs/det_trial/lib/python3.7/site-packages/kiwisolver-1.1.0-py3.7-linux-x86_64.egg (from matplotlib->detectron2==0.1.1) (1.1.0) Requirement already satisfied: numpy>=1.11 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from matplotlib->detectron2==0.1.1) (1.18.1) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages/pyparsing-2.4.6-py3.7.egg (from matplotlib->detectron2==0.1.1) (2.4.6) Requirement already satisfied: python-dateutil>=2.1 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages/python_dateutil-2.8.1-py3.7.egg (from matplotlib->detectron2==0.1.1) (2.8.1) Requirement already satisfied: setuptools>=41.0.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (44.0.0.post20200106) Requirement already satisfied: markdown>=2.6.8 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (3.1.1) Requirement already satisfied: requests<3,>=2.21.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (2.22.0) Requirement already satisfied: wheel>=0.26; python_version >= "3" in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (0.33.6) Requirement already satisfied: google-auth<2,>=1.6.3 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (1.11.0) Requirement already satisfied: absl-py>=0.4 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (0.9.0) Requirement already satisfied: grpcio>=1.24.3 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (1.26.0) Requirement already satisfied: werkzeug>=0.11.15 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (0.16.0) Requirement already satisfied: 
google-auth-oauthlib<0.5,>=0.4.1 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (0.4.1) Requirement already satisfied: protobuf>=3.6.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (3.11.2) Requirement already satisfied: six>=1.10.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from tensorboard->detectron2==0.1.1) (1.14.0) Requirement already satisfied: portalocker in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from fvcore->detectron2==0.1.1) (1.5.2) Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (3.0.4) Requirement already satisfied: idna<2.9,>=2.5 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (2.8) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (1.25.8) Requirement already satisfied: certifi>=2017.4.17 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from requests<3,>=2.21.0->tensorboard->detectron2==0.1.1) (2019.11.28) Requirement already satisfied: pyasn1-modules>=0.2.1 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (0.2.8) Requirement already satisfied: rsa<4.1,>=3.1.4 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (4.0) Requirement already satisfied: cachetools<5.0,>=2.0.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (4.0.0) Requirement already satisfied: requests-oauthlib>=0.7.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from 
google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2==0.1.1) (1.3.0) Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard->detectron2==0.1.1) (0.4.8) Requirement already satisfied: oauthlib>=3.0.0 in ./miniconda3/envs/det_trial/lib/python3.7/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->detectron2==0.1.1) (3.1.0) Installing collected packages: detectron2 Found existing installation: detectron2 0.1.1 Uninstalling detectron2-0.1.1: Successfully uninstalled detectron2-0.1.1 Running setup.py develop for detectron2 Successfully installed detectron2 Command Line Args: Namespace(config_file='', dist_url='auto', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False) [32m[03/11 21:46:11 detectron2]: [0mRank of current process: 0. World size: 2 [32m[03/11 21:46:15 detectron2]: [0mEnvironment info:
sys.platform linux
Python 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
numpy 1.18.1
detectron2 0.1.1 @/home/username/detectron2_repo_trial/detectron2
detectron2 compiler GCC 5.5
detectron2 CUDA compiler 10.2
detectron2 arch flags sm_61
DETECTRON2_ENV_MODULE
PyTorch built with:
[32m[03/11 21:46:15 detectron2]: [0mCommand line arguments: Namespace(config_file='', dist_url='auto', eval_only=False, machine_rank=0, num_gpus=2, num_machines=1, opts=[], resume=False) [32m[03/11 21:46:15 detectron2]: [0mRunning with full config: CUDNN_BENCHMARK: False DATALOADER: ASPECT_RATIO_GROUPING: True FILTER_EMPTY_ANNOTATIONS: True NUM_WORKERS: 2 REPEAT_THRESHOLD: 0.0 SAMPLER_TRAIN: TrainingSampler DATASETS: PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000 PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000 PROPOSAL_FILES_TEST: () PROPOSAL_FILES_TRAIN: () TEST: ('balloon_val',) TRAIN: ('balloon_train',) GLOBAL: HACK: 1.0 INPUT: CROP: ENABLED: False SIZE: [0.9, 0.9] TYPE: relative_range FORMAT: BGR MASK_FORMAT: polygon MAX_SIZE_TEST: 1333 MAX_SIZE_TRAIN: 1333 MIN_SIZE_TEST: 800 MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800) MIN_SIZE_TRAIN_SAMPLING: choice MODEL: ANCHOR_GENERATOR: ANGLES: [[-90, 0, 90]] ASPECT_RATIOS: [[0.5, 1.0, 2.0]] NAME: DefaultAnchorGenerator OFFSET: 0.0 SIZES: [[32], [64], [128], [256], [512]] BACKBONE: FREEZE_AT: 2 NAME: build_resnet_fpn_backbone DEVICE: cuda FPN: FUSE_TYPE: sum IN_FEATURES: ['res2', 'res3', 'res4', 'res5'] NORM: OUT_CHANNELS: 256 KEYPOINT_ON: False LOAD_PROPOSALS: False MASK_ON: False META_ARCHITECTURE: GeneralizedRCNN PANOPTIC_FPN: COMBINE: ENABLED: True INSTANCES_CONFIDENCE_THRESH: 0.5 OVERLAP_THRESH: 0.5 STUFF_AREA_LIMIT: 4096 INSTANCE_LOSS_WEIGHT: 1.0 PIXEL_MEAN: [103.53, 116.28, 123.675] PIXEL_STD: [1.0, 1.0, 1.0] PROPOSAL_GENERATOR: MIN_SIZE: 0 NAME: RPN RESNETS: DEFORM_MODULATED: False DEFORM_NUM_GROUPS: 1 DEFORM_ON_PER_STAGE: [False, False, False, False] DEPTH: 50 NORM: FrozenBN NUM_GROUPS: 1 OUT_FEATURES: ['res2', 'res3', 'res4', 'res5'] RES2_OUT_CHANNELS: 256 RES5_DILATION: 1 STEM_OUT_CHANNELS: 64 STRIDE_IN_1X1: True WIDTH_PER_GROUP: 64 RETINANET: BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) FOCAL_LOSS_ALPHA: 0.25 FOCAL_LOSS_GAMMA: 2.0 IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.4, 0.5] 
NMS_THRESH_TEST: 0.5 NUM_CLASSES: 80 NUM_CONVS: 4 PRIOR_PROB: 0.01 SCORE_THRESH_TEST: 0.05 SMOOTH_L1_LOSS_BETA: 0.1 TOPK_CANDIDATES_TEST: 1000 ROI_BOX_CASCADE_HEAD: BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0)) IOUS: (0.5, 0.6, 0.7) ROI_BOX_HEAD: BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0) CLS_AGNOSTIC_BBOX_REG: True CONV_DIM: 256 FC_DIM: 1024 NAME: FastRCNNConvFCHead NORM: NUM_CONV: 0 NUM_FC: 2 POOLER_RESOLUTION: 7 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 SMOOTH_L1_BETA: 0.0 TRAIN_ON_PRED_BOXES: False ROI_HEADS: BATCH_SIZE_PER_IMAGE: 128 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] IOU_LABELS: [0, 1] IOU_THRESHOLDS: [0.5] NAME: CascadeROIHeads NMS_THRESH_TEST: 0.5 NUM_CLASSES: 1 POSITIVE_FRACTION: 0.25 PROPOSAL_APPEND_GT: True SCORE_THRESH_TEST: 0.7 ROI_KEYPOINT_HEAD: CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512) LOSS_WEIGHT: 1.0 MIN_KEYPOINTS_PER_IMAGE: 1 NAME: KRCNNConvDeconvUpsampleHead NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True NUM_KEYPOINTS: 17 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 ROI_MASK_HEAD: CLS_AGNOSTIC_MASK: False CONV_DIM: 256 NAME: MaskRCNNConvUpsampleHead NORM: NUM_CONV: 4 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2 RPN: BATCH_SIZE_PER_IMAGE: 256 BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) BOUNDARY_THRESH: -1 HEAD_NAME: StandardRPNHead IN_FEATURES: ['p2', 'p3', 'p4', 'p5', 'p6'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.3, 0.7] LOSS_WEIGHT: 1.0 NMS_THRESH: 0.7 POSITIVE_FRACTION: 0.5 POST_NMS_TOPK_TEST: 1000 POST_NMS_TOPK_TRAIN: 2000 PRE_NMS_TOPK_TEST: 1000 PRE_NMS_TOPK_TRAIN: 2000 SMOOTH_L1_BETA: 0.0 SEM_SEG_HEAD: COMMON_STRIDE: 4 CONVS_DIM: 128 IGNORE_VALUE: 255 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] LOSS_WEIGHT: 1.0 NAME: SemSegFPNHead NORM: GN NUM_CLASSES: 54 WEIGHTS: /ssd_scratch/cvit/username/balloon/model_final_480dd8.pkl OUTPUT_DIR: /ssd_scratch/cvit/username/balloon/models/ SEED: -1 SOLVER: BASE_LR: 0.00025 BIAS_LR_FACTOR: 1.0 
CHECKPOINT_PERIOD: 5000 CLIP_GRADIENTS: CLIP_TYPE: value CLIP_VALUE: 1.0 ENABLED: False NORM_TYPE: 2.0 GAMMA: 0.1 IMS_PER_BATCH: 14 LR_SCHEDULER_NAME: WarmupMultiStepLR MAX_ITER: 1200 MOMENTUM: 0.9 STEPS: (210000, 250000) WARMUP_FACTOR: 0.001 WARMUP_ITERS: 1000 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0001 WEIGHT_DECAY_BIAS: 0.0001 WEIGHT_DECAY_NORM: 0.0 TEST: AUG: ENABLED: False FLIP: True MAX_SIZE: 4000 MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200) DETECTIONS_PER_IMAGE: 100 EVAL_PERIOD: 400 EXPECTED_RESULTS: [] KEYPOINT_OKS_SIGMAS: [] PRECISE_BN: ENABLED: False NUM_ITER: 200 VERSION: 2 VIS_PERIOD: 0 [32m[03/11 21:46:15 detectron2]: [0mFull config saved to /ssd_scratch/cvit/username/balloon/models/config.yaml [32m[03/11 21:46:15 d2.utils.env]: [0mUsing a generated random seed 15491967 [32m[03/11 21:46:16 detectron2]: [0mModel: GeneralizedRCNN( (backbone): FPN( (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (top_block): LastLevelMaxPool() (bottom_up): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, 
eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): 
FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), 
bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 
1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) ) (proposal_generator): RPN( (anchor_generator): DefaultAnchorGenerator( (cell_anchors): BufferList() ) (rpn_head): StandardRPNHead( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1)) ) ) (roi_heads): CascadeROIHeads( (box_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True) (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True) (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True) (3): ROIAlign(output_size=(7, 7), 
spatial_scale=0.03125, sampling_ratio=0, aligned=True) ) ) (box_head): ModuleList( (0): FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) (1): FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) (2): FastRCNNConvFCHead( (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc2): Linear(in_features=1024, out_features=1024, bias=True) ) ) (box_predictor): ModuleList( (0): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=2, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) (1): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=2, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) (2): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=2, bias=True) (bbox_pred): Linear(in_features=1024, out_features=4, bias=True) ) ) ) )
[03/11 21:46:16 fvcore.common.checkpoint]: Loading checkpoint from /ssd_scratch/cvit/username/balloon/model_final_480dd8.pkl
[03/11 21:46:17 fvcore.common.checkpoint]: Reading a file from 'Detectron2 Model Zoo'
WARNING [03/11 21:46:17 fvcore.common.checkpoint]: 'roi_heads.box_predictor.0.cls_score.weight' has shape (81, 1024) in the checkpoint but (2, 1024) in the model! Skipped.
WARNING [03/11 21:46:17 fvcore.common.checkpoint]: 'roi_heads.box_predictor.0.cls_score.bias' has shape (81,) in the checkpoint but (2,) in the model! Skipped.
WARNING [03/11 21:46:17 fvcore.common.checkpoint]: 'roi_heads.box_predictor.1.cls_score.weight' has shape (81, 1024) in the checkpoint but (2, 1024) in the model! Skipped.
WARNING [03/11 21:46:17 fvcore.common.checkpoint]: 'roi_heads.box_predictor.1.cls_score.bias' has shape (81,) in the checkpoint but (2,) in the model! Skipped.
WARNING [03/11 21:46:17 fvcore.common.checkpoint]: 'roi_heads.box_predictor.2.cls_score.weight' has shape (81, 1024) in the checkpoint but (2, 1024) in the model! Skipped.
WARNING [03/11 21:46:17 fvcore.common.checkpoint]: 'roi_heads.box_predictor.2.cls_score.bias' has shape (81,) in the checkpoint but (2,) in the model! Skipped.
[03/11 21:46:17 fvcore.common.checkpoint]: Some model parameters are not in the checkpoint:
roi_heads.box_predictor.0.cls_score.{weight, bias}
roi_heads.box_predictor.1.cls_score.{weight, bias}
roi_heads.box_predictor.2.cls_score.{weight, bias}
[03/11 21:46:17 fvcore.common.checkpoint]: The checkpoint contains parameters not used by the model:
roi_heads.mask_head.mask_fcn1.{weight, bias}
roi_heads.mask_head.mask_fcn2.{weight, bias}
roi_heads.mask_head.mask_fcn3.{weight, bias}
roi_heads.mask_head.mask_fcn4.{weight, bias}
roi_heads.mask_head.deconv.{weight, bias}
roi_heads.mask_head.predictor.{weight, bias}
[03/11 21:46:22 d2.data.build]: Removed 0 images with no usable annotations. 61 images left.
[03/11 21:46:22 d2.data.build]: Distribution of instances among all 1 categories:
| category | #instances |
|---|---|
| balloon | 255 |
[03/11 21:46:22 d2.data.common]: Serializing 61 elements to byte tensors and concatenating them all ...
[03/11 21:46:22 d2.data.common]: Serialized dataset takes 0.18 MiB
[03/11 21:46:22 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[03/11 21:46:22 d2.data.build]: Using training sampler TrainingSampler
[03/11 21:46:22 detectron2]: Starting training from iteration 0
[03/11 21:46:52 d2.utils.events]: eta: N/A iter: 20 total_loss: 3.262 loss_box_reg_stage0: 0.195 loss_box_reg_stage1: 0.309 loss_box_reg_stage2: 0.389 loss_cls_stage0: 0.796 loss_cls_stage1: 0.744 loss_cls_stage2: 0.762 loss_rpn_cls: 0.043 loss_rpn_loc: 0.014 lr: 0.000005 max_mem: 8398M
[03/11 21:47:15 d2.utils.events]: eta: N/A iter: 40 total_loss: 3.172 loss_box_reg_stage0: 0.191 loss_box_reg_stage1: 0.309 loss_box_reg_stage2: 0.400 loss_cls_stage0: 0.765 loss_cls_stage1: 0.717 loss_cls_stage2: 0.742 loss_rpn_cls: 0.045 loss_rpn_loc: 0.015 lr: 0.000010 max_mem: 8517M
[03/11 21:47:38 d2.utils.events]: eta: N/A iter: 60 total_loss: 2.940 loss_box_reg_stage0: 0.185 loss_box_reg_stage1: 0.299 loss_box_reg_stage2: 0.388 loss_cls_stage0: 0.698 loss_cls_stage1: 0.670 loss_cls_stage2: 0.692 loss_rpn_cls: 0.047 loss_rpn_loc: 0.014 lr: 0.000015 max_mem: 8517M
[03/11 21:48:01 d2.utils.events]: eta: N/A iter: 80 total_loss: 2.749 loss_box_reg_stage0: 0.187 loss_box_reg_stage1: 0.283 loss_box_reg_stage2: 0.384 loss_cls_stage0: 0.621 loss_cls_stage1: 0.602 loss_cls_stage2: 0.613 loss_rpn_cls: 0.035 loss_rpn_loc: 0.016 lr: 0.000020 max_mem: 8517M
[03/11 21:48:24 d2.utils.events]: eta: N/A iter: 100 total_loss: 2.526 loss_box_reg_stage0: 0.172 loss_box_reg_stage1: 0.278 loss_box_reg_stage2: 0.360 loss_cls_stage0: 0.552 loss_cls_stage1: 0.540 loss_cls_stage2: 0.553 loss_rpn_cls: 0.040 loss_rpn_loc: 0.014 lr: 0.000025 max_mem: 8517M
[03/11 21:48:47 d2.utils.events]: eta: N/A iter: 120 total_loss: 2.308 loss_box_reg_stage0: 0.165 loss_box_reg_stage1: 0.286 loss_box_reg_stage2: 0.360 loss_cls_stage0: 0.489 loss_cls_stage1: 0.477 loss_cls_stage2: 0.497 loss_rpn_cls: 0.029 loss_rpn_loc: 0.013 lr: 0.000030 max_mem: 8517M
[03/11 21:49:10 d2.utils.events]: eta: N/A iter: 140 total_loss: 2.145 loss_box_reg_stage0: 0.152 loss_box_reg_stage1: 0.272 loss_box_reg_stage2: 0.361 loss_cls_stage0: 0.437 loss_cls_stage1: 0.427 loss_cls_stage2: 0.441 loss_rpn_cls: 0.034 loss_rpn_loc: 0.013 lr: 0.000035 max_mem: 8576M
[03/11 21:49:33 d2.utils.events]: eta: N/A iter: 160 total_loss: 1.995 loss_box_reg_stage0: 0.154 loss_box_reg_stage1: 0.268 loss_box_reg_stage2: 0.369 loss_cls_stage0: 0.396 loss_cls_stage1: 0.381 loss_cls_stage2: 0.398 loss_rpn_cls: 0.028 loss_rpn_loc: 0.012 lr: 0.000040 max_mem: 8576M
[03/11 21:49:55 d2.utils.events]: eta: N/A iter: 180 total_loss: 1.831 loss_box_reg_stage0: 0.151 loss_box_reg_stage1: 0.257 loss_box_reg_stage2: 0.374 loss_cls_stage0: 0.357 loss_cls_stage1: 0.339 loss_cls_stage2: 0.355 loss_rpn_cls: 0.026 loss_rpn_loc: 0.011 lr: 0.000045 max_mem: 8576M
[03/11 21:50:18 d2.utils.events]: eta: N/A iter: 200 total_loss: 1.779 loss_box_reg_stage0: 0.153 loss_box_reg_stage1: 0.256 loss_box_reg_stage2: 0.359 loss_cls_stage0: 0.321 loss_cls_stage1: 0.302 loss_cls_stage2: 0.317 loss_rpn_cls: 0.030 loss_rpn_loc: 0.013 lr: 0.000050 max_mem: 8576M
[03/11 21:50:41 d2.utils.events]: eta: N/A iter: 220 total_loss: 1.564 loss_box_reg_stage0: 0.137 loss_box_reg_stage1: 0.224 loss_box_reg_stage2: 0.325 loss_cls_stage0: 0.291 loss_cls_stage1: 0.269 loss_cls_stage2: 0.287 loss_rpn_cls: 0.024 loss_rpn_loc: 0.012 lr: 0.000055 max_mem: 8576M
[03/11 21:51:04 d2.utils.events]: eta: N/A iter: 240 total_loss: 1.571 loss_box_reg_stage0: 0.149 loss_box_reg_stage1: 0.247 loss_box_reg_stage2: 0.361 loss_cls_stage0: 0.270 loss_cls_stage1: 0.246 loss_cls_stage2: 0.262 loss_rpn_cls: 0.026 loss_rpn_loc: 0.012 lr: 0.000060 max_mem: 8576M
[03/11 21:51:27 d2.utils.events]: eta: N/A iter: 260 total_loss: 1.429 loss_box_reg_stage0: 0.140 loss_box_reg_stage1: 0.234 loss_box_reg_stage2: 0.362 loss_cls_stage0: 0.240 loss_cls_stage1: 0.214 loss_cls_stage2: 0.230 loss_rpn_cls: 0.025 loss_rpn_loc: 0.012 lr: 0.000065 max_mem: 8576M
[03/11 21:51:50 d2.utils.events]: eta: N/A iter: 280 total_loss: 1.324 loss_box_reg_stage0: 0.133 loss_box_reg_stage1: 0.218 loss_box_reg_stage2: 0.312 loss_cls_stage0: 0.220 loss_cls_stage1: 0.191 loss_cls_stage2: 0.207 loss_rpn_cls: 0.022 loss_rpn_loc: 0.011 lr: 0.000070 max_mem: 8576M
[03/11 21:52:12 d2.utils.events]: eta: N/A iter: 300 total_loss: 1.313 loss_box_reg_stage0: 0.137 loss_box_reg_stage1: 0.248 loss_box_reg_stage2: 0.345 loss_cls_stage0: 0.199 loss_cls_stage1: 0.170 loss_cls_stage2: 0.183 loss_rpn_cls: 0.022 loss_rpn_loc: 0.011 lr: 0.000075 max_mem: 8576M
[03/11 21:52:35 d2.utils.events]: eta: N/A iter: 320 total_loss: 1.197 loss_box_reg_stage0: 0.128 loss_box_reg_stage1: 0.222 loss_box_reg_stage2: 0.309 loss_cls_stage0: 0.182 loss_cls_stage1: 0.153 loss_cls_stage2: 0.163 loss_rpn_cls: 0.017 loss_rpn_loc: 0.010 lr: 0.000080 max_mem: 8576M
[03/11 21:52:58 d2.utils.events]: eta: N/A iter: 340 total_loss: 1.174 loss_box_reg_stage0: 0.135 loss_box_reg_stage1: 0.216 loss_box_reg_stage2: 0.314 loss_cls_stage0: 0.169 loss_cls_stage1: 0.142 loss_cls_stage2: 0.152 loss_rpn_cls: 0.018 loss_rpn_loc: 0.011 lr: 0.000085 max_mem: 8576M
[03/11 21:53:21 d2.utils.events]: eta: N/A iter: 360 total_loss: 1.101 loss_box_reg_stage0: 0.131 loss_box_reg_stage1: 0.209 loss_box_reg_stage2: 0.299 loss_cls_stage0: 0.155 loss_cls_stage1: 0.123 loss_cls_stage2: 0.137 loss_rpn_cls: 0.022 loss_rpn_loc: 0.011 lr: 0.000090 max_mem: 8576M
[03/11 21:53:44 d2.utils.events]: eta: N/A iter: 380 total_loss: 1.046 loss_box_reg_stage0: 0.127 loss_box_reg_stage1: 0.202 loss_box_reg_stage2: 0.300 loss_cls_stage0: 0.147 loss_cls_stage1: 0.120 loss_cls_stage2: 0.127 loss_rpn_cls: 0.018 loss_rpn_loc: 0.011 lr: 0.000095 max_mem: 8576M
[03/11 21:54:06 d2.data.build]: Distribution of instances among all 1 categories:
| category | #instances |
|---|---|
| balloon | 50 |
[03/11 21:54:06 d2.data.common]: Serializing 13 elements to byte tensors and concatenating them all ...
[03/11 21:54:06 d2.data.common]: Serialized dataset takes 0.04 MiB
WARNING [03/11 21:54:06 d2.evaluation.coco_evaluation]: json_file was not found in MetaDataCatalog for 'balloon_val'. Trying to convert it to COCO format ...
WARNING [03/11 21:54:07 d2.data.datasets.coco]: Using previously cached COCO format annotations at '/ssd_scratch/cvit/username/balloon/models/inference/balloon_val/balloon_val_coco_format.json'. You need to clear the cache file if your dataset has been modified.
[03/11 21:54:07 d2.evaluation.evaluator]: Start inference on 7 images
[03/11 21:54:17 d2.evaluation.evaluator]: Total inference time: 0:00:00.353311 (0.176655 s / img per device, on 2 devices)
[03/11 21:54:17 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:00 (0.092158 s / img per device, on 2 devices)
[03/11 21:54:17 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[03/11 21:54:17 d2.evaluation.coco_evaluation]: Saving results to /ssd_scratch/cvit/username/balloon/models/inference/balloon_val/coco_instances_results.json
[03/11 21:54:17 d2.evaluation.coco_evaluation]: Evaluating predictions ...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.722
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.778
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.778
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.909
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.254
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.730
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.730
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.524
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920
[03/11 21:54:17 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
| AP | AP50 | AP75 | APs | APm | APl |
|---|---|---|---|---|---|
| 72.213 | 77.848 | 77.848 | 0.000 | 49.406 | 90.933 |
Before 0
First 0
[03/11 21:54:17 detectron2]: Evaluation results for balloon_val in csv format:
[03/11 21:54:17 d2.evaluation.testing]: copypaste: Task: bbox
[03/11 21:54:17 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/11 21:54:17 d2.evaluation.testing]: copypaste: 72.2132,77.8478,77.8478,0.0000,49.4059,90.9329
Before 1
Second 1
Third 1
{}
Traceback (most recent call last):
File "/home/username/detectron2_repo_trial/tools/balloon_train_net_experiment.py", line 344, in
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/det_trial/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, args)
  File "/home/username/detectron2_repo_trial/detectron2/engine/launch.py", line 84, in _distributed_worker
    main_func(args)
  File "/home/username/detectron2_repo_trial/tools/balloon_train_net_experiment.py", line 332, in main
    do_train(cfg, model)
  File "/home/username/detectron2_repo_trial/tools/balloon_train_net_experiment.py", line 212, in do_train
    if (val_dict['bbox']['AP'] > default_val_AP and val_dict['bbox']['AP50'] > default_val_AP50 and val_dict['bbox']['AP75'] > default_val_AP75):
KeyError: 'bbox'
/home/username/miniconda3/envs/det_trial/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown len(cache))
## Expected behavior:
`do_test` should be executed only once per evaluation period during multi-GPU training, with the evaluation results printed each time.
## Environment:
Provide your environment information using the following command:
wget -nc -q https://github.com/facebookresearch/detectron2/raw/master/detectron2/utils/collect_env.py && python collect_env.py
That's because only one GPU does evaluation and it's by design.
You need to guard the code that reads the results with `if comm.is_main_process():`.
Closing as the issue is solved.
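The `comm.is_main_process()` guard suggested above can be sketched roughly as follows, assuming the comparison in `do_train` looks like the line shown in the traceback. The helper name `is_new_best` and the `best` thresholds are hypothetical. In the real script the `is_main` flag would come from `detectron2.utils.comm.is_main_process()`; it is passed in as a plain boolean here so the snippet runs without Detectron2 installed.

```python
def is_new_best(val_dict, best, is_main):
    """Return True only on the main process when every bbox metric beats `best`.

    `is_main` stands in for detectron2.utils.comm.is_main_process() (assumption:
    the caller passes that value in).
    """
    if not is_main:
        # Non-main ranks get {} back from evaluation, so never index into it.
        return False
    bbox = val_dict["bbox"]
    return all(bbox[key] > best[key] for key in ("AP", "AP50", "AP75"))


# Hypothetical starting thresholds; the metrics are the ones from the log above.
best = {"AP": 0.0, "AP50": 0.0, "AP75": 0.0}
main_results = {"bbox": {"AP": 72.2132, "AP50": 77.8478, "AP75": 77.8478}}

print(is_new_best(main_results, best, is_main=True))  # what rank 0 would see
print(is_new_best({}, best, is_main=False))           # what rank 1 would see
```

With this shape, the second process skips the comparison instead of raising `KeyError: 'bbox'`.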
So is there a way to keep the results dict from being {} and to run through `do_test` only once?
I don't understand that question. There is no need to have the same evaluation results on other GPUs.
Correct. But in my case, it does go through `do_test` a second time. What can I do to run `do_test` on only one GPU?
All GPUs have to go through `do_test` to make predictions together. Only one GPU evaluates the predictions.
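To make that division of labour concrete, here is a toy simulation in plain Python. It is not Detectron2 code: `gather_to_main`, `evaluate`, and the image names are made up for illustration. It only mimics the behaviour described above, where every rank contributes predictions but only rank 0 scores them, which is consistent with the `{}` printed by the second process in the log.

```python
def gather_to_main(per_rank_predictions):
    """Concatenate each rank's predictions, as rank 0 would see them after a gather."""
    merged = []
    for predictions in per_rank_predictions:
        merged.extend(predictions)
    return merged


def evaluate(rank, per_rank_predictions):
    """Toy evaluator: all ranks take part, only rank 0 returns metrics."""
    if rank != 0:
        # Other ranks have handed their predictions over and have nothing to report.
        return {}
    merged = gather_to_main(per_rank_predictions)
    return {"bbox": {"num_predictions": len(merged)}}


# Two ranks, each holding predictions for its own (made-up) shard of the val set.
shards = [["img0", "img1", "img2", "img3"], ["img4", "img5", "img6"]]
for rank in range(len(shards)):
    print(rank, evaluate(rank, shards))
```

If rank 1 skipped inference altogether, rank 0 would be scoring only its own shard, which is why every process has to run through `do_test`.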
My bad! Since I am using 2 GPUs, each of them goes through `do_test` once, and on the second process `results_i = inference_on_dataset(model, data_loader, evaluator)` returns an empty dict ({}). Is there a way to handle this?
I don't know exactly how you want to handle this, but you can use `if comm.is_main_process():` as I said above.
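Another way to handle the empty dict, independent of which rank the code runs on, is to check the results before indexing into them. This is only a sketch: `bbox_metrics_or_none` is a hypothetical helper, and the two dicts below stand in for what each of the two processes gets back from `do_test`.

```python
def bbox_metrics_or_none(val_dict):
    """Return the bbox metrics, or None when evaluation produced no results here."""
    return (val_dict or {}).get("bbox")


# Stand-ins for the per-process results: rank 0 has metrics, rank 1 has {}.
per_rank_results = [{"bbox": {"AP": 72.2132}}, {}]

for rank, val_dict in enumerate(per_rank_results):
    metrics = bbox_metrics_or_none(val_dict)
    if metrics is None:
        print(f"rank {rank}: no results on this process, skipping")
        continue
    print(f"rank {rank}: AP = {metrics['AP']}")
```

The explicit rank check is clearer about intent; the empty-dict check is just a safety net against the `KeyError`.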
I am training an object detection model on a custom COCO-format dataset. During multi-GPU training, I periodically run evaluation using cfg.TEST.EVAL_PERIOD. However, I don't get any evaluation results, such as mAP scores or per-category AP scores. This issue is similar to #937, the only difference being that I get no evaluation results at all. How can I get the evaluation results?