facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Apache License 2.0

AttributeError: Cannot find field 'gt_masks' in the given Instances! #5276

Open stoic-signs opened 1 month ago

stoic-signs commented 1 month ago

If you do not know the root cause of the problem, please post according to this template:

Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

  1. Full runnable code or full changes you made: Here's my model config, based on the provided Mask R-CNN config, with the mask heads removed:
    # Wrapper names follow the stock config and the model printout below;
    # arguments that did not survive in this post are not shown.
    model = L(GeneralizedRCNN)(
        backbone=L(FPN)(
            bottom_up=L(ResNet)(
                stem=L(BasicStem)(in_channels=3, out_channels=64, norm="FrozenBN"),
                out_features=["res2", "res3", "res4", "res5"],
            ),
        ),
        proposal_generator=L(RPN)(
            in_features=["p2", "p3", "p4", "p5", "p6"],
            head=L(StandardRPNHead)(in_channels=256, num_anchors=3),
            anchor_generator=L(DefaultAnchorGenerator)(
                sizes=[[32], [64], [128], [256], [512]],
                aspect_ratios=[0.5, 1.0, 2.0],
                strides=[4, 8, 16, 32, 64],
            ),
            anchor_matcher=L(Matcher)(
                thresholds=[0.3, 0.7], labels=[0, -1, 1], allow_low_quality_matches=True
            ),
            box2box_transform=L(Box2BoxTransform)(weights=[1.0, 1.0, 1.0, 1.0]),
            pre_nms_topk=(2000, 1000),
            post_nms_topk=(1000, 1000),
        ),
        roi_heads=L(StandardROIHeads)(
            proposal_matcher=L(Matcher)(
                thresholds=[0.5], labels=[0, 1], allow_low_quality_matches=False
            ),
            box_in_features=["p2", "p3", "p4", "p5"],
            box_pooler=L(ROIPooler)(scales=(1.0 / 4, 1.0 / 8, 1.0 / 16, 1.0 / 32)),
            box_head=L(FastRCNNConvFCHead)(
                input_shape=ShapeSpec(channels=256, height=7, width=7),
                fc_dims=[1024, 1024],
            ),
            box_predictor=L(FastRCNNOutputLayers)(
                box2box_transform=L(Box2BoxTransform)(weights=(10, 10, 5, 5)),
            ),
        ),
    )
    model.pixel_mean = [123.675, 116.28, 103.53]
    model.pixel_std = [58.395, 57.12, 57.375]
    model.input_format = "RGB"

    model.roi_heads.num_classes = len(model_classes)

    train = model_zoo.get_config("common/train.py").train
    train.amp.enabled = True
    train.ddp.fp16_compression = True
    train.init_checkpoint = "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl"

    dataloader = model_zoo.get_config("common/data/coco.py").dataloader
    dataloader.train.mapper.augmentations = [
        L(T.RandomFlip)(horizontal=True),  # flip first
        L(T.RandomApply)(
            tfm_or_aug=L(T.RandomBrightness)(intensity_min=0.5, intensity_max=1.5),
            prob=0.3,
        ),
        L(T.RandomApply)(
            tfm_or_aug=L(T.RandomCrop)(crop_type='relative_range', crop_size=[0.7, 0.7]),
            prob=0.4,
        ),
        L(T.ResizeShortestEdge)(
            short_edge_length=min_edge_range, sample_style="range", max_size=max_size
        ),
    ]
    dataloader.train.mapper.image_format = "RGB"
    # recompute boxes due to cropping
    dataloader.train.mapper.recompute_boxes = True

    dataloader.test.mapper.augmentations = [
        L(T.ResizeShortestEdge)(short_edge_length=min_edge_range[1], max_size=max_size),
    ]
    dataloader.test.mapper.recompute_boxes = True

Here's the model output during runtime:

GeneralizedRCNN( (backbone): FPN( (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (top_block): LastLevelMaxPool() (bottom_up): ResNet( (stem): BasicStem( (conv1): Conv2d( 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) ) (res2): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv1): Conv2d( 64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, 
eps=1e-05) ) (conv2): Conv2d( 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05) ) (conv3): Conv2d( 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) ) ) (res3): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv1): Conv2d( 256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv2): Conv2d( 128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): 
FrozenBatchNorm2d(num_features=128, eps=1e-05) ) (conv3): Conv2d( 128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) ) ) (res4): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) (conv1): Conv2d( 512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (3): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), 
bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (4): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) (5): BottleneckBlock( (conv1): Conv2d( 1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv2): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05) ) (conv3): Conv2d( 256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05) ) ) ) (res5): Sequential( (0): BottleneckBlock( (shortcut): Conv2d( 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) (conv1): Conv2d( 1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (1): BottleneckBlock( (conv1): Conv2d( 2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) (2): BottleneckBlock( (conv1): Conv2d( 
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv2): Conv2d( 512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05) ) (conv3): Conv2d( 512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05) ) ) ) ) ) (proposal_generator): RPN( (rpn_head): StandardRPNHead( (conv): Conv2d( 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1) (activation): ReLU() ) (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1)) (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1)) ) (anchor_generator): DefaultAnchorGenerator( (cell_anchors): BufferList() ) ) (roi_heads): StandardROIHeads( (box_pooler): ROIPooler( (level_poolers): ModuleList( (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True) (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True) (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True) (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True) ) ) (box_head): FastRCNNConvFCHead( (flatten): Flatten(start_dim=1, end_dim=-1) (fc1): Linear(in_features=12544, out_features=1024, bias=True) (fc_relu1): ReLU() (fc2): Linear(in_features=1024, out_features=1024, bias=True) (fc_relu2): ReLU() ) (box_predictor): FastRCNNOutputLayers( (cls_score): Linear(in_features=1024, out_features=8, bias=True) (bbox_pred): Linear(in_features=1024, out_features=28, bias=True) ) ) )

2. What exact command you run:

python lazyconfig_train_net.py --config-file config.py

3. __Full logs__ or other relevant observations:

    Traceback (most recent call last):
      File "/home/ubuntu/Trainer/detectron2/detectron2/engine/train_loop.py", line 149, in train
        self.run_step()
      File "/home/ubuntu/Trainer/detectron2/detectron2/engine/train_loop.py", line 404, in run_step
        data = next(self._data_loader_iter)
      File "/home/ubuntu/Trainer/detectron2/detectron2/data/common.py", line 234, in __iter__
        for d in self.dataset:
      File "/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
        data = self._next_data()
      File "/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
        return self._process_data(data)
      File "/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
        data.reraise()
      File "/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    AttributeError: Caught AttributeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
        data.append(next(self.dataset_iter))
      File "/home/ubuntu/Trainer/detectron2/detectron2/data/common.py", line 201, in __iter__
        yield self.dataset[idx]
      File "/home/ubuntu/Trainer/detectron2/detectron2/data/common.py", line 90, in __getitem__
        data = self._map_func(self._dataset[cur_idx])
      File "/home/ubuntu/Trainer/detectron2/detectron2/utils/serialize.py", line 26, in __call__
        return self._obj(*args, **kwargs)
      File "/home/ubuntu/Trainer/detectron2/detectron2/data/dataset_mapper.py", line 189, in __call__
        self._transform_annotations(dataset_dict, transforms, image_shape)
      File "/home/ubuntu/Trainer/detectron2/detectron2/data/dataset_mapper.py", line 141, in _transform_annotations
        instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
      File "/home/ubuntu/Trainer/detectron2/detectron2/structures/instances.py", line 68, in __getattr__
        raise AttributeError("Cannot find field '{}' in the given Instances!".format(name))
    AttributeError: Cannot find field 'gt_masks' in the given Instances!

## Expected behavior:

The model should start training without issue. I referred to #485, but I'm using a detection model with bbox annotations.
Not sure what is going on. The model weights from "detectron2://COCO-Detection/faster_rcnn_R_50_FPN_3x/137849458/model_final_280758.pkl" load fine, too.
A sample of my dataset:

{'file_name': '1713252303077.jpg', 'image_id': 71, 'height': 3000, 'width': 4000, 'annotations': [{'bbox': [61, 820, 3982, 2080], 'bbox_mode': <BoxMode.XYXY_ABS: 0>, 'category_id': 3}]}
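
When a mapper complains about a missing ground-truth field, a quick check of which annotation keys a record actually carries can help. The sketch below is plain Python, not detectron2 code. It assumes mask annotations would appear under a `segmentation` key, as in detectron2's standard dataset dicts. The helper names `annotation_keys` and `has_segmentation` are hypothetical, and the `BoxMode` enum is replaced by a string so the snippet runs on its own.

```python
# Stand-in for the record shown above; the BoxMode enum is replaced by a
# plain string so the snippet runs without detectron2 installed.
record = {
    "file_name": "1713252303077.jpg",
    "image_id": 71,
    "height": 3000,
    "width": 4000,
    "annotations": [
        {"bbox": [61, 820, 3982, 2080], "bbox_mode": "XYXY_ABS", "category_id": 3}
    ],
}


def annotation_keys(record):
    """Return the sorted set of keys used across all annotations of a record."""
    keys = set()
    for ann in record.get("annotations", []):
        keys.update(ann)
    return sorted(keys)


def has_segmentation(record):
    """True only if the record has annotations and each one has a 'segmentation' entry."""
    anns = record.get("annotations", [])
    return bool(anns) and all("segmentation" in ann for ann in anns)


print(annotation_keys(record))   # ['bbox', 'bbox_mode', 'category_id']
print(has_segmentation(record))  # False
```

A record like this one carries boxes only, so any step that reads masks has nothing to work from.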

## Environment:

detectron2 is built locally from an unmodified fork.

    sys.platform            linux
    Python                  3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
    numpy                   1.24.4
    detectron2              0.6 @/home/ubuntu/Trainer/detectron2/detectron2
    Compiler                GCC 11.4
    CUDA compiler           not available
    DETECTRON2_ENV_MODULE
    PyTorch                 1.8.2+cu102 @/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torch
    PyTorch debug build     False
    GPU available           Yes
    GPU 0                   Tesla T4 (arch=7.5)
    Driver version          535.171.04
    CUDA_HOME               None - invalid!
    Pillow                  10.3.0
    torchvision             0.9.2+cu102 @/home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torchvision
    torchvision arch flags  /home/ubuntu/miniconda3/envs/detectron/lib/python3.8/site-packages/torchvision/_C.so
    fvcore                  0.1.5.post20221221
    iopath                  0.1.9
    cv2                     4.9.0

PyTorch built with:

stoic-signs commented 1 month ago

Okay, looks like I found the issue: recompute_boxes requires gt_masks, because it uses the masks to get tighter bounding boxes after cropping. I removed that setting and training seems to work fine now.

Follow-up questions: when using the ResizeShortestEdge augmentation, I assume the bounding boxes are scaled automatically. Is this true? And is there any way to use cropping and recompute the bounding boxes when gt_masks isn't available?
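
On the follow-up: as far as the detectron2 data pipeline is understood here, geometric augmentations are applied to the annotation boxes as well as to the image, so a resize scales the boxes and a crop shifts and clips them without needing masks. The sketch below illustrates that arithmetic in plain Python. It is not detectron2's implementation: the function names are hypothetical, and the rounding rule in `resize_shortest_edge_shape` is an assumption modeled on how ResizeShortestEdge is commonly described.

```python
def resize_shortest_edge_shape(h, w, short_edge, max_size):
    """Output (height, width) when the shorter side is resized to `short_edge`
    and the longer side is capped at `max_size`, keeping the aspect ratio."""
    scale = short_edge / min(h, w)
    if h < w:
        new_h, new_w = short_edge, scale * w
    else:
        new_h, new_w = scale * h, short_edge
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    return int(new_h + 0.5), int(new_w + 0.5)


def scale_box(box, old_hw, new_hw):
    """Scale an XYXY box by the same per-axis factors as the image."""
    (h, w), (new_h, new_w) = old_hw, new_hw
    sx, sy = new_w / w, new_h / h
    x0, y0, x1, y1 = box
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]


def crop_box(box, crop_x0, crop_y0, crop_w, crop_h):
    """Shift an XYXY box into crop coordinates and clip it to the crop window."""
    x0, y0, x1, y1 = box
    xs = [min(max(x - crop_x0, 0), crop_w) for x in (x0, x1)]
    ys = [min(max(y - crop_y0, 0), crop_h) for y in (y0, y1)]
    return [xs[0], ys[0], xs[1], ys[1]]


new_hw = resize_shortest_edge_shape(100, 200, short_edge=50, max_size=1000)
print(new_hw)                                            # (50, 100)
print(scale_box([20, 10, 100, 60], (100, 200), new_hw))  # [10.0, 5.0, 50.0, 30.0]
print(crop_box([20, 10, 100, 60], 50, 0, 100, 50))       # [0, 10, 50, 50]
```

The clipped box from `crop_box` is the original box cut off at the crop border. It is not necessarily tight around the visible part of the object, which is the gap that recomputing boxes from masks is meant to close.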