facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

'class_labels': tensor([], dtype=torch.int64) when having negative examples during training. #530

Open · rsong0606 opened this issue 1 year ago

rsong0606 commented 1 year ago

❓ How to do something using DETR

I tried the approach recommended in https://github.com/facebookresearch/detr/issues/82 to handle negative examples.

In my COCO-format annotation file, category id = 1 is the positive class and id = 2 is the negative class:


"categories": [
    {
        "supercategory": "Defect",
        "id": 1,
        "name": "signature"
    },
    {
        "supercategory": "Defect",
        "id": 2,
        "name": "non-signature"
    }
]
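Declaring a category does not mean any annotation uses it. A quick count of annotations per category_id shows whether the negative class actually appears in the file. This is a minimal standard-library sketch that assumes the usual COCO layout with top-level "categories" and "annotations" lists. `summarize_categories` is a hypothetical helper name, not part of DETR or torchvision.

```python
from collections import Counter


def summarize_categories(coco):
    """Count annotations per category_id in a loaded COCO dict.

    Hypothetical helper: also flags category ids that are used by
    annotations but not declared under "categories".
    """
    declared = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(a["category_id"] for a in coco["annotations"])
    # Report every declared category, including ones with zero annotations
    for cat_id in sorted(set(declared) | set(counts)):
        name = declared.get(cat_id, "<undeclared>")
        print(f"category {cat_id} ({name}): {counts.get(cat_id, 0)} annotation(s)")
    return counts
```

To run it on the training file, `import json` and call `summarize_categories(json.load(open("custom_train.json")))`. If category 2 reports 0 annotations, the negative images have no annotation entries at all.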

The targets are built by this dataset class:

import os

import torchvision


class CocoDetection(torchvision.datasets.CocoDetection):
    def __init__(self, img_folder, feature_extractor, train=True):
        # annotation file lives in the image folder: custom_train.json or custom_val.json
        ann_file = os.path.join(img_folder, "custom_train.json" if train else "custom_val.json")
        super(CocoDetection, self).__init__(img_folder, ann_file)
        self.feature_extractor = feature_extractor

    def __getitem__(self, idx):
        # read in PIL image and target in COCO format
        # (target is the list of annotation dicts for this image, empty if it has none)
        img, target = super(CocoDetection, self).__getitem__(idx)

        # preprocess image and target (converting target to DETR format, resizing + normalization of both image and target)
        image_id = self.ids[idx]
        target = {'image_id': image_id, 'annotations': target}
        encoding = self.feature_extractor(images=img, annotations=target, return_tensors="pt")
        pixel_values = encoding["pixel_values"].squeeze()  # remove batch dimension
        target = encoding["labels"][0]  # remove batch dimension

        return pixel_values, target
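torchvision's CocoDetection returns, as target, the list of annotation dicts whose image_id matches the image. An image that has no entries under "annotations" therefore comes back with an empty list. This small plain-Python sketch works over the loaded JSON dict and lists such images. `images_without_annotations` is a hypothetical helper name.

```python
from collections import defaultdict


def images_without_annotations(coco):
    """Return ids of images that have no entries in coco["annotations"].

    Hypothetical helper: for these images, CocoDetection yields an
    empty annotation list as the target.
    """
    anns_per_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_per_image[ann["image_id"]].append(ann)
    return [img["id"] for img in coco["images"] if not anns_per_image[img["id"]]]
```

Any id it returns will presumably produce a target with empty boxes and class_labels once it passes through the feature extractor.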

Printing target for a negative example gives:

{'boxes': tensor([], size=(0, 4)), 'class_labels': tensor([], dtype=torch.int64), 'image_id': tensor([0]), 'area': tensor([]), 'iscrowd': tensor([], dtype=torch.int64), 'orig_size': tensor([618, 480]), 'size': tensor([1030, 800])}
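Annotations can also be dropped during conversion even when they exist in the file. DETR's own COCO loader (ConvertCocoPolysToMask in datasets/coco.py) does the following:

1. Skips crowd annotations.
2. Converts [x, y, w, h] to corner coordinates.
3. Clamps the corners to the image.
4. Drops any box with zero width or height, together with its label.

The pure-Python sketch below mirrors that logic for illustration only. `convert_coco_target` is a hypothetical name, it uses lists instead of tensors, and whether the feature extractor used here applies exactly the same filter is an assumption.

```python
def convert_coco_target(annotations, width, height):
    """Hypothetical list-based mirror of the box filtering in DETR's
    ConvertCocoPolysToMask (datasets/coco.py). DETR names the label key
    "labels"; "class_labels" is used here to match the printed target."""
    boxes, labels = [], []
    for obj in annotations:
        if obj.get("iscrowd", 0) != 0:
            continue  # crowd annotations are skipped
        x, y, w, h = obj["bbox"]
        # COCO [x, y, w, h] -> [x0, y0, x1, y1], clamped to the image bounds
        x0 = min(max(x, 0), width)
        y0 = min(max(y, 0), height)
        x1 = min(max(x + w, 0), width)
        y1 = min(max(y + h, 0), height)
        # degenerate boxes (zero width or height) are dropped along with their label
        if x1 > x0 and y1 > y0:
            boxes.append([x0, y0, x1, y1])
            labels.append(obj["category_id"])
    return {"boxes": boxes, "class_labels": labels}


# orig_size in the printed target is [height, width] = [618, 480]
print(convert_coco_target([{"bbox": [0, 0, 0, 0], "category_id": 2}], 480, 618))
# {'boxes': [], 'class_labels': []}
print(convert_coco_target([{"bbox": [10, 20, 100, 50], "category_id": 2}], 480, 618))
# {'boxes': [[10, 20, 110, 70]], 'class_labels': [2]}
```

Under this logic, a negative image annotated with a placeholder box such as [0, 0, 0, 0] ends up with empty class_labels, just like an image with no annotations at all.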

My question: should class_labels contain 2 (the negative category id) instead of being empty?