facebookresearch / detr

End-to-End Object Detection with Transformers

custom training asserts with "degenerate bboxes" over and over - but bboxes look correct, any debugging insight? #28

Closed. lessw2020 closed this issue 4 years ago.

lessw2020 commented 4 years ago

I'm trying to get my custom dataset working, but I can't get past 8 or so images via `__getitem__` before it asserts that my bboxes are bad. I pull the flagged image, it flags the next one, I pull that one, it flags the next...

From reading the code, it checks that x1 and y1 are at least as large as x0 and y0, which is a great check:

```
55     assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
```

But it keeps flagging images that, when I unwind from COCO format, should be fine... so any insights? For some reason I was not able to print the boxes1 (200, 4) and boxes2 (12, 4) tensors, so I couldn't see what it was actually computing (printing threw an odd GPU error about 'formatting').

For example, it flagged this image as bad - here's the JSON for it in COCO format, 6 classes. (One box will surround the other 5 objects, btw, as it's a malaria reader - so I'm not sure whether that box encompassing other boxes is really the issue?):

{"id": "c33c3539-8bd1-48e0-8065-831709e5e64d", "image_id": 3091210, "category_id": 2905442, "segmentation": null, "area": 0, "bbox": **[499, 121, 177, 80]**, "iscrowd": 0}, 
{"id": "0023d71e-e1e9-4862-a0b8-6e2bc3982b3b", "image_id": 3091210, "category_id": 2905422, "segmentation": null, "area": 0, "bbox": **[492, 523, 187, 163]**, "iscrowd": 0},
 {"id": "726fdfbc-3801-409d-ab75-ccf951e74316", "image_id": 3091210, "category_id": 2905421, "segmentation": null, "area": 0, "bbox": **[496, 428, 181, 93],** "iscrowd": 0}, 
{"id": "2bf85a8e-108d-4875-b0f5-47c8e5cb13e0", "image_id": 3091210, "category_id": 2905420, "segmentation": null, "area": 0, "bbox": **[494, 272, 186, 169]**, "iscrowd": 0},
 {"id": "8669c13a-1205-4e94-a645-18e2ffa491d0", "image_id": 3091210, "category_id": 2905419, "segmentation": null, "area": 0, "bbox": **[489, 127, 193, 557]**, "iscrowd": 0},
 {"id": "d9619859-e0ef-4632-ad51-7237a5760a5e", "image_id": 3091210, "category_id": 2905418, "segmentation": null, "area": 0, "bbox": **[495, 203, 182, 73]**, "iscrowd": 0},

And as a check for myself, here's the COCO format: the COCO bounding box format is [top-left x position, top-left y position, width, height].

All the bboxes it flags have positive width and height, so x1 and y1 must end up larger than x0 and y0 - only adding a negative width or height to the original x0 or y0 could make them smaller... so I'm unclear what it is asserting on, or why.
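As a quick sanity check, here's a sketch using two of the annotations above (the conversion mirrors how I understand the xyxy check, not the repo's exact code):

```python
import torch

# COCO boxes are [top-left x, top-left y, width, height]; with positive w/h,
# converting to [x0, y0, x1, y1] always satisfies the repo's assert.
coco = torch.tensor([[499., 121., 177., 80.],
                     [492., 523., 187., 163.]])
xyxy = torch.cat([coco[:, :2], coco[:, :2] + coco[:, 2:]], dim=1)
assert (xyxy[:, 2:] >= xyxy[:, :2]).all()  # passes, as expected
```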

But it asserts here:

```
~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
     53     #print(boxes1)
     54     #print(boxes2)
---> 55     assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
     56     assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
     57     iou, union = box_iou(boxes1, boxes2)
```

I've removed 15+ images trying to get it to actually train, but it just keeps flagging more and more as invalid bboxes. I remove one image, then it asserts on the next... and reviewing the ones it flags vs. the ones it lets pass, I don't see any real difference. (I have trained with this same dataset on EfficientDet, so I know the dataset is reasonable.)

Any help with debugging, or insight into what might be awry, would be appreciated. Thanks!

lessw2020 commented 4 years ago

*I'll try a different dataset tomorrow that doesn't have the one outer bounding box surrounding all the inner objects, and see if that is the core issue.

alcinos commented 4 years ago

Hi, I'd have to see the full backtrace to be 100% sure, but generally in this function boxes1 corresponds to the predicted boxes, not the target ones, so your dataset is likely not to blame here (see e.g. https://github.com/facebookresearch/detr/blob/master/models/detr.py#L150). Off the top of my head, I can think of mainly two things that can trigger this:

1. The learning rate is too high and training has diverged, at which point the model starts predicting NaNs for the boxes (lowering the LR and/or adjusting the clamping parameters can help).
2. Something in the model or the code (e.g. the number of classes or queries) was changed in a way that isn't applied consistently everywhere.

Best of luck.

lessw2020 commented 4 years ago

Hi @alcinos - thanks for saving my stress levels - I was poring over the bboxes trying to figure out how it was flagging them as incorrect.

You are right, though - I see now from your link that it is the predicted boxes, not the dataset-loaded ones.
Re: your questions: 1 - I didn't change the LR or the clamping params (I'm trying to make as few adjustments as possible and just get it training first).
2 - However, I think the issue may be that I forgot there is no adjustment for classes in the main.py script (I had first adjusted num_queries, then reset it to 100 when I started hitting this issue). I'm training for 6 classes (or 6 + 1 for background), and I realize now it is likely predicting for 70+... so that may be why it asserts after just a few batches, with NaNs for the degenerate predicted boxes?

Let me try to remap the class count and create a --num_classes param and see if that fixes this!
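Roughly what I have in mind - a hypothetical sketch (the flag name and wiring are mine, not part of the repo):

```python
import argparse

parser = argparse.ArgumentParser('DETR custom-training sketch')
# hypothetical flag -- name and default are mine, not part of the repo
parser.add_argument('--num_classes', default=91, type=int,
                    help='labels must satisfy label < num_classes; DETR adds +1 for no-object')

# then in models/detr.py, build(args), replace the hard-coded value with:
#     num_classes = args.num_classes
args = parser.parse_args(['--num_classes', '6'])
print(args.num_classes)  # 6
```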

lessw2020 commented 4 years ago

Ugh, well, no luck - I changed the classes to 6 + 1. Depending on the number of queries, I get various failures in the loss matching via CUDA asserts, as below. Running with 100 queries (the default), I go right back to the degenerate bbox issue as before. I'm running in Jupyter with this launch:

```
%run main.py --batch_size 2 --no_aux_loss --coco_path uw-dev7
```

Here's the error with queries = 9 (the --> arrows are my debugging prints, so I can verify the model being created has the expected num_queries and classes):

```
Not using distributed mode
git: sha: 7613beb10a530ca0ab836f2c8845d0501f5bf063, status: has uncommited changes, branch: master
Namespace(aux_loss=False, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='uw-dev7', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_url='env://', distributed=False, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=300, eval=False, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0001, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_queries=100, num_workers=2, output_dir='', position_embedding='sine', pre_norm=False, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=1)
_____------> num_classes = 91
___====> self.class_embed = Linear(in_features=256, out_features=7, bias=True)
___=====> self.query_embed = Embedding(9, 256)
number of params: 41257227
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/detr/main.py in <module>
    246     if args.output_dir:
    247         Path(args.output_dir).mkdir(parents=True, exist_ok=True)
--> 248     main(args)

~/detr/main.py in main(args)
    196         train_stats = train_one_epoch(
    197             model, criterion, data_loader_train, optimizer, device, epoch,
--> 198             args.clip_max_norm)
    199         lr_scheduler.step()
    200         if args.output_dir:

~/detr/engine.py in train_one_epoch(model, criterion, data_loader, optimizer, device, epoch, max_norm)
     31
     32         outputs = model(samples)
---> 33         loss_dict = criterion(outputs, targets)
     34         weight_dict = criterion.weight_dict
     35         losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/detr/models/detr.py in forward(self, outputs, targets)
    220
    221         # Retrieve the matching between the outputs of the last layer and the targets
--> 222         indices = self.matcher(outputs_without_aux, targets)
    223
    224         # Compute the average number of target boxes accross all nodes, for normalization purposes

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_no_grad(*args, **kwargs)
     47         def decorate_no_grad(*args, **kwargs):
     48             with self:
---> 49                 return func(*args, **kwargs)
     50         return decorate_no_grad
     51

~/detr/models/matcher.py in forward(self, outputs, targets)
     72
     73         # Compute the giou cost betwen boxes
---> 74         cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
     75
     76         # Final cost matrix

~/detr/util/box_ops.py in box_cxcywh_to_xyxy(x)
      9 def box_cxcywh_to_xyxy(x):
     10     x_c, y_c, w, h = x.unbind(-1)
---> 11     b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
     12          (x_c + 0.5 * w), (y_c + 0.5 * h)]
     13     return torch.stack(b, dim=-1)

RuntimeError: CUDA error: device-side assert triggered
```
And reverting to queries = 100, I get back to the original degenerate bbox issue as before:

```
Not using distributed mode
git: sha: 7613beb10a530ca0ab836f2c8845d0501f5bf063, status: has uncommited changes, branch: master
Namespace(aux_loss=False, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='uw-dev7', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_url='env://', distributed=False, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=300, eval=False, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0001, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_queries=100, num_workers=2, output_dir='', position_embedding='sine', pre_norm=False, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=1)
_____------> num_classes = 91
___====> self.class_embed = Linear(in_features=256, out_features=7, bias=True)
___=====> self.query_embed = Embedding(100, 256)
number of params: 41280523
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/detr/main.py in <module>
    246     if args.output_dir:
    247         Path(args.output_dir).mkdir(parents=True, exist_ok=True)
--> 248     main(args)

~/detr/main.py in main(args)
    196         train_stats = train_one_epoch(
    197             model, criterion, data_loader_train, optimizer, device, epoch,
--> 198             args.clip_max_norm)
    199         lr_scheduler.step()
    200         if args.output_dir:

~/detr/engine.py in train_one_epoch(model, criterion, data_loader, optimizer, device, epoch, max_norm)
     31
     32         outputs = model(samples)
---> 33         loss_dict = criterion(outputs, targets)
     34         weight_dict = criterion.weight_dict
     35         losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/detr/models/detr.py in forward(self, outputs, targets)
    220
    221         # Retrieve the matching between the outputs of the last layer and the targets
--> 222         indices = self.matcher(outputs_without_aux, targets)
    223
    224         # Compute the average number of target boxes accross all nodes, for normalization purposes

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_no_grad(*args, **kwargs)
     47         def decorate_no_grad(*args, **kwargs):
     48             with self:
---> 49                 return func(*args, **kwargs)
     50         return decorate_no_grad
     51

~/detr/models/matcher.py in forward(self, outputs, targets)
     72
     73         # Compute the giou cost betwen boxes
---> 74         cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
     75
     76         # Final cost matrix

~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
     49     # degenerate boxes gives inf / nan results
     50     # so do an early check
---> 51     assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
     52     assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
     53     iou, union = box_iou(boxes1, boxes2)

~/anaconda3/lib/python3.7/site-packages/torch/tensor.py in wrapped(*args, **kwargs)
     26     def wrapped(*args, **kwargs):
     27         try:
---> 28             return f(*args, **kwargs)
     29         except TypeError:
     30             return NotImplemented

RuntimeError: CUDA error: device-side assert triggered
```
lessw2020 commented 4 years ago

*Note - I'll try fine-tuning tomorrow as a backup plan (via --resume and the checkpoint linear-layer restart).

fmassa commented 4 years ago

@lessw2020

Let's break this down in two: The device-side assert and the degenerate boxes.

**Device-side assert**

I think that in order to properly debug the RuntimeError: CUDA error: device-side assert triggered, you'll need to run your script with CUDA_LAUNCH_BLOCKING=1 python main.py, due to the asynchronous nature of CUDA calls in PyTorch. As a rule of thumb, this error generally comes from indexing a tensor out of bounds, for example when the number of outputs in the classifier is smaller than the number of classes, which then triggers inside the CrossEntropy loss.

I see from your logs, though, that you changed num_queries in the code to 9, but the argparse output is unchanged (it still prints 100) -- can you try changing it on the command line instead? There might be other places in the code where you forgot to change 100 to 9.

**Degenerate boxes**

The second (full) log you posted also seems to indicate a device-side assert being triggered, even though it appears to point at the "degenerate" boxes. I think this shows that the "degenerate boxes" assert is a red herring, and the error lies elsewhere. My first guess: make sure that, if you changed num_classes in the code, you are using the same num_classes for the SetCriterion in https://github.com/facebookresearch/detr/blob/7613beb10a530ca0ab836f2c8845d0501f5bf063/models/detr.py#L330

This could explain why you are having the device-side asserts, as we use the num_classes from the Criterion to perform indexing https://github.com/facebookresearch/detr/blob/7613beb10a530ca0ab836f2c8845d0501f5bf063/models/detr.py#L108-L112
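A minimal illustration of that failure mode, with made-up shapes (a sketch, not the repo's code):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 12, 7)  # class_embed sized for 6 classes + 1
# If SetCriterion still uses the default num_classes=91, the "no object"
# filler label (= 91) is out of range for a 7-way classifier:
target_classes = torch.full((2, 12), 91, dtype=torch.int64)
# F.cross_entropy(logits.transpose(1, 2), target_classes)
# -> IndexError on CPU, "device-side assert triggered" on CUDA
```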

alcinos commented 4 years ago

To complete what @fmassa said, the canonical way to change the number of queries is the command-line arg --num_queries (you shouldn't have to change the code for that). For the number of classes, you only have one line to edit: https://github.com/facebookresearch/detr/blob/master/models/detr.py#L296-L298

lessw2020 commented 4 years ago

Hi @alcinos and @fmassa - thanks very much to both of you for the detailed info!

I've reset my code changes, updated the classes per the above (and verified SetCriterion), and updated the queries via the command-line arg. Unfortunately, the problem persists - it's back to the degenerate bbox assert.

I'll try reverting to the default of 100 queries and continue trying to pin it down. For reference, I can run COCO eval on this server and get 42 mAP, so the config seems functional.

Here are my current results. I added a --num_classes arg to simplify things (it adjusts the spot @alcinos pointed out), and I have a print check for SetCriterion per @fmassa as well. In the results below I've printed the model, postprocessor, and criterion, and verified that class_embed looks correct, i.e. num_classes + 1: (class_embed): Linear(in_features=256, out_features=7, bias=True)

Here's my launch command:

```
%run main.py --batch_size 2 --no_aux_loss --num_queries 12 --num_classes 6 --coco_path uw-dev7 --dataset_file coco --output_dir ./output
```

And the results:

```
Not using distributed mode
git: sha: 7613beb10a530ca0ab836f2c8845d0501f5bf063, status: has uncommited changes, branch: master
Namespace(aux_loss=False, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='uw-dev7', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_url='env://', distributed=False, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=300, eval=False, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0001, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_classes=6, num_queries=12, num_workers=2, output_dir='./output', position_embedding='sine', pre_norm=False, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=1)
num_classes = 6
*** custom classes and queries ****
---> num classes = 6, num queries = 12
detr.py::SetCriterion.__init__ self.num_classes = 6
DETR(
  (transformer): Transformer(
    (encoder): TransformerEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
        )
        (1): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
        )
        (2): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
        )
        (3): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
        )
        (4): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
        )
        (5): TransformerEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (decoder): TransformerDecoder(
      (layers): ModuleList(
        (0): TransformerDecoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
          (dropout3): Dropout(p=0.1, inplace=False)
        )
        (1): TransformerDecoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
          (dropout3): Dropout(p=0.1, inplace=False)
        )
        (2): TransformerDecoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
          (dropout3): Dropout(p=0.1, inplace=False)
        )
        (3): TransformerDecoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
          (dropout3): Dropout(p=0.1, inplace=False)
        )
        (4): TransformerDecoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
          (dropout3): Dropout(p=0.1, inplace=False)
        )
        (5): TransformerDecoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=True)
          )
          (linear1): Linear(in_features=256, out_features=2048, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (linear2): Linear(in_features=2048, out_features=256, bias=True)
          (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (dropout2): Dropout(p=0.1, inplace=False)
          (dropout3): Dropout(p=0.1, inplace=False)
        )
      )
      (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    )
  )
  (class_embed): Linear(in_features=256, out_features=7, bias=True)
  (bbox_embed): MLP(
    (layers): ModuleList(
      (0): Linear(in_features=256, out_features=256, bias=True)
      (1): Linear(in_features=256, out_features=256, bias=True)
      (2): Linear(in_features=256, out_features=4, bias=True)
    )
  )
  (query_embed): Embedding(12, 256)
  (input_proj): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
  (backbone): Joiner(
    (0): Backbone(
      (body): IntermediateLayerGetter(
        (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
        (bn1): FrozenBatchNorm2d()
        (relu): ReLU(inplace=True)
        (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
        (layer1): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (1): FrozenBatchNorm2d()
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
        )
        (layer2): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): FrozenBatchNorm2d()
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (3): Bottleneck(
            (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
        )
        (layer3): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): FrozenBatchNorm2d()
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (3): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (4): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (5): Bottleneck(
            (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
        )
        (layer4): Sequential(
          (0): Bottleneck(
            (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
            (downsample): Sequential(
              (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
              (1): FrozenBatchNorm2d()
            )
          )
          (1): Bottleneck(
            (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
          (2): Bottleneck(
            (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn1): FrozenBatchNorm2d()
            (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn2): FrozenBatchNorm2d()
            (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn3): FrozenBatchNorm2d()
            (relu): ReLU(inplace=True)
          )
        )
      )
    )
    (1): PositionEmbeddingSine()
  )
)
SetCriterion(
  (matcher): HungarianMatcher()
)
{'bbox': PostProcess()}
number of params: 41257995
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/detr/main.py in <module>
    252     if args.output_dir:
    253         Path(args.output_dir).mkdir(parents=True, exist_ok=True)
--> 254     main(args)

~/detr/main.py in main(args)
    202         train_stats = train_one_epoch(
    203             model, criterion, data_loader_train, optimizer, device, epoch,
--> 204             args.clip_max_norm)
    205         lr_scheduler.step()
    206         if args.output_dir:

~/detr/engine.py in train_one_epoch(model, criterion, data_loader, optimizer, device, epoch, max_norm)
     31 
     32         outputs = model(samples)
---> 33         loss_dict = criterion(outputs, targets)
     34         weight_dict = criterion.weight_dict
     35         losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/detr/models/detr.py in forward(self, outputs, targets)
    217 
    218         # Retrieve the matching between the outputs of the last layer and the targets
--> 219         indices = self.matcher(outputs_without_aux, targets)
    220 
    221         # Compute the average number of target boxes accross all nodes, for normalization purposes

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_no_grad(*args, **kwargs)
     47         def decorate_no_grad(*args, **kwargs):
     48             with self:
---> 49                 return func(*args, **kwargs)
     50         return decorate_no_grad
     51 

~/detr/models/matcher.py in forward(self, outputs, targets)
     72 
     73         # Compute the giou cost betwen boxes
---> 74         cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
     75 
     76         # Final cost matrix

~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
     49     # degenerate boxes gives inf / nan results
     50     # so do an early check
---> 51     assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
     52     assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
     53     iou, union = box_iou(boxes1, boxes2)

RuntimeError: CUDA error: device-side assert triggered
```
fmassa commented 4 years ago

@lessw2020 the next step is to run the code with CUDA_LAUNCH_BLOCKING=1 python main.py, as it will show exactly where the issue is -- the current error location in the assert is a red herring, because that assert is simply the first point in the code with a sync point (the assertion requires a value on the CPU).

I still think the most likely culprit is in the Criterion. Also, can you paste the rest of the error message that is displayed? The device-side assert from CUDA generally prints a lot of repeated messages, but they indicate in which kernel the assert happened, which is helpful for debugging.
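If you're launching from Jupyter rather than a shell, one way to get the same effect is to set the variable in-process - a sketch; it must run before anything initializes CUDA in the kernel (restart the kernel first if needed):

```python
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # read at CUDA init time, so set it before any GPU work

%run main.py --batch_size 2 --no_aux_loss --coco_path uw-dev7
```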

lessw2020 commented 4 years ago

Hi @fmassa - thanks again for the help. I switched from Jupyter to the terminal and was then able to see the full CUDA assert info (about 32 of them). I'm attaching debug.txt, which is the stdout (model loading, starting train), and more importantly debug2.txt, which contains the specific CUDA asserts generated with CUDA_LAUNCH_BLOCKING=1. Thanks for the help on this, and I hope this additional CUDA info helps pin it down! debug2.txt debug.txt

lessw2020 commented 4 years ago

I should add that your intuition was quite correct, as the core CUDA issue is an index out of bounds:

```
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda [](int)->auto::operator()(int)->auto: block: [0,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
```

fmassa commented 4 years ago

@lessw2020 from debug2.txt, the error comes from

```python
cost_class = -out_prob[:, tgt_ids]
```

which indicates that your class probabilities have fewer entries than the largest ground-truth index.

If you add a print(tgt_ids.max()) in your code, you'll see that it is larger than 6, which means there may be an issue with your dataset (you have more classes than you thought). I believe this is probably the issue you are facing.
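For instance, a sketch of where that print could go in matcher.py:

```python
# in HungarianMatcher.forward, just before the class cost is computed:
print(tgt_ids.max(), out_prob.shape)  # needs tgt_ids.max() < out_prob.shape[-1]
cost_class = -out_prob[:, tgt_ids]
```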

As an unrelated note, I noticed that you are passing --no_aux_loss to the model -- note that our best results are obtained with the aux loss. The evaluation code doesn't need it because it's just evaluation (and it's slightly faster without), but for training it's generally better to use aux_loss.

lessw2020 commented 4 years ago

Hi @fmassa - thanks for the updates! (and really appreciate you helping trace my issue through).

1 - appreciate the tip re: --no_aux_loss, will drop that as we definitely want best results here (this is for malaria and covid diagnostics, so accuracy is paramount).

2 - Re: more classes than expected - never say never, but overall I don't believe that is possible (double-checks shown below). I did add the print statement, and it may be helpful - it shows 11 targets instead of 12? There are only 6 unique class ids in that target tensor, though:

```
--> HungarianMatcher::tgt_ids.max() = 2905442 and tgt_ids tensor([2905419, 2905420, 2905442, 2905422, 2905421, 2905419,
        2905442, 2905422, 2905421, 2905420, 2905418], device='cuda:0')
```

2905418, 2905419, 2905420, 2905421, 2905422, 2905442 = 6 unique targets

I've double-checked the source labeling process, and the master label process only knows about 6 classes (see attached)... I've also trained this same dataset with EfficientDet, with no issues in terms of aberrant classes.

[attached screenshot: malaria-classes]

In addition, from the bbox prints I showed above, you can see that __getitem__ is only returning six bboxes per image.

If it helps, here is some additional print info from the matcher - the out_prob tensor matches expectations: 24 rows (batch size 2 × 12 queries), each with 7 class probabilities:

```
--> HungarianMatcher::bs = 2, num_queries = 12

--> HungarianMatcher::out_prob = tensor([[0.1429, 0.1524, 0.0360, 0.3441, 0.0595, 0.1435, 0.1216],
        [0.1272, 0.1756, 0.0668, 0.3013, 0.0830, 0.1012, 0.1450],
        [0.1289, 0.2092, 0.0606, 0.2196, 0.0773, 0.1041, 0.2003],
        [0.1707, 0.1649, 0.0612, 0.3110, 0.0605, 0.0686, 0.1631],
        [0.1422, 0.1706, 0.0585, 0.3027, 0.0632, 0.1221, 0.1407],
        [0.1525, 0.1247, 0.0755, 0.3634, 0.0449, 0.0984, 0.1406],
        [0.1743, 0.1759, 0.0831, 0.2489, 0.0527, 0.0978, 0.1673],
        [0.1781, 0.1521, 0.0790, 0.3080, 0.0660, 0.1194, 0.0975],
        [0.1112, 0.1441, 0.0579, 0.4215, 0.0434, 0.1162, 0.1056],
        [0.1850, 0.1077, 0.0532, 0.4358, 0.0441, 0.0603, 0.1138],
        [0.1347, 0.1865, 0.0617, 0.2846, 0.0529, 0.1071, 0.1725],
        [0.1269, 0.1257, 0.0557, 0.3342, 0.0883, 0.0871, 0.1820],
        [0.1542, 0.1182, 0.0392, 0.3187, 0.0728, 0.1875, 0.1094],
        [0.0895, 0.1262, 0.0507, 0.3688, 0.0578, 0.1258, 0.1812],
        [0.1555, 0.0875, 0.0241, 0.3864, 0.0619, 0.1670, 0.1175],
        [0.1452, 0.1161, 0.0651, 0.3215, 0.0478, 0.1602, 0.1442],
        [0.1515, 0.0955, 0.0361, 0.4037, 0.0529, 0.1357, 0.1246],
        [0.1484, 0.1239, 0.0483, 0.3102, 0.0846, 0.1653, 0.1193],
        [0.1504, 0.1253, 0.0385, 0.2777, 0.0588, 0.1858, 0.1635],
        [0.1305, 0.0958, 0.0304, 0.4392, 0.0672, 0.1001, 0.1368],
        [0.1302, 0.1838, 0.0467, 0.2408, 0.0975, 0.1746, 0.1265],
        [0.1314, 0.1010, 0.0646, 0.3697, 0.0854, 0.1541, 0.0939],
        [0.1565, 0.0833, 0.0494, 0.3646, 0.0443, 0.1281, 0.1738],
        [0.1233, 0.0938, 0.0386, 0.4156, 0.0898, 0.1282, 0.1107]], device='cuda:0')
```

alcinos commented 4 years ago

Hi @lessw2020, apologies for the confusion: the class IDs need to be remapped to [0, 6]. Basically you want tgt_ids.max() < num_classes.

EDIT: to clarify, in our case for COCO there are 80 classes with labels in [0, 90], and for simplicity we don't remap, so we use num_classes=91 (which satisfies the inequality above). It doesn't matter that some ids are never used (it's a slight waste of parameters, but negligible in this case). In your case that won't work, though - you really don't want a softmax over 2.9M elements, so remapping is the way to go.
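For concreteness, a sketch of such a remapping using the ids from this thread (the dict name is illustrative):

```python
raw_ids = [2905418, 2905419, 2905420, 2905421, 2905422, 2905442]
coco_labels_inverse = {raw: i for i, raw in enumerate(sorted(raw_ids))}
print(coco_labels_inverse[2905442])  # 5 -> tgt_ids.max() = 5 < num_classes = 6
```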

lessw2020 commented 4 years ago

Hi @alcinos and @fmassa - oh, thanks for clarifying this!
I see what you mean - I incorrectly thought that the mapping was being auto-handled via the inheritance from torchvision.datasets.CocoDetection (which I've never used before)... and then all the bbox asserts etc. sent me down a roundabout path.

Anyway, I learned some nice info on debugging CUDA asserts and thanks for all the help.

Definitely agree re: a softmax over 2.9M :) - I'm putting together a custom dataset class for DETR, based on what I used for EfficientDet, that does the class remapping while staying as close to your impl as possible, and I will confirm I'm training after that.

Thanks again!

lessw2020 commented 4 years ago

I've got it all remapped and working - will try to train tomorrow.
Because the CocoDetection wrapper uses the separate ConvertCocoPolysToMask class, I figured the cleanest point to remap was right before returning the target, since I need access to self.coco (which that class doesn't have) and I have to wait for the conversion to review and weed out any errant bboxes. I made a remap_labels function and call it in place. GitHub seems to be stripping out some of the code formatting, but any feedback is welcome if there's a better spot to remap, etc.

```python
def __getitem__(self, idx):
    img, target = super().__getitem__(idx)
    image_id = self.ids[idx]
    target = {'image_id': image_id, 'annotations': target}
    img, target = self.prepare(img, target)
    if self._transforms is not None:
        img, target = self._transforms(img, target)
    # modify target['labels'] in place to my labels
    self.remap_labels(target)
    return img, target

def remap_labels(self, target):
    # print(target['labels'])
    ll = target['labels'].tolist()
    for i, item in enumerate(ll):
        new_id = self.coco_label_to_my_label(item)
        # print(f"item: {item} --> new_id {new_id}")
        ll[i] = new_id
    newclasses = torch.tensor(ll, dtype=torch.int64)
    # print(f"---> updated labels: {newclasses}")
    target['labels'] = newclasses

def coco_label_to_my_label(self, coco_label):
    return self.coco_labels_inverse[coco_label]

def my_label_to_coco_label(self, label):
    return self.coco_labels[label]
```

[attached screenshot: new_classes_detr]

lessw2020 commented 4 years ago

Hi @alcinos and @fmassa - I'm up and training successfully now.
Just wanted to say thanks again for the help! I made a custom_coco.py and modified main.py and __init__.py in datasets, to keep everything as closely aligned as I could while allowing custom class counts and handling the remapping for future updates.
I can PR the custom_coco if that would be useful to others; otherwise this issue is resolved and can be closed. Thanks again!

[attached screenshot: training_detr]

fmassa commented 4 years ago

@lessw2020 great that you managed to make it work!

We had a class to remap COCO categories in initial versions of the code, but we removed it because it was no longer used and made things a bit more complicated for the evaluation - which you also need to pay attention to, otherwise your mAP will be zero.

I think this is a good record to keep in mind and to improve the documentation from, but I'm not sure what the best way to do that is while keeping things as easy as possible -- it would involve adding a few more abstractions, like the ones in torchvision, and I believe we would prefer to keep the codebase as simple as possible.

Maybe @szagoruyko or @alcinos can comment on this, but I think a note somewhere explaining how to do it would be preferable.

lessw2020 commented 4 years ago

Hi @fmassa - completely understand about keeping the codebase as simple as possible.

I think just having some good documentation, ideally with an example walkthrough for training a custom dataset, would be more than sufficient, because then best practices are distilled from the start and issues like this one are avoided in the first place. That, plus perhaps including tgt_ids.max() < num_classes as an assert in the matcher code (which is useful for everyone), should be plenty?

And yes, I am dealing with mAP == 0 at the moment, now that I can train :) Any tips on that are appreciated, and maybe that could be added to the documentation as well? If the documentation is open to user contributions, I'd be happy to contribute as I can, since I expect to be working intensively with DETR for RL medical applications, replacing EfficientDet. Regardless, thanks again for all the help!

fmassa commented 4 years ago

> and perhaps including the tgt_ids.max() < num_classes as an assert in the matcher code (which is useful for all) should be plenty?

Yes, I agree that this assert would be a good thing to have. It will incur a small runtime penalty during training, but I think it should be fine.
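Something along these lines, for instance (a sketch against matcher.py; the exact placement and message are mine):

```python
# in HungarianMatcher.forward, after tgt_ids is concatenated:
num_model_classes = out_prob.shape[-1]
assert tgt_ids.numel() == 0 or int(tgt_ids.max()) < num_model_classes, \
    f"target label {int(tgt_ids.max())} out of range for {num_model_classes} model classes"
```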

> Any tips on that appreciated and maybe that could be added as part of the documentation as well?

There is some information in https://github.com/facebookresearch/detr/issues/41

I think a new file named TROUBLESHOOTING.md (linked from the main README) could be a good place to collect more of this information.

> Regardless thanks again for all the help!

Let us know if you have further questions!

Mashood3624 commented 2 years ago

Hi, I may be wrong, but what I have understood is that @lessw2020 had custom class labels with values over 2M (like 2905419). By remapping or aliasing them to integers 0, 1, 2, etc., the error got resolved, right? The max class label must be less than the total number of classes; for example, if we have 4 classes in our dataset, then our labels should be 0, 1, 2 and 3. I am facing the same assert error even though I have labelled my classes correctly. Please guide. Thanks.