Closed: lessw2020 closed this issue 4 years ago.

I'm trying to get my custom dataset working, but I can't get past 8 or so images via `__getitem__` before it asserts that my bboxes are bad. I pull the flagged image, it flags the next one; I pull that one, it flags the next...

From reading the code, it wants to check that x1 and y1 are larger than x0 and y0, which is a great check. But it keeps flagging images that, when I unwind them from COCO format, should be fine... so, any insights? I was not able to print the boxes1 (200, 4) and boxes2 (12, 4) tensors for some reason, so I couldn't see what it was actually calculating (it threw an odd GPU issue with 'formatting').

For example, it flagged this image as bad - here's the JSON for it in COCO format, 6 classes. (Note that 1 box will surround all the other 5 objects, as it's a malaria reader, so I'm not sure if that box encompassing the other boxes is really the issue?)

And as a check for me, here's the COCO format: the COCO bounding box format is [top left x position, top left y position, width, height]. All the bboxes it flags have positive numbers for width and height, so x1 and y1 must be larger than x0 and y0 - only a negative width or height added to the original x0 or y0 could produce something smaller... so I'm unclear what it is asserting on, or why.

But it asserts here, in `generalized_box_iou` in util/box_ops.py.

I've removed 15+ images trying to get it to actually train, but it just keeps flagging more and more as invalid bboxes. I remove one image, then it asserts on the next one... and reviewing the ones it flags vs. the ones it lets pass, I don't see any real difference. (I have trained this same dataset on EfficientDet, so I know the dataset is reasonable.) Any help with debugging, or on what might be awry, would be appreciated. Thanks!

*I'll try a different dataset tomorrow that doesn't have the one outer bounding box surrounding all the inner objects and see if that is the core issue.
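A minimal sketch of the reasoning in that check (not code from the thread): converting a COCO [x, y, w, h] box to corner form can only violate the `x1 >= x0, y1 >= y0` assert if the width or height is negative (or NaN).

```python
import torch

# COCO boxes are [x, y, w, h]; corner form is [x0, y0, x1, y1] = [x, y, x + w, y + h].
coco_boxes = torch.tensor([[10., 20., 30., 40.],    # valid: w, h >= 0
                           [10., 20., -5., 40.]])   # invalid: negative width
xyxy = torch.cat([coco_boxes[:, :2], coco_boxes[:, :2] + coco_boxes[:, 2:]], dim=-1)
print((xyxy[:, 2:] >= xyxy[:, :2]).all(dim=-1))  # tensor([ True, False])
```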
Hi,
I'd have to see the full backtrace to be 100% sure, but generally in this function boxes1 corresponds to the predicted boxes, not the target ones, so your dataset is likely not to blame here (see e.g. https://github.com/facebookresearch/detr/blob/master/models/detr.py#L150).
Off the top of my head, I can think of mainly two things that can trigger this:
1. A learning rate that is too high (or altered gradient-clamping parameters), causing the training to diverge and the predicted boxes to become NaN.
2. A change to the model configuration (for example the number of queries or classes) that is not consistent with the rest of the code.
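To illustrate point 1, a sketch using DETR's own conversion from util/box_ops.py (the NaN value here is made up for demonstration; NaN comparisons evaluate to False, so the "degenerate boxes" assert fires even with a clean dataset):

```python
import torch

def box_cxcywh_to_xyxy(x):
    # the same conversion the matcher applies before generalized_box_iou
    x_c, y_c, w, h = x.unbind(-1)
    return torch.stack([x_c - 0.5 * w, y_c - 0.5 * h,
                        x_c + 0.5 * w, y_c + 0.5 * h], dim=-1)

# if training diverges, predicted boxes can contain NaNs
pred = torch.tensor([[0.5, 0.5, float('nan'), 0.2]])
xyxy = box_cxcywh_to_xyxy(pred)
print((xyxy[:, 2:] >= xyxy[:, :2]).all())  # tensor(False) -> the assert fires
```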
Best of luck.
Hi @alcinos - thanks for saving my stress levels - I was poring over the bboxes trying to figure out how it was flagging them as incorrect.
You are right though - now I see from your link that it is the predicted boxes and not the dataset-loaded ones.
Re: questions:
1 - I didn't change the LR or the clamping params. (I'm trying to make as few adjustments as possible and just get it training first.)
2 - However, I think the issue may be that I forgot there is no adjustment for classes in the main.py script (I had first adjusted num_queries and then reset it to 100 when I started hitting this issue). I'm training for 6 classes (or 6 + 1 with background), and I realize now it is likely predicting for 70+... so that may be why it quickly asserts after just a few batches, with NaNs for the degenerate predicted boxes?
Let me try to remap the class count and create a --num_classes param and see if that fixes this!
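A sketch of what that hypothetical --num_classes flag could look like in main.py (upstream DETR has no such flag; the Namespace dump further down shows it wired in):

```python
# hypothetical addition to the argparse setup in main.py; not part of upstream DETR
parser.add_argument('--num_classes', default=6, type=int,
                    help='number of object classes; the no-object class is added internally')
```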
Ugh, no luck - I changed the classes to 6 + 1. Depending on the number of queries, I get various failures in the loss matching via CUDA asserts like the one below. Running with 100 (the default), I go right back to the degenerate bbox issue as before. I'm running in Jupyter with this launch:
%run main.py --batch_size 2 --no_aux_loss --coco_path uw-dev7
*note - I'll try fine-tuning tomorrow as a backup plan (via --resume and the checkpoint linear-layer restart).
@lessw2020
Let's break this down in two: the device-side assert and the degenerate boxes.
To properly debug the RuntimeError: CUDA error: device-side assert triggered, you'll need to run your script with CUDA_LAUNCH_BLOCKING=1 python main.py, due to the asynchronous nature of CUDA calls in PyTorch. But as a rule of thumb, this generally comes from indexing a tensor out of bounds, for example when the number of outputs in the classifier is smaller than the number of classes, which gets triggered at the CrossEntropy loss.
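A minimal sketch of that failure mode (assuming a 7-way classifier, i.e. 6 classes + 1):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 7, device='cuda')            # 4 predictions, 7 class scores each
labels = torch.tensor([0, 3, 6, 42], device='cuda')  # 42 is out of range for 7 outputs
loss = F.cross_entropy(logits, labels)  # fires a device-side assert in the index kernel
```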
I see from your logs, though, that you changed num_queries in the code to 9, but the argparse results are not changed (it still prints 100) -- can you try changing it on the command line instead? There might be other places in the code where you forgot to change 100 to 9.
The second (full) log that you posted also seems to indicate a device-side assert being triggered, even if it appears to point to the "degenerate" boxes. I think this shows that the "degenerate boxes" assert is a red herring, and the error lies elsewhere.
My first guess: make sure that, if you changed num_classes in the code, you are using the same num_classes for the SetCriterion in https://github.com/facebookresearch/detr/blob/7613beb10a530ca0ab836f2c8845d0501f5bf063/models/detr.py#L330. This could explain the device-side asserts, as we use the num_classes from the criterion to perform indexing: https://github.com/facebookresearch/detr/blob/7613beb10a530ca0ab836f2c8845d0501f5bf063/models/detr.py#L108-L112
To complete what @fmassa said, the canonical place to change the number of queries is the command-line arg --num_queries (you shouldn't have to change the code for that). For the number of classes, you have only one line to edit, here: https://github.com/facebookresearch/detr/blob/master/models/detr.py#L296-L298
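For reference, the linked lines set num_classes per dataset inside the build() function of models/detr.py; they read approximately as follows (an assumption based on the upstream file; exact values may differ across commits), and a custom dataset needs its own value there:

```python
# num_classes must be strictly greater than the maximum label id in your dataset
num_classes = 20 if args.dataset_file != 'coco' else 91
if args.dataset_file == "coco_panoptic":
    num_classes = 250
```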
Hi @alcinos and @fmassa - thanks very much to both of you for the detailed info!
I've reset my code changes, updated the classes per the above (verifying SetCriterion), and updated the queries via the command-line arg. Unfortunately the problem persists - back to the degenerate bbox assert.
I'll try reverting to 100 for the default query count and continue trying to pin it down. For reference, I can run COCO eval on this server at 42 mAP, so the config seems functional.
Here are my current results. I added a --num_classes arg to simplify things (it adjusts the spot @alcinos pointed out), and I have a print check for SetCriterion per @fmassa as well. I've printed the model, postprocessor, and criterion in the results below, and verified that class_embed looks correct, i.e. num_classes + 1: (class_embed): Linear(in_features=256, out_features=7, bias=True)
Here's my launch command:
%run main.py --batch_size 2 --no_aux_loss --num_queries 12 --num_classes 6 --coco_path uw-dev7 --dataset_file coco --output_dir ./output
And the results:
Not using distributed mode
git: sha: 7613beb10a530ca0ab836f2c8845d0501f5bf063, status: has uncommited changes, branch: master
Namespace(aux_loss=False, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='uw-dev7', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_url='env://', distributed=False, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=300, eval=False, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, lr=0.0001, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_classes=6, num_queries=12, num_workers=2, output_dir='./output', position_embedding='sine', pre_norm=False, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=1)
num_classes = 6
*** custom classes and queries ****
---> num classes = 6, num queries = 12
detr.py::SetCriterion.__init__ self.num_classes = 6
DETR(
(transformer): Transformer(
(encoder): TransformerEncoder(
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
)
(3): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
)
(4): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
)
(5): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
)
)
)
(decoder): TransformerDecoder(
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(1): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(2): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(3): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(4): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
(5): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(multihead_attn): MultiheadAttention(
(out_proj): Linear(in_features=256, out_features=256, bias=True)
)
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout1): Dropout(p=0.1, inplace=False)
(dropout2): Dropout(p=0.1, inplace=False)
(dropout3): Dropout(p=0.1, inplace=False)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(class_embed): Linear(in_features=256, out_features=7, bias=True)
(bbox_embed): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(query_embed): Embedding(12, 256)
(input_proj): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(backbone): Joiner(
(0): Backbone(
(body): IntermediateLayerGetter(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): FrozenBatchNorm2d()
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): FrozenBatchNorm2d()
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d()
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d()
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d()
(relu): ReLU(inplace=True)
)
)
)
)
(1): PositionEmbeddingSine()
)
)
SetCriterion(
(matcher): HungarianMatcher()
)
{'bbox': PostProcess()}
number of params: 41257995
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~/detr/main.py in <module>
252 if args.output_dir:
253 Path(args.output_dir).mkdir(parents=True, exist_ok=True)
--> 254 main(args)
~/detr/main.py in main(args)
202 train_stats = train_one_epoch(
203 model, criterion, data_loader_train, optimizer, device, epoch,
--> 204 args.clip_max_norm)
205 lr_scheduler.step()
206 if args.output_dir:
~/detr/engine.py in train_one_epoch(model, criterion, data_loader, optimizer, device, epoch, max_norm)
31
32 outputs = model(samples)
---> 33 loss_dict = criterion(outputs, targets)
34 weight_dict = criterion.weight_dict
35 losses = sum(loss_dict[k] * weight_dict[k] for k in loss_dict.keys() if k in weight_dict)
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~/detr/models/detr.py in forward(self, outputs, targets)
217
218 # Retrieve the matching between the outputs of the last layer and the targets
--> 219 indices = self.matcher(outputs_without_aux, targets)
220
221 # Compute the average number of target boxes accross all nodes, for normalization purposes
~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
530 result = self._slow_forward(*input, **kwargs)
531 else:
--> 532 result = self.forward(*input, **kwargs)
533 for hook in self._forward_hooks.values():
534 hook_result = hook(self, input, result)
~/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py in decorate_no_grad(*args, **kwargs)
47 def decorate_no_grad(*args, **kwargs):
48 with self:
---> 49 return func(*args, **kwargs)
50 return decorate_no_grad
51
~/detr/models/matcher.py in forward(self, outputs, targets)
72
73 # Compute the giou cost betwen boxes
---> 74 cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
75
76 # Final cost matrix
~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
49 # degenerate boxes gives inf / nan results
50 # so do an early check
---> 51 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
52 assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
53 iou, union = box_iou(boxes1, boxes2)
RuntimeError: CUDA error: device-side assert triggered
@lessw2020 the next step is to run the code with CUDA_LAUNCH_BLOCKING=1 python main.py, as it will show exactly where the issue is -- the current error in the assert is a red herring, because it's the first point in the code that has a sync point (the assertion requires a value on the CPU).
I still think that the most likely culprit is in the Criterion. Also, can you paste the rest of the error message that is displayed? The device-side assert from CUDA generally prints a lot of repeated messages, but they indicate in which kernel the assert happened, which is helpful for debugging.
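Since the earlier runs were launched from Jupyter via %run, an alternative to a terminal launch is setting the variable in-process (a sketch; the variable must be set before the first CUDA call, or it has no effect):

```python
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # read when the CUDA context is initialized

%run main.py --batch_size 2 --no_aux_loss --coco_path uw-dev7
```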
Hi @fmassa - thanks again for the help. I switched from Jupyter to a terminal and was then able to see the full CUDA assert info (about 32 repeats). I'm attaching debug.txt, which is the standard output (model loading, starting train), and more importantly debug2.txt, which contains the specific CUDA asserts generated with CUDA_LAUNCH_BLOCKING=1. Thanks for the help on this, and I hope the additional CUDA info helps pin it down! debug2.txt debug.txt
I should add that your intuition was quite correct, as the core CUDA issue is an index out of bounds:
/opt/conda/conda-bld/pytorch_1579040055865/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda [](int)->auto::operator()(int)->auto: block: [0,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
@lessw2020 from debug2.txt, the error comes from
cost_class = -out_prob[:, tgt_ids]
which indicates that your class probabilities have fewer elements than the ground-truth indices.
If you add a print(tgt_ids.max()) to your code, you'll see that it is larger than 6, which means there might be an issue with your dataset (you have more classes than you thought). I believe this is probably the issue you are facing.
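A quick way to check the whole dataset up front (a sketch, assuming the targets carry a 'labels' tensor as in DETR's COCO loader):

```python
# scan the training set once and collect any label id the classifier can't represent
bad_ids = set()
for _, target in dataset_train:
    labels = target['labels']
    bad_ids.update(labels[labels >= num_classes].tolist())
print('out-of-range label ids:', sorted(bad_ids))
```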
As an unrelated note, I noticed that you are passing --no_aux_loss to the model -- note that our best results are obtained with the aux loss. The evaluation code doesn't need it (it's just evaluation, and it's slightly faster without), but for training it's generally better to use aux_loss.
Hi @fmassa - thanks for the updates! (And I really appreciate you helping trace this issue through.)
1 - Appreciate the tip re: --no_aux_loss; I'll drop that, as we definitely want the best results here (this is for malaria and COVID diagnostics, so accuracy is paramount).
2 - Re: more classes than expected - never say never, but overall I don't believe that is possible (double-checks shown below). I did add the print statement, and it may be helpful - it shows 11 targets instead of 12? There are only 6 unique class ids in that target tensor, though:

```
--> HungarianMatcher::tgt_ids.max() = 2905442 and tgt_ids tensor([2905419, 2905420, 2905442, 2905422, 2905421, 2905419, 2905442, 2905422, 2905421, 2905420, 2905418], device='cuda:0')
2905418 2905419 2905420 2905421 2905422 2905442 = 6 unique targets
```
I've double-checked the source labeling process, and the master label process only knows about 6 classes (see attached)... I've also trained this same dataset with EfficientDet, with no issues in terms of aberrant classes.
In addition, from the bbox prints I showed above, you can see that __getitem__ is only returning six bboxes per image.
If it helps, here is some additional print info from the matcher - the out_prob tensor matches expectations: 24 rows (batch size 2 x 12 queries), each with 7 class probabilities:
--> HungarianMatcher::bs = 2, num_queries = 12
--> HungarianMatcher::out_prob = tensor([ [0.1429, 0.1524, 0.0360, 0.3441, 0.0595, 0.1435, 0.1216], [0.1272, 0.1756, 0.0668, 0.3013, 0.0830, 0.1012, 0.1450], [0.1289, 0.2092, 0.0606, 0.2196, 0.0773, 0.1041, 0.2003], [0.1707, 0.1649, 0.0612, 0.3110, 0.0605, 0.0686, 0.1631], [0.1422, 0.1706, 0.0585, 0.3027, 0.0632, 0.1221, 0.1407], [0.1525, 0.1247, 0.0755, 0.3634, 0.0449, 0.0984, 0.1406], [0.1743, 0.1759, 0.0831, 0.2489, 0.0527, 0.0978, 0.1673], [0.1781, 0.1521, 0.0790, 0.3080, 0.0660, 0.1194, 0.0975], [0.1112, 0.1441, 0.0579, 0.4215, 0.0434, 0.1162, 0.1056], [0.1850, 0.1077, 0.0532, 0.4358, 0.0441, 0.0603, 0.1138], [0.1347, 0.1865, 0.0617, 0.2846, 0.0529, 0.1071, 0.1725], [0.1269, 0.1257, 0.0557, 0.3342, 0.0883, 0.0871, 0.1820], [0.1542, 0.1182, 0.0392, 0.3187, 0.0728, 0.1875, 0.1094], [0.0895, 0.1262, 0.0507, 0.3688, 0.0578, 0.1258, 0.1812], [0.1555, 0.0875, 0.0241, 0.3864, 0.0619, 0.1670, 0.1175], [0.1452, 0.1161, 0.0651, 0.3215, 0.0478, 0.1602, 0.1442], [0.1515, 0.0955, 0.0361, 0.4037, 0.0529, 0.1357, 0.1246], [0.1484, 0.1239, 0.0483, 0.3102, 0.0846, 0.1653, 0.1193], [0.1504, 0.1253, 0.0385, 0.2777, 0.0588, 0.1858, 0.1635], [0.1305, 0.0958, 0.0304, 0.4392, 0.0672, 0.1001, 0.1368], [0.1302, 0.1838, 0.0467, 0.2408, 0.0975, 0.1746, 0.1265], [0.1314, 0.1010, 0.0646, 0.3697, 0.0854, 0.1541, 0.0939], [0.1565, 0.0833, 0.0494, 0.3646, 0.0443, 0.1281, 0.1738], [0.1233, 0.0938, 0.0386, 0.4156, 0.0898, 0.1282, 0.1107]], device='cuda:0')
Hi @lessw2020, apologies for the confusion: the class IDs need to be remapped to the range [0, 6) (i.e. 0 through 5 for your 6 classes). Basically you want tgt_ids.max() < num_classes.
EDIT: to clarify, in our case for COCO there are 80 classes, with label ids going up to 90, and for simplicity we don't remap them, so we use num_classes=91 (which satisfies the inequality above). It doesn't matter that some ids will never be used (it's a slight waste of parameters, but negligible in this case). That approach won't work in your case, though: you really don't want a softmax over 2.9M elements, so remapping is the way to go.
Hi @alcinos and @fmassa -
Oh- thanks for clarifying this!
I see what you mean here - I incorrectly thought that mapping was being auto-handled via the inheritance from torchvision.datasets.CocoDetection (which I've never used before)... and then all the bbox asserts etc. sent me down a roundabout path.
Anyway, I learned some nice info on debugging CUDA asserts and thanks for all the help.
Definitely agree re: the softmax over 2.9M :) - I'm putting together a custom dataset class for DETR to do the class remapping, based on what I used for EfficientDet, while staying as close to your implementation as I can, and will confirm I'm training after that.
Thanks again!
I've got it all remapped and working - will try to train tomorrow.
Because the CocoDetection pipeline uses the separate ConvertCocoPolysToMask class, I figured the cleanest point to remap was right before returning the target: I need access to self.coco (which that class doesn't have), and I have to wait for ConvertCocoPolysToMask to review and weed out any errant bboxes first. I made a remap_labels function and do the remap in place.
GitHub seems to be stripping out some of the code formatting, but any feedback is welcome if there's a better spot to remap, etc.
```python
def __getitem__(self, idx):
    img, target = super().__getitem__(idx)
    image_id = self.ids[idx]
    target = {'image_id': image_id, 'annotations': target}
    # ConvertCocoPolysToMask (self.prepare) weeds out any errant bboxes first
    img, target = self.prepare(img, target)
    if self._transforms is not None:
        img, target = self._transforms(img, target)
    # remap the raw COCO category ids to contiguous [0, num_classes) labels
    self.remap_labels(target)
    return img, target

def remap_labels(self, target):
    ll = target['labels'].tolist()
    for i, item in enumerate(ll):
        ll[i] = self.coco_label_to_my_label(item)
    target['labels'] = torch.tensor(ll, dtype=torch.int64)

def coco_label_to_my_label(self, coco_label):
    return self.coco_labels_inverse[coco_label]

def my_label_to_coco_label(self, label):
    return self.coco_labels[label]
```
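The snippet doesn't show how coco_labels / coco_labels_inverse are built; one plausible construction (an assumption on my part, using the pycocotools COCO object that CocoDetection exposes as self.coco) would be:

```python
# hypothetical __init__ addition: map sorted category ids to contiguous indices and back
cat_ids = sorted(self.coco.getCatIds())  # e.g. [2905418, ..., 2905442]
self.coco_labels = dict(enumerate(cat_ids))                       # 0 -> 2905418, ...
self.coco_labels_inverse = {c: i for i, c in enumerate(cat_ids)}  # 2905418 -> 0, ...
```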
Hi @alcinos and @fmassa - I'm up and training successfully now.
Just wanted to say thanks again for the help!
I made a custom_coco.py and modded main.py and __init__.py in datasets to keep everything as closely aligned as I could, while allowing custom class counts and handling the remapping for future updates.
I can PR the custom_coco if that would be useful to others; otherwise this issue is resolved and can be closed.
Thanks again!
@lessw2020 great that you managed to make it work!
We had, in initial versions of the code, a class to remap the categories of COCO, but we removed it because it was no longer being used and made things a bit more complicated for the evaluation, which you also need to pay attention to, otherwise your mAP will be zero.
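That evaluation caveat is worth making concrete: if training used remapped labels, predicted labels must be mapped back to the original annotation ids before COCO evaluation, or no detection will match a ground-truth category. A sketch using the helper from the post above (hypothetical wiring, not upstream code):

```python
def to_coco_category_ids(pred_labels, dataset):
    # invert the training-time remapping so category_id matches the annotation file
    return [dataset.my_label_to_coco_label(int(label)) for label in pred_labels]
```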
I think this is a good record to keep in mind for improving the documentation, but I'm not sure what the best way to do that is while keeping things as easy as possible -- it would involve adding a few more abstractions, like the ones in torchvision, and I believe we would prefer to keep the codebase as simple as possible.
Maybe @szagoruyko or @alcinos can comment on this, but I think a note somewhere explaining how to do it would be preferable.
Hi @fmassa - completely understand about keeping the codebase as simple as possible.
I think just having some good documentation, ideally with an example walkthrough for training a custom dataset, would be more than sufficient, because then best practices are distilled from the start and issues like this one are avoided in the first place. Perhaps also including tgt_ids.max() < num_classes as an assert in the matcher code (which is useful for everyone) should be plenty?
And yes, I am dealing with mAP == 0 at the moment, now that I can train :) Any tips on that are appreciated, and maybe that could be added to the documentation as well? If the documentation is open-sourced for user contributions, I'd be happy to contribute as I can, since I expect to be working intensively with DETR for RL medical applications, replacing EfficientDet. Regardless, thanks again for all the help!
> and perhaps including the tgt_ids.max() < num_classes as an assert in the matcher code (which is useful for all) should be plenty?
Yes, I agree that this assert would be a good thing to have. It will incur a small runtime penalty during training, but it should be fine, I think.
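A sketch of what that guard could look like in models/matcher.py, just before the class-cost indexing (assuming out_prob has num_classes + 1 columns, the last being the no-object class):

```python
# fail loudly on un-remapped label ids instead of an opaque device-side assert
if tgt_ids.numel() > 0:
    assert tgt_ids.max() < out_prob.shape[1] - 1, (
        f"target label {tgt_ids.max().item()} is out of range for "
        f"{out_prob.shape[1] - 1} classes; did you forget to remap your labels?")
cost_class = -out_prob[:, tgt_ids]
```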
> Any tips on that appreciated and maybe that could be added as part of the documentation as well?
There is some information in https://github.com/facebookresearch/detr/issues/41.
I think a new file named TROUBLESHOOTING.md (linked from the main README) could be a good place to collect more information in this format.
> Regardless thanks again for all the help!
Let us know if you have further questions!
Hi, I may be wrong, but what I have understood is that @lessw2020 had custom class labels with 2M+ values (like 2905419). By remapping or aliasing them to 0, 1, 2, etc., the error got resolved, right? The max class label must be less than the total number of classes: for example, if we have a total of 4 classes in our dataset, then our labels should be 0, 1, 2, and 3. I am facing the same assert error even though I have labelled my classes correctly. Please guide. Thanks.