Closed by HanWangSJTU 2 years ago.
Accumulating evaluation results... DONE (t=8.49s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.008
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.010
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.010
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.017
I fixed it by lowering the learning rate.
Why does this happen?
I ran into the same issue when training on my own dataset. I found that class_error stays at 100 throughout training, which I suspect is why AP = 0, but I don't know why this happens.
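For context, `class_error` in DETR-style criteria is essentially 100 minus the top-1 classification accuracy over the queries that the Hungarian matcher assigned to ground-truth boxes, so a value stuck at 100 means no matched query ever predicts its target class. A minimal sketch of that metric, with illustrative names rather than the repository's exact code:

```python
import torch

def class_error_sketch(pred_logits, target_labels, matched_query_idx):
    """Illustrative sketch of DETR-style class_error: 100 - top-1 accuracy
    of the queries matched to ground-truth objects. All names here are
    hypothetical; see the criterion in the repo for the real implementation."""
    matched_logits = pred_logits[matched_query_idx]    # (num_matched, num_classes)
    top1 = matched_logits.argmax(dim=-1)               # predicted class per matched query
    accuracy = (top1 == target_labels).float().mean() * 100
    return 100.0 - accuracy.item()

# Example: 3 matched queries, none predicting its target class -> class_error = 100
logits = torch.zeros(300, 91)
logits[:, 0] = 5.0                                     # every query predicts class 0
print(class_error_sketch(logits, torch.tensor([17, 18, 44]), torch.tensor([5, 42, 250])))
```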
@HanWangSJTU Hi, would you mind sharing the learning rates you used in your experiments? Did you scale them linearly with the batch size? Thanks!
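On the linear-scaling question: the usual rule of thumb is to scale the learning rate in proportion to the total batch size relative to the reference setup the default lr was tuned for. A hedged sketch, assuming for illustration a reference of lr=2e-4 at a total batch size of 16 (2 images x 8 GPUs); the actual reference configuration should be checked against the repository README:

```python
# Hedged sketch of the linear lr-scaling rule. The reference values below are
# assumptions for illustration, not taken from the repo.
reference_lr = 2e-4          # default --lr from the log below
reference_total_batch = 16   # assumed reference: 2 images/GPU x 8 GPUs
batch_size_per_gpu = 2       # --batch_size from the log
world_size = 2               # from the log
total_batch = batch_size_per_gpu * world_size

scaled_lr = reference_lr * total_batch / reference_total_batch
print(scaled_lr)             # 5e-05: running 2 GPUs at the default 2e-4 effectively over-scales the lr
```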
After several epochs, the AP is still close to 0.
Almost all settings are the defaults, and I am training on the COCO dataset.
bash: /usr/local/miniconda3/lib/libtinfo.so.6: no version information available (required by bash)
Namespace(
  aux_loss=True, backbone='resnet50', batch_size=2, bbox_loss_coef=5, cache_mode=False,
  clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='/dataset/public/coco',
  dataset_file='coco', dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1,
  dilation=False, dim_feedforward=1024, dist_backend='nccl', dist_url='env://', distributed=True,
  dropout=0.1, enc_layers=6, enc_n_points=4, epochs=50, eval=False, focal_alpha=0.25,
  frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.0002, lr_backbone=2e-05,
  lr_backbone_names=['backbone.0'], lr_drop=40, lr_drop_epochs=None, lr_linear_proj_mult=0.1,
  lr_linear_proj_names=['reference_points', 'sampling_offsets'], mask_loss_coef=1, masks=False,
  nheads=8, num_feature_levels=4, num_queries=300, num_workers=2,
  output_dir='exps/r50_deformable_detr', position_embedding='sine',
  position_embedding_scale=6.283185307179586, rank=0, remove_difficult=False, resume='', seed=42,
  set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, sgd=False, start_epoch=0, two_stage=False,
  weight_decay=0.0001, with_box_refine=False, world_size=2)

The model printout that follows is the stock DeformableDETR: a ResNet-50 backbone with FrozenBatchNorm2d, 6 MSDeformAttn encoder layers and 6 decoder layers (hidden_dim=256, dim_feedforward=1024, dropout=0.1), 4 input-projection feature levels, a 300-query embedding, and per-decoder-layer class heads (91 outputs) and 3-layer bbox MLPs.

number of params: 39847265
loading annotations into memory... Done (t=14.77s)
creating index... index created!
loading annotations into memory... Done (t=0.50s)
creating index... index created!
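Since the discussion is about learning rates, note that the lr-related arguments above suggest three optimizer parameter groups: the main model at `lr`, the backbone at `lr_backbone`, and the `reference_points`/`sampling_offsets` projections at `lr * lr_linear_proj_mult`. A rough sketch of that grouping under those assumptions; the function name and matching logic are illustrative, not the repository's exact main.py code:

```python
import torch

def build_param_groups(model, lr=2e-4, lr_backbone=2e-5, lr_linear_proj_mult=0.1,
                       backbone_names=('backbone.0',),
                       linear_proj_names=('reference_points', 'sampling_offsets')):
    """Hedged sketch of how the lr flags above could map to optimizer parameter
    groups; names mirror the Namespace, the grouping logic is illustrative."""
    def matches(name, keywords):
        return any(k in name for k in keywords)

    backbone, proj, rest = [], [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if matches(name, backbone_names):
            backbone.append(p)          # backbone weights get the smaller lr
        elif matches(name, linear_proj_names):
            proj.append(p)              # deformable sampling/reference projections get lr * 0.1
        else:
            rest.append(p)              # everything else gets the base lr
    return [
        {"params": rest, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
        {"params": proj, "lr": lr * lr_linear_proj_mult},
    ]

# optimizer = torch.optim.AdamW(build_param_groups(model), weight_decay=1e-4)
```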
The long list of parameter names printed next is simply every trainable parameter of the model: the transformer encoder/decoder weights, transformer.reference_points, class_embed.0 and bbox_embed.0, query_embed, the four input_proj blocks, and the backbone convolutions.

Start training
Epoch: [0] [ 0/29572] eta: 4:32:47 lr: 0.000200 class_error: 100.00 grad_norm: 78.99 loss: 40.0816
loss_bbox: 2.8369 loss_bbox_0: 2.9144 loss_bbox_1: 2.8846 loss_bbox_2: 2.9020 loss_bbox_3: 2.8461 loss_bbox_4: 2.8331
loss_ce: 2.2777 loss_ce_0: 2.0421 loss_ce_1: 2.1291 loss_ce_2: 2.0680 loss_ce_3: 2.2875 loss_ce_4: 2.2155
loss_giou: 1.6408 loss_giou_0: 1.6408 loss_giou_1: 1.6408 loss_giou_2: 1.6408 loss_giou_3: 1.6408 loss_giou_4: 1.6408
cardinality_error_unscaled: 296.2500 cardinality_error_0_unscaled: 295.5000 cardinality_error_1_unscaled: 296.2500 cardinality_error_2_unscaled: 296.2500 cardinality_error_3_unscaled: 296.2500 cardinality_error_4_unscaled: 296.2500
class_error_unscaled: 100.0000
loss_bbox_unscaled: 0.5674 loss_bbox_0_unscaled: 0.5829 loss_bbox_1_unscaled: 0.5769 loss_bbox_2_unscaled: 0.5804 loss_bbox_3_unscaled: 0.5692 loss_bbox_4_unscaled: 0.5666
loss_ce_unscaled: 1.1388 loss_ce_0_unscaled: 1.0210 loss_ce_1_unscaled: 1.0646 loss_ce_2_unscaled: 1.0340 loss_ce_3_unscaled: 1.1438 loss_ce_4_unscaled: 1.1078
loss_giou_unscaled: 0.8204 loss_giou_0_unscaled: 0.8204 loss_giou_1_unscaled: 0.8204 loss_giou_2_unscaled: 0.8204 loss_giou_3_unscaled: 0.8204 loss_giou_4_unscaled: 0.8204
time: 0.5535 data: 0.0000 max mem: 4316
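The cardinality_error of roughly 296 with 300 queries is also worth reading in context: in the original DETR-style criterion it is the absolute difference between the number of queries whose top class is not the last ("no object") index and the number of ground-truth boxes, so values near num_queries at the very first iterations mostly mean the classifier has not started suppressing background queries yet. A rough sketch of that diagnostic, with a hypothetical helper name rather than the repository's exact implementation:

```python
import torch

def cardinality_error_sketch(pred_logits, num_gt_boxes):
    """Illustrative DETR-style cardinality error for one image:
    |#queries whose argmax is not the last ('no object') class - #GT boxes|.
    Hypothetical helper, not the repo's exact code."""
    no_object_class = pred_logits.shape[-1] - 1
    num_foreground_preds = (pred_logits.argmax(-1) != no_object_class).sum()
    return (num_foreground_preds - num_gt_boxes).abs().item()

# 300 queries, 91 classes, ~4 GT boxes: with untrained (random) logits almost
# every query's top class is a foreground class, so the error ends up close to
# num_queries, consistent with the ~296 values in the log above.
logits = torch.randn(300, 91)
print(cardinality_error_sketch(logits, torch.tensor(4)))
```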