facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0
13.08k stars 2.37k forks

using --resume, I encounter a strange problem. #582

Open frabob2017 opened 1 year ago

frabob2017 commented 1 year ago

Hello, DETR team,

I am not sure whether this is a bug or a problem on my side.

I started training from the initial weights provided by DETR, detr-r50-e632da11.pth, which is 158.9M in size:

python main.py --coco_path /content/drive/MyDrive/mydata --batch_size 20 --epochs 10 --resume /path/to/detr-r50-e632da11.pth --output_dir /path/to/save_weight

It ran fine and produced a new checkpoint, checkpoint.pth, which is 473M in size.

I then tried to continue training from this new checkpoint:

python main.py --coco_path /content/drive/MyDrive/mydata --batch_size 20 --epochs 10 --resume /path/to/checkpoint.pth --output_dir /path/to/save_weight

This time it printed the following output:

Namespace(lr=0.0001, lr_backbone=1e-05, batch_size=20, weight_decay=0.0001, epochs=10, lr_drop=200, clip_max_norm=0.1, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', enc_layers=6, dec_layers=6, dim_feedforward=2048, hidden_dim=256, dropout=0.1, nheads=8, num_queries=100, pre_norm=False, masks=False, aux_loss=True, set_cost_class=1, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, bbox_loss_coef=5, giou_loss_coef=2, eos_coef=0.1, dataset_file='coco', coco_path='/content/drive/MyDrive/', coco_panoptic_path=None, remove_difficult=False, output_dir='/content/drive/MyDrive/weights/8330', device='cuda', seed=42, resume='/content/drive/MyDrive/weights/8330/checkpoint.pth', start_epoch=0, eval=False, num_workers=2, world_size=1, dist_url='env://', distributed=False)
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet50_Weights.IMAGENET1K_V1. You can also use weights=ResNet50_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
number of params: 41302368
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
Training time 0:00:00

No training happens. Do you know why? Also, why is the new checkpoint.pth (473M) so much bigger than the initial weights detr-r50-e632da11.pth (173M)?

frabob2017 commented 1 year ago

I also tried without the initial weights detr-r50-e632da11.pth, running training directly:

python main.py --coco_path /content/drive/MyDrive/mydata --batch_size 20 --epochs 10 --output_dir /path/to/save_weight

That works, but when I then resume from the new checkpoint, I still have the same problem.

frabob2017 commented 1 year ago

I figured it out; the mistake was mine. Thank you. I corrected it and it works now.

candlecove-nju commented 11 months ago

Hello! I ran into the same problem. How did you fix it?
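
The original poster never said what the mistake was, so the following is only a guess at one common cause of "Start training" followed immediately by "Training time 0:00:00". It is a minimal sketch that assumes the checkpoint written during training is a dict holding not just the model weights but also the optimizer state, the LR scheduler state and the last finished epoch, and that resuming continues from epoch + 1 up to --epochs. Under those assumptions, a checkpoint saved at the end of a 10-epoch run leaves nothing to do when resumed with --epochs 10, and the extra optimizer state could also account for the larger file. The key names and the describe_resume helper are hypothetical and should be checked against the actual main.py.

```python
from typing import Any, Mapping


def describe_resume(checkpoint: Mapping[str, Any], total_epochs: int) -> dict:
    """Summarize what resuming from `checkpoint` would do.

    Assumption (not confirmed in this thread): the training script restores
    the training state only when 'optimizer', 'lr_scheduler' and 'epoch' are
    all present, and then trains over range(epoch + 1, total_epochs).
    """
    has_training_state = all(
        key in checkpoint for key in ("optimizer", "lr_scheduler", "epoch")
    )
    # A weights-only file (e.g. a released pretrained model) starts from epoch 0.
    start_epoch = checkpoint["epoch"] + 1 if has_training_state else 0
    return {
        "keys": sorted(checkpoint.keys()),
        "has_training_state": has_training_state,
        "start_epoch": start_epoch,
        "epochs_left": max(0, total_epochs - start_epoch),
    }


if __name__ == "__main__":
    # Stand-in for a checkpoint saved after the last of 10 epochs (epochs 0..9).
    fake_checkpoint = {
        "model": {},
        "optimizer": {},
        "lr_scheduler": {},
        "epoch": 9,
        "args": None,
    }
    print(describe_resume(fake_checkpoint, total_epochs=10))
    print(describe_resume(fake_checkpoint, total_epochs=20))
```

With PyTorch installed, the real dict would typically come from torch.load("/path/to/checkpoint.pth", map_location="cpu"). If epochs_left comes out as 0, raising --epochs above the epoch stored in the checkpoint would be the first thing to try.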