Atten4Vis / ConditionalDETR

This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence". (https://arxiv.org/abs/2108.06152)
Apache License 2.0

The parameters are only 43196001 instead of 43524961 #14

Closed Cohesion97 closed 2 years ago

Cohesion97 commented 2 years ago

I ran the default conddetr-r50 config, but the number of parameters is different from the one in the provided log.

Also, after training for 1 epoch, the eval results are [0.04369693586567375, 0.12083834673558262, 0.023675111814434113, 0.01864211602467282, 0.052261665895792626, 0.07171156446634068, 0.09023536974930606, 0.18654859799415718, 0.22196121793196433, 0.04610799601904764, 0.21023391350986004, 0.3797766209046455],

which is about 0.7 AP lower than the result in the provided log [0.0509964214370242, 0.13292741190993088, 0.030383986414032393, 0.015355903493298791, 0.05914294278060285, 0.08176101640052409, 0.10028554935230335, 0.2012481198582593, 0.23517722389597043, 0.04296950016312112, 0.23670937055006003, 0.40016568706711353].
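
For reference, the 12 numbers in these lists follow the standard `COCOeval.stats` ordering from pycocotools; a small illustrative sketch that labels them, using the epoch-1 values above:

```python
# Illustrative only: label the 12 entries of a COCOeval.stats vector
# (pycocotools order: AP, AP50, AP75, APs, APm, APl, AR@1, AR@10, AR@100, ARs, ARm, ARl).
stats = [0.04369693586567375, 0.12083834673558262, 0.023675111814434113,
         0.01864211602467282, 0.052261665895792626, 0.07171156446634068,
         0.09023536974930606, 0.18654859799415718, 0.22196121793196433,
         0.04610799601904764, 0.21023391350986004, 0.3797766209046455]

names = ["AP", "AP50", "AP75", "APs", "APm", "APl",
         "AR@1", "AR@10", "AR@100", "ARs", "ARm", "ARl"]

for name, value in zip(names, stats):
    print(f"{name:>6}: {value:.4f}")
```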

DeppMeng commented 2 years ago

Hi,

Did you enable the '--no_aux_loss' flag? We use the auxiliary losses in training, and turning them off (i.e., passing '--no_aux_loss') could lead to fewer parameters as well as weaker performance. Moreover, the AP in the early training stage is unstable, so a difference of +/- 0.7 AP at epoch 1 is not informative. Consistently lower performance over the first epochs (say, epoch 1 to epoch 10) would indicate that something is wrong with the training.
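
As a quick sanity check, the "number of params" line in the log is just a count of the model's trainable parameters. A minimal sketch of how a DETR-style main.py typically computes it; `count_trainable_params` and the toy module below are illustrative, not code from this repository:

```python
from torch import nn

def count_trainable_params(model: nn.Module) -> int:
    """Count trainable parameters, matching the 'number of params'
    line that DETR-style training scripts print at start-up."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

if __name__ == "__main__":
    # Toy module just to demonstrate the call; in this repo the count
    # would be taken on the model built from the parsed args.
    dummy = nn.Linear(256, 91)
    print("number of params:", count_trainable_params(dummy))
```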

Cohesion97 commented 2 years ago

Thanks for your answer.

I did not change the args, and I did use the aux loss during training. The args are:

Namespace(aux_loss=True, backbone='resnet50', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, coco_path='/mnt/lustre/share/DSK/datasets/mscoco2017/', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, epochs=50, eval=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, is_slurm_job=True, lr=0.0001, lr_backbone=1e-05, lr_drop=40, mask_loss_coef=1, masks=False, nheads=8, num_queries=300, num_workers=2, output_dir='output/default', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, start_epoch=0, tcp_port='29550', weight_decay=0.0001, world_size=8) number of params: 43196001
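
One way to pin down where such a parameter-count mismatch comes from is to diff the parsed arguments against the ones recorded in the reference log. A minimal sketch; the `diff_args` helper and the toy Namespaces below are illustrative, not taken from the actual logs:

```python
from argparse import Namespace

def diff_args(mine: Namespace, reference: Namespace) -> dict:
    """Return {name: (my_value, reference_value)} for every argument that differs."""
    mine_d, ref_d = vars(mine), vars(reference)
    keys = set(mine_d) | set(ref_d)
    return {k: (mine_d.get(k), ref_d.get(k)) for k in keys
            if mine_d.get(k) != ref_d.get(k)}

# Toy usage (values are illustrative):
a = Namespace(aux_loss=True, num_queries=300, hidden_dim=256)
b = Namespace(aux_loss=True, num_queries=300, hidden_dim=512)
print(diff_args(a, b))   # {'hidden_dim': (256, 512)}
```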

BUT, after 5 epochs of training, the model performs about the same as in the provided logs. So it was probably just early-stage performance fluctuation caused by the random seed.

I will keep tracking the performance during training and will update this comment if any other problems come up.
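
For tracking the per-epoch numbers, the eval stats can be pulled out of the training log. The sketch below assumes a DETR-style `output_dir/log.txt` with one JSON dict per line containing a `test_coco_eval_bbox` list; the path and helper name are illustrative:

```python
import json
from pathlib import Path

def ap_per_epoch(log_path: str) -> list:
    """Read a DETR-style log.txt (one JSON dict per line) and return
    (epoch, AP) pairs, where AP is the first COCO eval entry."""
    pairs = []
    for line in Path(log_path).read_text().splitlines():
        record = json.loads(line)
        if "test_coco_eval_bbox" in record:
            pairs.append((record.get("epoch"), record["test_coco_eval_bbox"][0]))
    return pairs

# Example (path is illustrative):
# for epoch, ap in ap_per_epoch("output/default/log.txt"):
#     print(f"epoch {epoch:>2}: AP = {ap:.4f}")
```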

onepeachbiubiubiu commented 1 year ago

I also encountered the same inconsistency in the number of parameters.