Open ShriramGithub7 opened 1 year ago
Hi, did you make any changes to the classes? I am trying to train on my custom dataset with 12 classes. Any help would be appreciated.
I found a way to train the model. Let me know if you are still looking for help.
Hi @ShriramGithub7 I have the same issues. Can you please help me as well?
@mariiak2021 - What exactly is your issue? Let me know and I can help accordingly.
Hi @ShriMLEngineer, I'm training a modified DETR transformer model on a custom dataset with 48 classes. In `.train()` mode the model makes normal predictions (all different for one input), but if I run it in `.eval()` mode for evaluation, the outputs of the model are all the same (or almost the same). What could be the problem? Please see the output example below.
TRAINING mode with default dropout = 0.1, transformer output:

```
tensor([[[-1.2072, -0.0247,  1.4827,  ...,  1.9502, -1.0014, -0.0777],
         [-1.0472,  1.0473, -0.1111,  ...,  1.9880, -0.7706, -0.9534],
         [-0.9117,  0.6312,  1.5603,  ...,  1.7538, -1.7898, -0.0764],
         ...,
         [-1.5139,  0.8651,  1.3959,  ...,  1.4869, -0.7368, -0.9495],
         [-1.3713,  0.1539,  1.1079,  ...,  1.7069,  1.0330, -0.5884],
         [-0.9833,  0.9340,  1.4092,  ...,  2.1241, -0.9530, -0.0743]]],
       device='cuda:0', grad_fn=<...>)
```
And the final output of the model:

```
{'pred_logits': tensor([[-0.0081,  0.2497, -0.3607,  ..., -0.0395, -0.2595, -0.0823],
        [ 0.0760,  0.3130, -0.2936,  ...,  0.1302, -0.7095, -0.1434],
        [ 0.0592,  0.2099, -0.3150,  ...,  0.1744, -0.8228, -0.6104],
        ...,
        [ 0.0553,  0.0501, -0.2397,  ...,  0.2563, -0.5291, -0.3276],
        [-0.0884, -0.1168, -0.1549,  ...,  0.1795, -0.1136,  0.0661],
        [ 0.3371,  0.1840, -0.7856,  ...,  0.2078, -0.0932, -0.4215]],
       device='cuda:0', grad_fn=<...>)}
```
EVAL mode, transformer output:

```
tensor([[[-0.8847,  0.8008,  1.3410,  ...,  1.9398, -0.9303, -0.7386],
         [-0.9201,  0.8979,  1.4128,  ...,  1.9263, -0.8656, -0.8076],
         [-0.8593,  0.8811,  1.3582,  ...,  1.8995, -0.9350, -0.7555],
         [-0.9158,  0.9414,  1.3551,  ...,  1.8739, -0.8665, -0.9009],
         [-0.8818,  0.8018,  1.3489,  ...,  1.9480, -0.9242, -0.8278],
         [-0.8603,  0.7642,  1.3655,  ...,  2.0025, -0.9490, -0.8010]]],
       device='cuda:0', grad_fn=<...>)
```
```
{'pred_logits': tensor([[-0.1241,  0.0806, -0.3436,  ...,  0.2659, -0.3817, -0.2606],
        [-0.0993,  0.0815, -0.3826,  ...,  0.2688, -0.3978, -0.2872],
        [-0.1544,  0.0542, -0.3570,  ...,  0.2584, -0.3570, -0.2809],
        ...,
        [-0.0540,  0.0955, -0.3878,  ...,  0.2575, -0.3749, -0.3109],
        [-0.1354,  0.0464, -0.3761,  ...,  0.2357, -0.3562, -0.2966],
        [-0.1206,  0.0465, -0.3985,  ...,  0.2530, -0.3951, -0.2766]],
       device='cuda:0', grad_fn=<...>)}
```
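One thing worth separating here: dropout is the only source of randomness between the two modes, and `.eval()` disables it. A minimal, self-contained sketch (a plain `nn.Dropout` on a toy layer, not the actual DETR model) shows that `.train()` gives stochastic outputs while `.eval()` is fully deterministic, so repeated eval passes being identical is expected; outputs being near-identical *across queries*, as in the dump above, points to something else (e.g. the decoder collapsing or weights not loading).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.1))
x = torch.ones(1, 8)

layer.train()                 # dropout active: each forward pass is stochastic
a, b = layer(x), layer(x)

layer.eval()                  # dropout disabled: forward passes are deterministic
c, d = layer(x), layer(x)

print(torch.equal(c, d))      # True: eval mode is deterministic
```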
I found the way to train model. Let me know in case you are still looking for help.
Hi, I'm still looking for help training a DETR model from scratch on a 118k-image COCO dataset. Do you mean you solved the problem of training the model from scratch without using --resume? I would really appreciate your help, thanks! :)
Hi @alcinos, @fmassa,

I am training DETR on my custom dataset. I made the changes below.

hubconf.py:

```python
if pretrained:
    checkpoint = torch.load("detr-r50-e632da11.pth", map_location="cpu")
    del checkpoint["model"]["class_embed.weight"]
    del checkpoint["model"]["class_embed.bias"]
    torch.save(checkpoint, "detr-r50_no-class-head.pth")
```

main.py:

```python
model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
```
I trained my model for 500 epochs, but mAP stayed around zero. However, when I resumed training with the --resume parameter using "detr-r50_no-class-head.pth", mAP suddenly jumped to 0.72. To check this again, I trained my model for 21 epochs and then resumed training with the --resume parameter; it again showed an mAP of 0.72. Please see the graphs below.
This means mAP goes up significantly only with the --resume parameter and stays near zero without it. Any idea why this is happening? Is there a way to make mAP go up without --resume?
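The class-head surgery described above can be checked in isolation with a self-contained toy (a stand-in module, not DETR itself). It shows why loading with `strict=False` works: after deleting the `class_embed.*` entries from the checkpoint, only those keys come back as missing, so the backbone weights load while the new head keeps its fresh initialization.

```python
import torch
import torch.nn as nn

# Toy stand-in for DETR: a "backbone" plus a classification head.
class Toy(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.class_embed = nn.Linear(4, num_classes)

pretrained = Toy(num_classes=91)               # e.g. the COCO-sized head
ckpt = {"model": pretrained.state_dict()}
del ckpt["model"]["class_embed.weight"]        # drop the class head,
del ckpt["model"]["class_embed.bias"]          # as in the snippet above

mine = Toy(num_classes=12)                     # custom-dataset-sized head
missing, unexpected = mine.load_state_dict(ckpt["model"], strict=False)
print(missing)      # ['class_embed.weight', 'class_embed.bias']
print(unexpected)   # []
```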