Open ShriramGithub7 opened 1 year ago
Hi, did you make any changes to the classes? I am trying to train on my custom dataset with 12 classes. Any help would be appreciated.
I found a way to train the model. Let me know if you are still looking for help.
Hi @ShriramGithub7 I have the same issues. Can you please help me as well?
@mariiak2021 - What exactly is your issue? Let me know and I can help accordingly.
Hi @ShriMLEngineer, I'm training a modified DETR transformer model on a custom dataset with 48 classes. In `.train()` mode the model makes normal predictions (all different for one input), but if I run it in `.eval()` mode for evaluation, the outputs of the model are all the same (or almost the same). What could be the problem? Please see the output example below.
TRAINING mode with default dropout = 0.1, transformer output:

```
tensor([[[-1.2072, -0.0247,  1.4827,  ...,  1.9502, -1.0014, -0.0777],
         [-1.0472,  1.0473, -0.1111,  ...,  1.9880, -0.7706, -0.9534],
         [-0.9117,  0.6312,  1.5603,  ...,  1.7538, -1.7898, -0.0764],
         ...,
         [-1.5139,  0.8651,  1.3959,  ...,  1.4869, -0.7368, -0.9495],
         [-1.3713,  0.1539,  1.1079,  ...,  1.7069,  1.0330, -0.5884],
         [-0.9833,  0.9340,  1.4092,  ...,  2.1241, -0.9530, -0.0743]]],
       device='cuda:0', grad_fn=<...>)
```
And the final output of the model:

```
{'pred_logits': tensor([[-0.0081,  0.2497, -0.3607,  ..., -0.0395, -0.2595, -0.0823],
        [ 0.0760,  0.3130, -0.2936,  ...,  0.1302, -0.7095, -0.1434],
        [ 0.0592,  0.2099, -0.3150,  ...,  0.1744, -0.8228, -0.6104],
        ...,
        [ 0.0553,  0.0501, -0.2397,  ...,  0.2563, -0.5291, -0.3276],
        [-0.0884, -0.1168, -0.1549,  ...,  0.1795, -0.1136,  0.0661],
        [ 0.3371,  0.1840, -0.7856,  ...,  0.2078, -0.0932, -0.4215]],
       device='cuda:0', grad_fn=<...>)}
```
EVAL mode, transformer output:

```
tensor([[[-0.8847,  0.8008,  1.3410,  ...,  1.9398, -0.9303, -0.7386],
         [-0.9201,  0.8979,  1.4128,  ...,  1.9263, -0.8656, -0.8076],
         [-0.8593,  0.8811,  1.3582,  ...,  1.8995, -0.9350, -0.7555],
         [-0.9158,  0.9414,  1.3551,  ...,  1.8739, -0.8665, -0.9009],
         [-0.8818,  0.8018,  1.3489,  ...,  1.9480, -0.9242, -0.8278],
         [-0.8603,  0.7642,  1.3655,  ...,  2.0025, -0.9490, -0.8010]]],
       device='cuda:0', grad_fn=<...>)
```
```
{'pred_logits': tensor([[-0.1241,  0.0806, -0.3436,  ...,  0.2659, -0.3817, -0.2606],
        [-0.0993,  0.0815, -0.3826,  ...,  0.2688, -0.3978, -0.2872],
        [-0.1544,  0.0542, -0.3570,  ...,  0.2584, -0.3570, -0.2809],
        ...,
        [-0.0540,  0.0955, -0.3878,  ...,  0.2575, -0.3749, -0.3109],
        [-0.1354,  0.0464, -0.3761,  ...,  0.2357, -0.3562, -0.2966],
        [-0.1206,  0.0465, -0.3985,  ...,  0.2530, -0.3951, -0.2766]],
       device='cuda:0', grad_fn=<...>)}
```
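One thing worth separating here: dropout is the only source of randomness between the two modes, and `.eval()` disables it. A minimal, self-contained sketch (a plain `nn.Dropout` on a toy layer, not the actual DETR model) shows that `.train()` gives stochastic outputs while `.eval()` is fully deterministic, so repeated eval passes being identical is expected; outputs being near-identical *across queries*, as in the dump above, points to something else (e.g. the decoder collapsing or weights not loading).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.1))
x = torch.ones(1, 8)

layer.train()                 # dropout active: each forward pass is stochastic
a, b = layer(x), layer(x)

layer.eval()                  # dropout disabled: forward passes are deterministic
c, d = layer(x), layer(x)

print(torch.equal(c, d))      # True: eval mode is deterministic
```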
I found the way to train model. Let me know in case you are still looking for help.
Hi, I'm still looking for help training a DETR model from scratch on a 118k-image COCO dataset. Do you mean you solved the problem of training the model from scratch without using --resume? I would really appreciate your help, thanks! :)
Hi @alcinos, @fmassa,

I am training DETR on my custom dataset. I made the changes below.

hubconf.py:

```python
if pretrained:
    checkpoint = torch.load("detr-r50-e632da11.pth", map_location="cpu")
    del checkpoint["model"]["class_embed.weight"]
    del checkpoint["model"]["class_embed.bias"]
    torch.save(checkpoint, "detr-r50_no-class-head.pth")
```

main.py:

```python
model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
```
I trained my model for 500 epochs, but mAP stayed around zero. However, when I resumed training with the --resume parameter using "detr-r50_no-class-head.pth", mAP suddenly jumped to 0.72. To check this again, I trained my model for 21 epochs and then resumed training with the --resume parameter; it again showed an mAP of 0.72. Please see the graphs below.
This means mAP goes up significantly only with the --resume parameter and stays near zero without it. Any idea why this is happening? Is there a way to make mAP go up without --resume?
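The class-head surgery described above can be checked in isolation with a self-contained toy (a stand-in module, not DETR itself). It shows why loading with `strict=False` works: after deleting the `class_embed.*` entries from the checkpoint, only those keys come back as missing, so the backbone weights load while the new head keeps its fresh initialization.

```python
import torch
import torch.nn as nn

# Toy stand-in for DETR: a "backbone" plus a classification head.
class Toy(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.class_embed = nn.Linear(4, num_classes)

pretrained = Toy(num_classes=91)               # e.g. the COCO-sized head
ckpt = {"model": pretrained.state_dict()}
del ckpt["model"]["class_embed.weight"]        # drop the class head,
del ckpt["model"]["class_embed.bias"]          # as in the snippet above

mine = Toy(num_classes=12)                     # custom-dataset-sized head
missing, unexpected = mine.load_state_dict(ckpt["model"], strict=False)
print(missing)      # ['class_embed.weight', 'class_embed.bias']
print(unexpected)   # []
```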