IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

Different training results across multiple runs of the same training script #161

Open ajay1234567899 opened 1 year ago

ajay1234567899 commented 1 year ago

Hello sir,

I am trying to fine-tune your model on the CityPersons dataset (a benchmark dataset for pedestrian detection), and I am training the model on 2 GPUs.

I have fixed all the seed values (42 by default) and fixed the image size, removed all augmentations (applying only normalization), and I am training with a fixed set of hyperparameters. I have also turned off the shuffle option in the distributed sampler.
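For reference, this is roughly the seeding/determinism setup I am using (a minimal sketch; the helper names and flags shown here are mine, not from the DINO codebase):

```python
import os
import random

import numpy as np
import torch


def set_full_determinism(seed: int = 42) -> None:
    """Seed every RNG I am aware of; the same base seed (42) on every process."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Needed for deterministic cuBLAS GEMMs on newer CUDA versions.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


def seed_worker(worker_id: int) -> None:
    """Seed DataLoader worker processes deterministically as well."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


# Distributed sampler with shuffling turned off, as described above
# (dataset stands in for my CityPersons dataset object):
# sampler = torch.utils.data.distributed.DistributedSampler(dataset, shuffle=False)
# loader = torch.utils.data.DataLoader(dataset, sampler=sampler,
#                                      worker_init_fn=seed_worker, num_workers=4)
```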

I train the model for 35 epochs with a learning rate drop at the 34th epoch.

But across multiple runs of the same training script, the results differ from run to run, and the deviation among them is high (±2-3 mAP).

To be precise: I run training once and get 35 checkpoints; then I train again and get another 35 checkpoints. The results between these two runs are not consistent and show high deviation. Note: I cloned a fresh copy of the repository from GitHub before running this experiment.
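This is how I compare the two runs (a sketch; the file names and JSON layout are just how I happen to store the per-epoch evaluation results, not anything produced by the DINO code):

```python
import json

import numpy as np

# Each file holds the per-epoch COCO mAP of one training run, e.g. {"mAP": [...]}.
with open("run_a_eval.json") as fa, open("run_b_eval.json") as fb:
    run_a = json.load(fa)
    run_b = json.load(fb)

gap = np.abs(np.array(run_a["mAP"]) - np.array(run_b["mAP"]))
print(f"max per-epoch mAP gap: {gap.max():.2f}, mean gap: {gap.mean():.2f}")
```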

I am trying to figure out why there is this much inconsistency between training runs.

Please let me know if I have missed out on something.

Waiting for your reply. Thank you.

FengLi-ust commented 1 year ago

Maybe your dataset is too small, which leads to high deviations.