apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

Get different accuracy on different GPU #59

Closed jimmylin0979 closed 1 year ago

jimmylin0979 commented 1 year ago

Hi, I trained the mobilevit-xxs model on two different machines and got different results: the accuracy on the Titan RTX is always 0.5% lower than on the RTX 2080Ti.

Below are the specs of the two machines:

After checking the code, the only potential cause I can think of is AMP, but both GPUs use the TU102 chip, so they should support the same floating-point precision.

Do you have any idea what might be causing the problem?

Thank you

sacmehta commented 1 year ago

Besides AMP, TF32 matmul could also be a culprit.
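
For readers unfamiliar with why TF32 can shift results: TF32 keeps FP32's 8 exponent bits but only 10 of its 23 mantissa bits, so small differences that FP32 preserves can disappear. The sketch below illustrates this with the standard library only. The helper names `to_float32` and `to_tf32` are hypothetical, and truncation is used as a simplification of whatever rounding the hardware actually applies.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x):
    """Approximate TF32 by zeroing the 13 lowest FP32 mantissa bits.

    Assumption: plain truncation, which is a simplification of real
    hardware behaviour but shows the loss of precision.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<I".replace("I", "f"), struct.pack("<I", bits))[0]


value = 1.0 + 2.0 ** -11          # representable in FP32, not in TF32
fp32_value = to_float32(value)    # keeps the small increment
tf32_value = to_tf32(value)       # the increment is truncated away

print(fp32_value - 1.0)
print(tf32_value - 1.0)
```

Errors of this size are tiny per operation, but they can accumulate over many matrix multiplications during training.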

jimmylin0979 commented 1 year ago

Thanks for the fast reply!

Besides AMP, TF32 matmul could also be a culprit.

In that case, which mode were the experiments trained in: TF32 or FP32? Thank you!

sacmehta commented 1 year ago

Try to enable TF32. Also, use a longer warmup of 20 epochs.
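
A minimal sketch of how TF32 could be toggled in plain PyTorch, assuming a PyTorch version that exposes the `torch.backends.cuda.matmul.allow_tf32` and `torch.backends.cudnn.allow_tf32` flags. The helper name `set_tf32` is hypothetical and is not part of CVNets, and the thread does not say how CVNets itself configures this. As far as PyTorch's documentation describes it, TF32 is only used on Ampere or newer GPUs, so on Turing cards such as TU102 these flags may have no effect.

```python
try:
    import torch
except ImportError:  # allows the sketch to run without PyTorch installed
    torch = None


def set_tf32(enabled):
    """Toggle TF32 for CUDA matmuls and cuDNN convolutions.

    Returns the resulting flag values, or None if PyTorch is unavailable.
    """
    if torch is None:
        return None
    torch.backends.cuda.matmul.allow_tf32 = enabled  # matrix multiplications
    torch.backends.cudnn.allow_tf32 = enabled        # cuDNN convolutions
    return {
        "matmul": torch.backends.cuda.matmul.allow_tf32,
        "cudnn": torch.backends.cudnn.allow_tf32,
    }


flags = set_tf32(True)
print(flags)
```

Calling `set_tf32(False)` before training on both machines would force full FP32 for these operations, which is one way to rule TF32 in or out as the source of the gap.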

jimmylin0979 commented 1 year ago

Try to enable TF32. Also, use a longer warmup of 20 epochs.

Thanks! I will run an experiment with those settings!