facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

The training loss value is very large #427

Open yuyu19970716 opened 2 weeks ago

yuyu19970716 commented 2 weeks ago

Hello, I would like to ask: I am running the training code on a single node with two GPUs, and the dataset is ImageNet-1k. Why does training terminate at iteration 1370? After modifying the code I can now train for 12500 iterations, but although the loss curve eventually flattens out, the loss value remains very large, as shown in the figure. What causes this?

[screenshot: training loss curve]
ayushnangia commented 1 week ago

are you trying to pretrain the model?

yuyu19970716 commented 4 hours ago

Are you trying to pretrain the model? Yes, I'm training the DINOv2 model from scratch. I think the loss value is large mainly because I'm training from scratch on the ImageNet-1k dataset with only a single GPU node. Now I want to do self-supervised training of DINOv2 on my own dataset: can I convert my dataset into the ImageNet-1k format? Do you have any suggestions? For example, why is my loss so large, and how do I train on my own dataset? Looking forward to your reply!
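For the dataset-format question: DINOv2's data loader expects an ImageNet-style directory layout (images grouped under per-class subfolders of a `train/` split). Since self-supervised pretraining never reads the class labels, a custom unlabeled dataset can be placed under a single dummy class folder. Below is a minimal sketch of that conversion; the function name `to_imagenet_layout` and the placeholder synset id `n00000000` are my own assumptions for illustration, not part of the dinov2 codebase.

```python
import shutil
from pathlib import Path


def to_imagenet_layout(src_dir, dst_dir, label="n00000000"):
    """Copy a flat folder of images into an ImageNet-style train/<label>/ layout.

    For self-supervised pretraining the labels are unused, so one placeholder
    synset-style folder (assumption: "n00000000") is enough. Returns the
    number of images copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    class_dir = dst / "train" / label
    class_dir.mkdir(parents=True, exist_ok=True)

    copied = 0
    for img in sorted(src.iterdir()):
        # Only pick up common image extensions; skip anything else.
        if img.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            shutil.copy2(img, class_dir / img.name)
            copied += 1
    return copied
```

After arranging the files this way, you would still need to follow the repo's own data-preparation instructions (e.g. generating any extra metadata files the `ImageNet` dataset class expects) before pointing the training config at the new root.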