skill-diver opened this issue 3 months ago
It's different depending on the encoder and decoder; the settings should be in the train experiment. Grad clip is 0.01, I think. Basically, you can set the grad clip threshold super low so that all gradients are clipped. This helped a bit with stability.
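If it helps, here's a minimal PyTorch sketch of that kind of aggressive clipping. The model, optimizer, and loss here are placeholders, not the repo's actual training loop, and the 0.01 threshold is from memory above, so check the train experiment config:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; in practice this would be the actual matcher.
model = nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(4, 8)
loss = model(x).pow(2).mean()
loss.backward()

# With a very low max_norm, effectively every gradient gets rescaled each step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.01)
optimizer.step()
optimizer.zero_grad()
```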
The model is trained with a step LR schedule, so at the end the LR is 1/10 of the original one. If you want to finetune, I suggest using that LR.
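A rough sketch of what that looks like with a standard `StepLR` scheduler; the base LR and the milestone are placeholders, not the repo's actual values:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # hypothetical stand-in for the actual network
base_lr = 1e-4           # placeholder; the real value is in the experiment config

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)

# A single 10x decay partway through training, so the final LR is base_lr / 10.
# step_size is a placeholder milestone, not the repo's actual schedule.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100_000, gamma=0.1)

# For finetuning, the suggestion above amounts to starting directly at:
finetune_lr = base_lr / 10
```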
It's probably difficult to replace the ViT without training from scratch, since the features will be different.
Sorry for the lack of detail, I'm on my phone and can't check stuff right now.
If you have issues with stability, you could check which parameters produce NaNs and manually use fp32 there.
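Something like the following, as a rough sketch (the wrapper class is hypothetical, not part of the repo): first find which parameters get non-finite gradients, then force the suspect module to run in fp32 even under autocast:

```python
import torch
import torch.nn as nn

def report_nan_params(model: nn.Module) -> None:
    # Run after loss.backward() to see which parameters got NaN/Inf gradients.
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in: {name}")

class FP32Wrapper(nn.Module):
    # Hypothetical helper: wraps a suspect submodule so it always runs in fp32,
    # by locally disabling autocast and casting its input.
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module.float()

    def forward(self, x):
        with torch.autocast(device_type=x.device.type, enabled=False):
            return self.module(x.float())
```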
You might also want to freeze the batchnorm layers of the network; I've found that batchnorm can cause a lot of issues.
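Freezing batchnorm would look roughly like this (a generic PyTorch sketch, not code from the repo):

```python
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm)

def freeze_batchnorm(model: nn.Module) -> None:
    # Put BN layers in eval mode (fixed running stats) and stop updating
    # their affine parameters.
    for module in model.modules():
        if isinstance(module, BN_TYPES):
            module.eval()
            for param in module.parameters():
                param.requires_grad = False
```

Note you'd want to call this after every `model.train()`, since `train()` flips the BN layers back into training mode.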
How many days did you spend training the RoMa model? I also find that if I replace the DINO encoder with another ViT, the training results are bad.
Hi Author,
Thank you for sharing this project and for your kindness in answering my previous questions. I have some questions I want to ask about training:
Thank you so much.