Open boqchen opened 1 month ago
Hi,
Thanks for the great work. It seems the pretraining takes a long time. I would like to run the pretraining, but I cannot submit a job for such a long duration. I was wondering if it is possible to resume the training from a checkpoint.
Thanks in advance!
Do you mean that you want to obtain the checkpoint rather than the pretrained model? Choice 1: you could load the pretrained weights to initialize both the teacher and the student; this may be helpful. Choice 2: we can share the checkpoint with you, but it requires 2 nodes and it is pretty huge.
Hi @birkhoffkiki. I am sorry for the late reply.
I have a single node with 4 H100 GPUs (96 GB). I was wondering if this is sufficient to run GPFM. I tried to run DINOv2 vitl14 and it works for me. Do you also have a checkpoint for vitl14?
Thanks for your time!
The released pretrained weight is ViT-L-14. You can try to load it. There may be a mismatch of keys; you can solve this by renaming all the keys in the weights.
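For example, something like this sketch might work (the filename, the nesting key, and the "backbone." prefix are placeholders; inspect the actual keys of the released weight first and adapt the rule):

```python
import torch

# Placeholders: "GPFM.pth", the "teacher" nesting key, and the "backbone."
# prefix are assumptions; print(list(state_dict)[:5]) to see the real keys.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")

ckpt = torch.load("GPFM.pth", map_location="cpu")
state_dict = ckpt.get("teacher", ckpt)  # some checkpoints nest the weights

# drop a wrapper prefix so the keys match the bare ViT
renamed = {k.replace("backbone.", ""): v for k, v in state_dict.items()}

# strict=False reports, instead of failing on, keys that still mismatch
missing, unexpected = model.load_state_dict(renamed, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```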
Thanks for your prompt reply and for releasing the model! In your training config, I only saw the pretrained model being loaded into the student. Since the teacher is just an EMA update of the student, I was wondering if this is sufficient. (Also, I did not see where I could load the pretrained model into the teacher.)
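For reference, my understanding of the setup is roughly the following sketch (the `student`/`teacher` names just mirror DINOv2-style code; the actual GPFM code may differ):

```python
import copy
import torch
import torch.nn as nn

# toy stand-in for the real backbone
student = nn.Linear(8, 8)

# if the teacher starts as an exact copy of the (pretrained) student,
# initializing only the student would be enough
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False  # the teacher is never trained directly

@torch.no_grad()
def ema_update(momentum: float = 0.999):
    # teacher <- momentum * teacher + (1 - momentum) * student
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)
```

If the teacher were instead built independently and only the student got the pretrained weights, the first EMA steps would mix pretrained and random parameters, which is what I am worried about.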
I can provide the following checkpoint for you, but it is for 2 nodes (16 GPUs): model_0176249.rank_0.pth, model_0176249.rank_1.pth, ..., model_0176249.rank_15.pth. If you can convert these into a "normal checkpoint" and need it, let me know; I can share a OneDrive link with you.
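If you want to try the conversion yourself, here is a rough sketch. It assumes the shards are per-rank FSDP local state dicts (as in DINOv2-style training), that the weights sit under a "model" key, and that you can relaunch on the same 16-GPU layout so each rank can load its own shard:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType, FullStateDictConfig

def consolidate(model: FSDP, shard_template: str, out_path: str):
    rank = dist.get_rank()
    # each rank loads its own shard, e.g. model_0176249.rank_3.pth
    shard = torch.load(shard_template.format(rank=rank), map_location="cpu")
    with FSDP.state_dict_type(model, StateDictType.LOCAL_STATE_DICT):
        model.load_state_dict(shard["model"])  # "model" key is an assumption
    # gather the full, unflattened state dict on rank 0 only
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        full_sd = model.state_dict()
    if rank == 0:
        torch.save(full_sd, out_path)

# e.g. consolidate(fsdp_model, "model_0176249.rank_{rank}.pth",
#                  "model_0176249.full.pth") inside the distributed launcher
```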
That would be great! Thanks.