Closed: leebebeto closed this issue 1 week ago
Hello! The overall training takes about 15 h: 5 h for stage 1 and 10 h for stage 2 on our 8×A100 machine. It is reasonable that you finished pretraining quickly, since your pretraining does not include video data. A loss of 2.xx is similar to our pretraining loss.
Hi, thank you for the wonderful work. I pretrained stage 1 using image-caption pairs only (LLaVA-filtered-558K).
Your paper says the overall training takes about 15 hours. Does that mean 15 hours in total for stage 1 and stage 2 combined, or 15 hours for each stage?
I used 4 A100 (80 GB) GPUs, and training finished very quickly: stage 1 took only about two hours.
Also, I am getting loss values around 2.xx. Is this similar to what you observed, or does the model need to converge further?
Thank you in advance!