TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models
https://arxiv.org/abs/2402.14289
Apache License 2.0

Is it possible to pretrain tinyllama-3b on 2 V100s? #37

Open Yang-bug-star opened 5 months ago

baichuanzhou commented 5 months ago

You can try increasing gradient_accumulation_steps, but this setup was never tested. We recommend fine-tuning our published models.
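The idea behind raising gradient_accumulation_steps is to keep the effective batch size of the training recipe constant while shrinking the per-device batch so it fits in V100 memory: effective batch = per_device_batch × n_gpus × accumulation_steps. A minimal sketch of that arithmetic (the numbers below are illustrative, not TinyLLaVA's actual recipe):

```python
def grad_accum_steps(target_effective_batch: int,
                     per_device_batch: int,
                     n_gpus: int) -> int:
    """Accumulation steps needed to preserve a target effective batch size."""
    per_step = per_device_batch * n_gpus
    assert target_effective_batch % per_step == 0, \
        "effective batch must be divisible by per_device_batch * n_gpus"
    return target_effective_batch // per_step

# Illustrative example: a target effective batch of 256 with a
# per-device batch of 4 on 2 GPUs needs 32 accumulation steps.
print(grad_accum_steps(256, 4, 2))  # -> 32
```

In HuggingFace Trainer terms, this value would be passed as `gradient_accumulation_steps` alongside a reduced `per_device_train_batch_size`; memory permitting, gradient checkpointing and mixed precision are the usual further levers on V100s.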