dvlab-research / LLaMA-VID

Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

OOM in stage2 finetuning #93

Open Nastu-Ho opened 1 month ago

Nastu-Ho commented 1 month ago

I train with 8 GPUs (40G or 64G each) and the batch size set to 1, but OOM still occurs during training. Most of the time, memory usage stays around 30G, yet at some point it exceeds the card's capacity.
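Not part of the original report, but a minimal sketch of how one might log per-step peak GPU memory to pinpoint which step causes the spike, assuming a standard PyTorch training loop (the helper name and where it is called are hypothetical, not from the LLaMA-VID code):

```python
import torch

def log_peak_memory(step: int, device: int = 0) -> None:
    """Print peak allocated/reserved GPU memory since the last reset.

    Call once per training step to see which step triggers the spike.
    """
    peak_alloc = torch.cuda.max_memory_allocated(device) / 1024**3
    peak_reserved = torch.cuda.max_memory_reserved(device) / 1024**3
    print(f"step {step}: peak allocated {peak_alloc:.1f} GiB, "
          f"peak reserved {peak_reserved:.1f} GiB")
    # Reset so the next step's peak is measured independently.
    torch.cuda.reset_peak_memory_stats(device)
```

If the peak jumps on specific steps only, the spike likely comes from unusually long samples in those batches rather than from a steady memory leak, which would narrow down where to look.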