Use load_in_4bit=True if you have limited GPU memory.
Try setting a different model device mapping, since your two cards are different, or try int4 quantization.
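For reference, a minimal sketch of what the two suggestions above could look like with Hugging Face transformers (the model name is from this thread; exact argument names can differ across transformers/bitsandbytes versions, so treat this as an assumption, not the repo's official loading code):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit (int4) quantization via bitsandbytes to cut GPU memory use
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogagent-chat-hf",
    torch_dtype=torch.float16,
    trust_remote_code=True,        # CogAgent ships custom modeling code
    quantization_config=quant_config,
    device_map="auto",             # let accelerate spread layers across devices
)

With device_map="auto", accelerate decides the placement; a manual mapping (e.g. pinning specific layers to "cuda:0" vs "cuda:1") is also possible when the two cards have different memory sizes.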
@zRzRzRzRzRzRzR @1049451037 I tried to run it on cloud GPUs, once on an H100 48GB and then on an A10 24GB; on both of them it threw the same error ☹️
I don't know what code you are running; the problem is in your code. We have provided code that can run on a 24GB GPU:
python cli_demo_hf.py --from_pretrained THUDM/cogagent-chat-hf --fp16 --quant 4
Hi, I am trying to run the model on Lambda Labs GPU instances (A10 and H100), but every time I hit an OutOfMemoryError on both of them.
Below is the error trace with some GPU logs.
I have tried multiple times, but it fails every time. My inference code runs via cog.
Any help is highly appreciated🙏!!