johndpope opened this issue 1 year ago
Hi, the OOM error most likely means you don't have enough VRAM. I checked the code, and it seems you are loading the Hugging Face (HF) model, which may not be a quantized version. That uses more VRAM than the GPTQ 4-bit version.
I have 24 GB, and it's only the remyxai/ffmperative-7b model.
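A rough back-of-the-envelope estimate shows how a "7B" model can still run out of memory on a 24 GB card. Suppose the checkpoint ends up in fp32, which is what Hugging Face transformers' `from_pretrained` does in many versions when no `torch_dtype` is passed. In that case the weights alone need about 26 GiB. The sketch below is weights-only and assumes a nominal 7e9 parameters. It ignores activations, KV cache, and CUDA/framework overhead, so real usage will be higher. The `weight_memory_gib` helper is hypothetical, written just for this estimate.

```python
# Rough, weights-only VRAM estimate for a model of a given size.
# Assumption: ignores activations, KV cache, CUDA context and framework overhead,
# so actual usage will be higher than these figures.

BYTES_PER_PARAM = {
    "fp32": 4.0,  # full precision
    "fp16": 2.0,  # half precision (bf16 is the same size)
    "int8": 1.0,  # 8-bit quantization
    "4bit": 0.5,  # e.g. GPTQ 4-bit, ignoring per-group scale/zero-point overhead
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Hypothetical helper: approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params_7b = 7e9  # nominal parameter count of a "7B" model (assumption)
    vram_gib = 24    # GPU size mentioned in this thread
    for dtype in BYTES_PER_PARAM:
        need = weight_memory_gib(params_7b, dtype)
        verdict = "fits" if need < vram_gib else "does NOT fit"
        print(f"{dtype:>5}: {need:5.1f} GiB of weights -> {verdict} in {vram_gib} GiB")
```

If fp32 loading is the cause, passing `torch_dtype=torch.float16` to `from_pretrained` (about 13 GiB of weights) or using a 4-bit quantized version (about 3.3 GiB) should leave room on a 24 GB GPU. These are estimates, not measurements.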
Related: https://github.com/microsoft/guidance/issues/328