Run the LLMs across multiple GPUs, using 8-bit quantization to shrink the VRAM footprint. "facebook/opt-30b" runs on two NVIDIA RTX 3090s. "facebook/opt-66b" may fit on larger GPUs, or it can be loaded in float16 with CPU or NVMe/SSD offload.
This uses Hugging Face Accelerate and bitsandbytes.
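For the 8-bit multi-GPU path, a minimal sketch of what loading and generation can look like, assuming transformers, accelerate, and bitsandbytes are installed. `device_map="auto"` lets Accelerate shard the layers across the visible GPUs; the prompt and `max_new_tokens` are placeholder values, and newer transformers releases prefer passing `BitsAndBytesConfig(load_in_8bit=True)` instead of the bare `load_in_8bit` flag:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-30b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # Accelerate shards layers across the visible GPUs
    load_in_8bit=True,   # bitsandbytes int8 weights halve the fp16 footprint
)

prompt = "Hello, my name is"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```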
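For the float16 offload path mentioned above, a sketch along the same lines. With `device_map="auto"`, Accelerate fills the GPUs first, then CPU RAM, and finally spills any remaining weights to the offload folder on disk; the `"offload"` directory name here is an arbitrary choice, not anything the library requires:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-66b"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # fill GPUs first, then CPU RAM
    torch_dtype=torch.float16,  # half precision instead of int8
    offload_folder="offload",   # hypothetical path; spill weights to NVMe/SSD
)
```

Disk offload keeps the model loadable on hardware that could never hold it otherwise, at the cost of much slower generation, so a fast NVMe drive matters more than raw capacity here.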