OpenGVLab / OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
MIT License

Lazy loading #6

Open Ar57m opened 12 months ago

Ar57m commented 12 months ago

Can you guys implement lazy loading in the MLCChat app, like llama.cpp does? On low-RAM devices the app crashes instantly when trying to generate text.