anarchy-ai / LLM-VM

irresponsible innovation. Try now at https://chat.dev/
https://anarchy.ai/

Implement 4-bit, 8-bit quantization for all hardware #225

Open · VictorOdede opened this issue 1 year ago

VictorOdede commented 1 year ago

Quantization support for CPUs and other (non-CUDA) GPUs.

mmirman commented 1 year ago

AMD, FPGA... ALL! (shaders?)

mmirman commented 1 year ago

($400 bounty, provided it's not solved by a library call)
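
For context, a minimal sketch of the kind of existing "library call" the bounty excludes, assuming the Hugging Face transformers + bitsandbytes stack of the time. This path only runs on CUDA GPUs, which is presumably why CPU/AMD/FPGA support needs dedicated work; the model id here is just an example.

```python
# Sketch of a library-call solution the bounty excludes: bitsandbytes
# LLM.int8() loading via Hugging Face transformers. Requires
# `pip install transformers accelerate bitsandbytes` and, crucially,
# a CUDA GPU -- hence this issue for other hardware.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",   # example model; any causal LM checkpoint works
    device_map="auto",     # place layers automatically across devices
    load_in_8bit=True,     # quantize linear-layer weights to int8 on load
)
```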

Jobhdez commented 11 months ago

@VictorOdede One question: how do we manipulate the language models to speed them up? I'm interested in this issue but need more information. Could you elaborate on this, please? Thanks

VictorOdede commented 11 months ago

@Jobhdez The model weights are stored in bf16; we want them in int8 or int4 to lower the memory footprint. See GPTQ (https://arxiv.org/abs/2210.17323) and AWQ (https://arxiv.org/abs/2306.00978).
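
To make the conversion concrete, here is a minimal round-to-nearest absmax int8 sketch in NumPy. The function names and shapes are hypothetical; GPTQ and AWQ in the papers above are far more careful, minimizing layer output error rather than rounding each weight naively.

```python
import numpy as np

def quantize_int8(w):
    """Per-output-row absmax quantization: map [-max|w|, max|w|] to [-127, 127]."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)   # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

# Stand-in for one bf16 weight matrix (NumPy has no bf16, so fp32 here).
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# int8 is 1 byte/weight vs. 2 for bf16: ~2x smaller; int4 would be ~4x.
print(q.nbytes / w.nbytes)        # 0.25 relative to the fp32 stand-in
print(np.abs(w - w_hat).max())    # worst-case rounding error
```

This naive scheme degrades noticeably at 4 bits, which is exactly the accuracy gap the GPTQ and AWQ papers address.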