Open VictorOdede opened 1 year ago
AMD, FPGA... ALL! (shaders?)
($400 provided it's not solved by a library call)
@VictorOdede One question: how do we manipulate the language models to speed them up? I'm interested in this issue, but I need more information. Could you elaborate on this, please? Thanks!
@Jobhdez The model weights are in bf16; we want to have them in int8 or int4 to lower the memory footprint. See https://arxiv.org/abs/2210.17323 (GPTQ) and https://arxiv.org/abs/2306.00978 (AWQ).
quantization for CPUs and other GPUs
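For intuition on what "weights in int8 instead of bf16" means: below is a minimal sketch of plain round-to-nearest symmetric per-tensor int8 quantization in NumPy. This is only a baseline illustration, not the GPTQ or AWQ algorithms from the linked papers (those use calibration data to pick better rounding/scales); the function names here are made up for the example.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: int8 values plus one float scale."""
    scale = float(np.abs(weights).max()) / 127.0  # map largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 weights: one multiply per element."""
    return q.astype(np.float32) * scale

# toy example: 1 byte per weight instead of 2 (bf16) or 4 (fp32),
# at the cost of a bounded per-weight rounding error
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize_int8(q, s) - w).max()
```

Round-to-nearest keeps the worst-case error at half a quantization step (`scale / 2`), which is why per-channel scales and calibration-aware methods like GPTQ/AWQ matter at int4, where the steps get much coarser.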