Open HarryHsing opened 2 months ago
Thanks for your great work!
I wonder whether inference quantization will be supported in the future for memory efficiency?
Thanks!
Hello @HarryHsing, for inference we already apply quantization by loading the language model in 8-bit. You can enable it by setting low_resource: True in the llama2 test config or the mistral test config:
low_resource: True
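As a sketch of where that flag lives, the eval config entry might look like the following (the surrounding key names are assumed, not taken from the repo — only `low_resource: True` comes from this thread):

```yaml
model:
  # Assumed layout: load the language model weights in 8-bit to reduce GPU memory use
  low_resource: True
```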