JOHW85 opened this issue 2 years ago (status: Open)
It would be great if someone could look into implementing this particular version of int8 quantization (LLM.int8(), as integrated through bitsandbytes) for serving LLMs: https://huggingface.co/blog/hf-bitsandbytes-integration
Thank you for the suggestion. We will consider this optimization.