NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

Plans to implement HF's int8 inference? #283

Open JOHW85 opened 2 years ago

JOHW85 commented 2 years ago

It would be great if someone could look into implementing this particular version of int8 quantization (LLM.int8()) for serving LLMs: https://huggingface.co/blog/hf-bitsandbytes-integration
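For context, the linked blog post describes the LLM.int8() scheme: quantize activations and weights to int8 with per-row absmax scaling, but route the few "outlier" feature dimensions through a floating-point matmul so they don't blow up the quantization range. A minimal NumPy sketch of that idea follows; function names and the threshold value are illustrative only, not FasterTransformer or bitsandbytes APIs:

```python
import numpy as np

def quantize_absmax_int8(x):
    """Row-wise absmax quantization: scale each row so its largest
    magnitude maps to 127, then round to int8."""
    scale = np.max(np.abs(x), axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul_with_outliers(a, b, threshold=6.0):
    """LLM.int8()-style mixed-precision matmul sketch.

    Columns of `a` (feature dimensions) whose max magnitude exceeds
    `threshold` stay in float; the rest go through
    int8 quantize -> int32 accumulate -> dequantize.
    """
    outlier_cols = np.max(np.abs(a), axis=0) > threshold
    regular = ~outlier_cols

    # int8 path: row-wise scales for a, column-wise scales for b
    qa, sa = quantize_absmax_int8(a[:, regular])
    qb, sb = quantize_absmax_int8(b[regular, :].T)
    acc = qa.astype(np.int32) @ qb.astype(np.int32).T
    out = acc * sa * sb.T  # dequantize with the outer product of scales

    # float path for the outlier feature dimensions
    out += a[:, outlier_cols] @ b[outlier_cols, :]
    return out
```

The key design point is that the integer accumulation happens in int32, and the result is dequantized with the outer product of the row scales of `a` and the column scales of `b`, so quantization error stays bounded per output element even when a handful of features have large magnitudes.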

byshiue commented 2 years ago

Thank you for the suggestion. We will consider this optimization.