AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI platforms on Google Kubernetes Engine.
This change adds quantization support so that techniques such as eetq and bitsandbytes can be used when deploying models with TGI. It also makes it possible to benchmark models like Llama 3 405B using FP8.
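For context, TGI exposes quantization through the `--quantize` flag on `text-generation-launcher`. A minimal sketch of what enabling it at deploy time looks like (the model id, port, and chosen method here are illustrative placeholders, not values taken from this change):

```shell
# Illustrative TGI launch with quantization enabled.
# --quantize accepts methods such as eetq, bitsandbytes, and fp8
# (availability depends on the TGI version and GPU hardware).
text-generation-launcher \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --quantize eetq \
  --port 8080
```

In a Kubernetes deployment, the same flag would typically be passed through the container's args or an environment-driven launch script, so the quantization method can be switched per benchmark run without rebuilding the image.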