awslabs / llm-hosting-container

Large Language Model Hosting Container
Apache License 2.0

Add TGI 2.1.0 #82

Closed: philschmid closed this 2 months ago

philschmid commented 3 months ago

New models: gemma2

Multi-LoRA adapters: you can now serve multiple LoRA adapters from the same TGI deployment (https://github.com/huggingface/text-generation-inference/pull/2010).

Faster GPTQ inference and Marlin support (up to 2x speedup).

Reworked the entire scheduling logic (better block allocation, which also allows further speedups in future releases).
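
For anyone wondering what the multi-LoRA item means on the client side, here is a minimal sketch of a request body. It assumes the adapters were preloaded when the server started and that a request picks one through an `adapter_id` field under `parameters`, as I understand the TGI docs; please check against the linked PR. The adapter name `my-org/my-adapter` and the helper `build_generate_payload` are made up for illustration.

```python
import json


def build_generate_payload(prompt, adapter_id=None, max_new_tokens=64):
    """Build a JSON-serializable body for a TGI-style /generate request.

    Assumption: the server selects a preloaded LoRA adapter through
    parameters["adapter_id"]; leaving it out targets the base model.
    """
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}


if __name__ == "__main__":
    # Hypothetical adapter name; replace with one loaded on your deployment.
    body = build_generate_payload("Hello", adapter_id="my-org/my-adapter")
    print(json.dumps(body, indent=2))
```

The same deployment can then answer requests for different adapters just by changing `adapter_id` per request, without a separate endpoint for each fine-tune.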

rmarrugat commented 3 months ago

Will you update TEI to v1.3.0 as well?

haixiw commented 3 months ago

Please rebase onto the latest main branch.