Closed: philschmid closed this issue 2 months ago
New models: Gemma 2
Multi-LoRA adapters: you can now run multiple LoRAs on the same TGI deployment. https://github.com/huggingface/text-generation-inference/pull/2010
Faster GPTQ inference and Marlin support (up to 2x speedup).
Reworked the entire scheduling logic (better block allocation, enabling further speedups in upcoming releases).
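For reference, a minimal sketch of how multi-LoRA serving is typically used with TGI. The adapter IDs and base model below are hypothetical placeholders, and the exact flag names should be checked against the PR linked above:

```shell
# Launch TGI with several LoRA adapters preloaded.
# The adapter IDs here (predibase/...) are placeholder examples, not required values.
docker run --gpus all -p 8080:80 \
  -e LORA_ADAPTERS=predibase/customer_support,predibase/magicoder \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id mistralai/Mistral-7B-v0.1

# Route a single request to one specific adapter via the adapter_id parameter.
curl http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello", "parameters": {"adapter_id": "predibase/customer_support", "max_new_tokens": 32}}'
```

Requests that omit `adapter_id` are served by the base model, so one deployment can serve both the base model and all loaded adapters.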
Will you update TEI to v1.3.0 as well?
Please rebase on the latest main branch.