-
The model used for ChatQnA supports BFLOAT16 in addition to TGI's default 32-bit float type: https://huggingface.co/Intel/neural-chat-7b-v3-3
With BFLOAT16, TGI memory usage halves from 30GB to 15GB (and also it…
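For reference, a minimal launch sketch of how the dtype could be switched; `--dtype bfloat16` is a standard TGI launcher flag, while the port and volume paths here are placeholder assumptions:
```
# Sketch of a TGI launch with bfloat16; port/volume are placeholders.
model=Intel/neural-chat-7b-v3-3
volume=$PWD/data   # model cache so weights are not re-downloaded

docker run --rm -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.0 \
    --model-id $model \
    --dtype bfloat16
```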
-
Please update the TGI image from 1.4 to 2.0 in all TGI README files.
I faced issues with the Phi-3 model.
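For example, the image reference in the launch commands would change like this (an illustrative sketch, not an exact line from the READMEs):
```
# Old tag referenced in the READMEs:
docker pull ghcr.io/huggingface/text-generation-inference:1.4

# Proposed update:
docker pull ghcr.io/huggingface/text-generation-inference:2.0
```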
-
### System Info
A100
ghcr.io/huggingface/text-generation-inference:2.1.0
### Who can help?
@1049451037
Deploying with TGI, I hit this error:
ValueError: Unsupported model type cogvlm2
The image used is: gh…
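For a concrete reproduction, a launch along these lines triggers the error; the model id is my assumption, since the original image/model line is truncated:
```
# Hypothetical reproduction; model id assumed (original is truncated).
docker run --rm --gpus all -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:2.1.0 \
    --model-id THUDM/cogvlm2-llama3-chat-19B
# -> ValueError: Unsupported model type cogvlm2
```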
-
Langserve is used in:
```
$ git grep -i langserve
comps/llms/summarization/tgi/llm.py:from langserve.serialization import WellKnownLCSerializer
comps/llms/summarization/tgi/requirements.txt:langse…
```
-
### System Info
Text-generation-inference: v2.1.0+
Driver Version: 535.161.08  CUDA Version: 12.2
GPU: DGX with 8xH100 80GB
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [x…
-
When manually launching my fine-tune of Idefics2, Hugging Face TGI says `Unsupported model type idefics2`. How did you get the Idefics2 TGI to run on RunPod?
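For comparison, the base model is expected to load on recent 2.x images with a plain launch like the sketch below; the tag and flags are assumptions, not a confirmed RunPod setup:
```
# Sketch of a plain TGI launch for the base Idefics2 model;
# image tag/flags are assumptions.
docker run --rm --gpus all -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:2.1.0 \
    --model-id HuggingFaceM4/idefics2-8b
```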
-
### Feature request
Apologies if this should be elsewhere, but I'm curious whether you plan on adding support for ONNX models like https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
### M…
-
1. Are there plans for inference support? This is needed if it's to be used by devs in production.
2. Is fine-tuning much faster than LoRA?
- Optimization and the backward pass are MUCH faster, but sure…
-
Related to #258: why are services using the `hostIPC` option [1]?
```
$ git grep hostIPC
ChatQnA/kubernetes/manifests/chaqna-xeon-backend-server.yaml: hostIPC: true
ChatQnA/kubernetes/manifests/e…
```
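For background on what the flag does, the field documentation can be pulled straight from a cluster with standard kubectl (nothing repo-specific):
```
# Print Kubernetes' own documentation for the hostIPC pod field:
# it makes the pod share the host's IPC namespace (default: false),
# which is security-sensitive.
kubectl explain pod.spec.hostIPC
```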
-
Add integration for the [TGI](https://github.com/huggingface/text-generation-inference) LLM provider.
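A sketch of the request such an integration would wrap; `/generate` and its JSON shape are TGI's documented REST API, while the host and port are placeholders:
```
# Minimal TGI generate request; host/port are placeholders.
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 64}}'
```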