-
The model used for ChatQnA supports BFLOAT16, in addition to TGI's default 32-bit float type: https://huggingface.co/Intel/neural-chat-7b-v3-3
With BFLOAT16, TGI memory usage halves from 30GB to 15GB (and also it…
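Switching TGI to BFLOAT16 is a launcher flag; a minimal launch sketch, assuming the standard `--dtype` option of the TGI launcher and the model linked above (port and volume paths are illustrative):

```shell
# Run TGI with BFLOAT16 weights instead of the default float32.
docker run --rm --gpus all -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.0 \
  --model-id Intel/neural-chat-7b-v3-3 \
  --dtype bfloat16
```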
-
Please update the TGI image from 1.4 to 2.0 in all TGI readme files.
I faced issues with the Phi-3 model.
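One way to do the bump across all readmes at once, sketched with `grep` and `sed` (the exact tag strings are assumptions; review each changed file before committing):

```shell
# Find every file still referencing the 1.4 image and rewrite the tag to 2.0.
grep -rl 'text-generation-inference:1.4' . \
  | xargs -r sed -i 's|text-generation-inference:1.4|text-generation-inference:2.0|g'
```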
-
I am trying to run the ChatQnA application.
- I am able to run all the microservices using the docker compose file.
- I am getting this error in the tgi-service. What is the correct way to provide the external IP…
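For reference, the ChatQnA compose files typically take the host address from environment variables rather than `localhost`; a sketch, where the `host_ip` variable name and the 8008 service port are assumptions taken from common OPEA setups (check your compose file):

```shell
# Export the machine's external IP once, then point the endpoint at it.
export host_ip=$(hostname -I | awk '{print $1}')   # or set it manually
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"   # port 8008 is an assumption
docker compose up -d
```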
-
### System Info
Text-generation-inference: v2.1.0+
Driver Version: 535.161.08, CUDA Version: 12.2
GPU: DGX with 8xH100 80GB
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [x…
-
When manually launching my fine-tune of idefics2, Hugging Face TGI says `Unsupported model type idefics2`. How did you get the idefics2 TGI to run on RunPod?
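idefics2 support only landed in recent TGI releases, so an older image will report `Unsupported model type idefics2`. A minimal launch sketch, assuming a 2.0-series image and the base `HuggingFaceM4/idefics2-8b` checkpoint (substitute your fine-tune's model ID):

```shell
# idefics2 needs a TGI image new enough to include the architecture.
docker run --rm --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:2.0.4 \
  --model-id HuggingFaceM4/idefics2-8b
```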
-
### Feature request
Apologies if this belongs elsewhere, but I'm curious whether you plan to add support for ONNX models like https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx
### M…
-
1. Are there plans for inference support? This is needed if it's to be used by devs in production.
2. Is fine-tuning much faster than LoRA?
- Optimization and the backward pass are MUCH faster, but sure…
-
Related to #258, why are the services using the `hostIPC` option [1]:
```
$ git grep hostIPC
ChatQnA/kubernetes/manifests/chaqna-xeon-backend-server.yaml: hostIPC: true
ChatQnA/kubernetes/manifests/e…
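```

For context, `hostIPC: true` shares the host's IPC namespace (including its shared-memory segment) with the pod, which TGI's sharded/NCCL path relies on. A narrower alternative, sketched here as an assumption rather than the project's current manifests, is a memory-backed `emptyDir` mounted at `/dev/shm`:

```yaml
# Instead of hostIPC: true, give the container its own large /dev/shm.
spec:
  containers:
    - name: tgi
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 1Gi
```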
-
Add integration for [TGI](https://github.com/huggingface/text-generation-inference) LLM provider.
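For whoever picks this up, TGI exposes a simple REST interface; a minimal call sketch against a locally running server (the host and port are assumptions):

```shell
# POST a prompt to TGI's /generate endpoint; the response is JSON
# containing a "generated_text" field.
curl -s http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 32}}'
```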
-
Hi,
When I tried the 13B version in TGI, it works fine with bitsandbytes quantization.
While trying AWQ quantization in TGI, it shows the error "Cannot load 'awq' weight, make sure the model is al…
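That error usually means TGI was pointed at full-precision weights: `--quantize awq` does not quantize on the fly, it expects an already-AWQ-quantized checkpoint. A launch sketch, assuming a pre-quantized community model such as `TheBloke/Llama-2-13B-AWQ` (an assumption; substitute the AWQ export of your model):

```shell
# --quantize awq requires the checkpoint to already contain AWQ weight
# tensors (qweight/qzeros/scales), unlike bitsandbytes.
docker run --rm --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:2.0 \
  --model-id TheBloke/Llama-2-13B-AWQ \
  --quantize awq
```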