-
I noticed that the results are not reproducible. I am using Llama with BootstrapFewShot, and every time I compile the same program, I get totally different results (not even close).
I noticed in the …
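For context, a minimal sketch of the kind of compile run being described, assuming an Ollama-served Llama and with the sampling temperature pinned to 0 to reduce (though not necessarily eliminate) run-to-run variance; the module, metric, and training examples here are hypothetical:
```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Hypothetical LM setup: an Ollama-served Llama with temperature 0 so that
# sampling is as deterministic as the backend allows.
lm = dspy.OllamaLocal(model="llama2", temperature=0.0)
dspy.settings.configure(lm=lm)

class QA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

def exact_match(example, pred, trace=None):
    # Hypothetical metric: compare normalized answers.
    return example.answer.strip().lower() == pred.answer.strip().lower()

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

teleprompter = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=2)
# Compiling the same program twice over the same trainset is the step that,
# per the report above, produces very different results each time.
compiled_qa = teleprompter.compile(QA(), trainset=trainset)
```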
-
I plan to implement function calling with vision models such as LLaVA and Nous-Hermes-2-Vision-Alpha, based on the image input, but it seems that the current implementation in the example folder only sup…
-
Hello, and thank you for open-sourcing the CodeShell model. I ran into some problems when trying to run CodeShell-7B-Chat-int4 with TGI, and I would be very grateful if you could help resolve them!
```
docker run --gpus 'all' --shm-size 20g -p 9090:80 -v /root/codeshell/model:/data --env LOG_LEVEL="info,text_g…
```
-
I want to deploy a few open source models with the chat UI. I started a simple model with:
```
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid…
```
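Once that container is up, a quick smoke test of the endpoint the chat UI will point at can help separate deployment problems from UI configuration problems; a sketch assuming the container was published on localhost:8080 (adjust to however you mapped the port):
```python
from huggingface_hub import InferenceClient

# Assumes the TGI container above was started with something like `-p 8080:80`,
# so the server is reachable at http://127.0.0.1:8080.
client = InferenceClient("http://127.0.0.1:8080")

# Plain text-generation call against TGI; if this returns text, the same URL
# can be wired into the chat UI's model endpoint configuration.
print(client.text_generation("What is Deep Learning?", max_new_tokens=50))
```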
-
Since HF TGI's [PR](https://github.com/huggingface/text-generation-inference/pull/617) was merged, it should be possible to integrate TGI endpoints into the APIs supported by lm-evaluation-harness.
Any pl…
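For reference, the surface such an integration has to wrap is TGI's plain HTTP generation API; a minimal sketch of calling it directly for a generation-style task (the endpoint URL and parameters are placeholders):
```python
import requests

# Assumed local TGI endpoint; point this at your own deployment.
TGI_URL = "http://127.0.0.1:8080/generate"

def tgi_generate(prompt: str, max_new_tokens: int = 64) -> str:
    # TGI's /generate route takes an "inputs" string plus a "parameters" dict
    # and responds with {"generated_text": "..."}.
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }
    resp = requests.post(TGI_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["generated_text"]

print(tgi_generate("The capital of France is"))
```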
-
Greetings, @cipher982!
Currently we are working on the OpenVINO inference framework, and such benchmarks are critical for understanding the gaps and differences between our framework and Transformers/TGI …
-
Hello-
I've been looking into hosting an LLM on AWS infrastructure. I am mainly looking to host Flan T5 XXL. My question is below:
Inquiry: what is the recommended container for hosting Flan T5 X…
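One commonly documented route is SageMaker with the Hugging Face LLM (TGI-based) container; a sketch under those assumptions, with the role, container version, GPU count, and instance type as placeholders to be sized for your account:
```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

# Resolve the Hugging Face LLM container image for your region.
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.9.3")  # example version

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "google/flan-t5-xxl",  # model pulled at container startup
        "SM_NUM_GPUS": "4",                   # shard across the instance's GPUs
        "MAX_INPUT_LENGTH": "1024",
        "MAX_TOTAL_TOKENS": "2048",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # example size; pick what fits your budget
)

print(predictor.predict({"inputs": "Translate to German: Hello, how are you?"}))
```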
-
I brought up ChatQnA UI with all the containers.
### Issue 1. Huggingface download update
The Huggingface TGI container was downloading the model; it took quite a long time, around ~12 min, for Intel/Neural cha…
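One way to take that ~12 min download out of container startup is to pre-fetch the weights into the Hugging Face cache on the host and mount that cache as the TGI container's /data volume; a sketch (the repo id here is only illustrative, since the actual model name is cut off above):
```python
from huggingface_hub import snapshot_download

# Pre-download the model once on the host into a dedicated cache directory.
# Mount this directory to /data in the TGI container so the launcher finds the
# weights in its cache and skips the long download on every start.
snapshot_download(
    repo_id="Intel/neural-chat-7b-v3-1",  # illustrative id; use the model from your setup
    cache_dir="/opt/hf-cache",
)
```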
-
Users may need to set particular TGI(S) parameters when using the Caikit+TGIS runtime on KServe. An example is the model timeout parameter, which may need to be tweaked based on the model size.
…
-
### Feature request
With more FP8-capable instances becoming available on all major platforms, it would be nice if TGI could take advantage of this and start adding FP8-specific features, e.g. `FP8 E…
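For readers, the FP8 variants generally meant by that abbreviation are E4M3 and E5M2; a small sketch of the dynamic-range arithmetic behind them (this is just the standard FP8 format math, not TGI code):
```python
def fp8_max_finite(exp_bits: int, man_bits: int, bias: int, reserves_inf: bool) -> float:
    # Largest finite value: the top usable exponent combined with the largest
    # mantissa that is not reserved for NaN/inf in that format.
    max_exp_field = (2 ** exp_bits - 1) - (1 if reserves_inf else 0)
    # E4M3 reserves only the all-ones mantissa at the top exponent for NaN,
    # so its largest mantissa there is one step below all-ones.
    top_mantissa = (2 ** man_bits - 1 - (0 if reserves_inf else 1)) / 2 ** man_bits
    return 2.0 ** (max_exp_field - bias) * (1 + top_mantissa)

# E4M3: 4 exponent / 3 mantissa bits, bias 7, no infinities -> max 448.0
print("E4M3 max:", fp8_max_finite(4, 3, bias=7, reserves_inf=False))
# E5M2: 5 exponent / 2 mantissa bits, bias 15, IEEE-style inf/NaN -> max 57344.0
print("E5M2 max:", fp8_max_finite(5, 2, bias=15, reserves_inf=True))
```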