-
I plan to implement function calling with vision models such as LLaVA and Nous-Hermes-2-Vision-Alpha based on the image, but it seems that the current implementation in the example folder only sup…
-
Based on watsonx requirements, we should expose at least these metrics:
- # of inference requests over a defined time period
- Avg. response time over a defined time period
- # of successf…
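A minimal sketch of how the first metrics could be computed from request logs; the `InferenceRecord` shape and `window_metrics` helper are hypothetical illustrations, not part of watsonx:

```python
from dataclasses import dataclass

@dataclass
class InferenceRecord:
    # Hypothetical log record: request start time (s), latency (s), success flag.
    timestamp: float
    latency: float
    ok: bool

def window_metrics(records, start, end):
    """Count requests, average response time, and successes within [start, end)."""
    in_window = [r for r in records if start <= r.timestamp < end]
    count = len(in_window)
    avg_latency = sum(r.latency for r in in_window) / count if count else 0.0
    successes = sum(1 for r in in_window if r.ok)
    return {"requests": count, "avg_response_time": avg_latency, "successful": successes}

# Example: three requests, two of which fall inside the 60 s window.
records = [
    InferenceRecord(10.0, 0.5, True),
    InferenceRecord(20.0, 1.5, True),
    InferenceRecord(95.0, 0.8, False),
]
print(window_metrics(records, 0, 60))
# → {'requests': 2, 'avg_response_time': 1.0, 'successful': 2}
```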
-
I have some suggestions for frameworks for self-hosted serving of LLMs and related models.
# Embeddings from OpenAI CLIP
Jina
https://github.com/jina-ai/clip-as-service (Apache)
# Text embeddings:
My o…
-
### Feature request
The Transformers library supports the no_repeat_ngram_size parameter for generation. https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/text_generation#transformers.…
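For reference, the constraint behind `no_repeat_ngram_size` can be sketched in plain Python: at each decoding step, ban any token that would complete an n-gram already present in the generated sequence. This is a simplified illustration of the idea, not the Transformers implementation itself:

```python
def banned_tokens(generated, n):
    """Tokens that would complete an n-gram already present in `generated`.

    Simplified sketch of the no_repeat_ngram_size constraint: if the last
    n-1 tokens already appeared earlier, the tokens that followed each earlier
    occurrence are banned for the next step.
    """
    if n <= 0 or len(generated) < n - 1:
        return set()
    prefix = tuple(generated[len(generated) - (n - 1):])  # last n-1 tokens
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned

# With n=2, after "A B A" the bigram (A, B) already occurred, so "B" is banned next.
print(banned_tokens(["A", "B", "A"], 2))  # → {'B'}
```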
-
### Feature request
Support recent, larger embedding models with 7B or more parameters (about 20x larger than BERT-large).
### Motivation
Embedding models have become much larger than before in the pas…
ai-jz updated
4 months ago
-
I brought up the ChatQnA UI with all the containers.
### Issue 1. Huggingface download update
The Hugging Face TGI container was downloading the model; it took a long time, around ~12 min, for Intel/Neural cha…
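One common mitigation, documented in the TGI README, is to download the weights once into a local directory and mount it into the container, so restarts and redeploys reuse the cache instead of re-downloading. A sketch with a placeholder model id (substitute the deployment's actual model):

```shell
# Placeholder model id; replace with the model this deployment uses.
model=your-org/your-model
volume=$PWD/data   # weights persist here across container restarts

docker run --shm-size 1g -p 8080:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id $model
```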
-
Hi guys, love your project!
I was wondering if you could add support for Mistral via:
- [TGI](https://github.com/huggingface/text-generation-inference)
- [vLLM](https://github.com/vllm-project/vllm)…
-
Repo used for testing
https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/xeon
OPEA Project Errors
2. Embedding Microservice
curl : Internal Server Error
At line:1 char:1…
-
Background:
When TGI is adapted to lightllm and a model is loaded across multiple GPUs, one process is spawned per GPU, and each process loads the entire model into memory.
When the model files are large, e.g. 65B+ models, loading on 8 GPUs would require 8*130 GB of memory, which is clearly unreasonable and leads to OOM.
Proposed solution:
lightllm could provide a load_from_weight_dict(weight_dict) interface. The TGI layer would pass in a weight dict…
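A sketch of what the proposed flow around the (hypothetical, not yet existing) `load_from_weight_dict(weight_dict)` interface could look like: the caller loads the checkpoint once, slices out each rank's shard, and hands each worker only the tensors it needs, so no single process holds the full model. The `shard_weight_dict` helper below is an illustration using plain lists in place of tensors:

```python
def shard_weight_dict(weight_dict, rank, world_size):
    """Slice each weight along its first axis for one tensor-parallel rank.

    Hypothetical helper for the proposed load_from_weight_dict(weight_dict)
    interface: load the checkpoint once, then pass each worker only its shard
    instead of having every process load the entire model.
    """
    shard = {}
    for name, w in weight_dict.items():
        rows = len(w)
        per_rank = rows // world_size  # assume rows divide evenly for this sketch
        shard[name] = w[rank * per_rank:(rank + 1) * per_rank]
    return shard

# Toy checkpoint: one 4-row "weight" split across 2 ranks.
ckpt = {"layer0.weight": [[1, 2], [3, 4], [5, 6], [7, 8]]}
print(shard_weight_dict(ckpt, rank=0, world_size=2))  # → {'layer0.weight': [[1, 2], [3, 4]]}
print(shard_weight_dict(ckpt, rank=1, world_size=2))  # → {'layer0.weight': [[5, 6], [7, 8]]}
```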
-
Request
The ask is to introduce an OpenAI text generation API compatibility layer (chat completion endpoint) to kserve/TGIS.
Why
Having an OpenAI API compatibility layer will allow more open sourc…
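To illustrate the shape of such a layer, here is a minimal translation from an OpenAI-style chat-completions request body to a plain text-generation call and back into the OpenAI response schema. The chat template and `generate_fn` hook are placeholders, not the kserve/TGIS API:

```python
import time
import uuid

def chat_completion(body, generate_fn):
    """Translate an OpenAI-style /v1/chat/completions body to a text-generation
    backend and wrap the output in the OpenAI chat.completion response schema.

    `generate_fn(prompt, max_new_tokens)` is a placeholder for the backend's
    generate call; the role-prefixed template is a simplified stand-in.
    """
    prompt = "".join(f"{m['role']}: {m['content']}\n" for m in body["messages"])
    prompt += "assistant:"
    text = generate_fn(prompt, max_new_tokens=body.get("max_tokens", 256))
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", "unknown"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }

# Toy backend that ignores the prompt; shows the request/response shapes.
resp = chat_completion(
    {"model": "demo", "messages": [{"role": "user", "content": "hi"}]},
    generate_fn=lambda prompt, max_new_tokens: "hello!",
)
print(resp["choices"][0]["message"]["content"])  # → hello!
```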