-
https://github.com/efeslab/Nanoflow
-
Hi there,
I am wondering what hardware Ray uses for serving in this llmperf leaderboard. Is it CPU or GPU? If it is GPU, what is the model?
Thanks,
Fizzbb
-
### System Info
Hi,
I generated a TensorRT-LLM engine for a Llama-based model and see that its performance is much worse than vLLM's (a rough vLLM baseline sketch follows the steps below). I did the following:
- compile model with TensorRT-LLM c…
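For comparison, here is a minimal sketch of the kind of vLLM offline baseline such numbers are usually measured against; the model name, prompts, and batch size are placeholders, not taken from the report above.

```python
# Hypothetical vLLM throughput baseline; model and prompts are placeholders.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # assumed model; swap in yours
params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["Summarize the history of GPUs."] * 32  # small synthetic batch

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report tokens/second.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
```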
-
HuggingFace provides a standard TGI Docker container for serving LLM requests.
It would be useful to take advantage of HuggingFace's TGI features for generation.
* GitHub: [Large Language Model Te…
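As an illustration, a minimal sketch of querying such a TGI container from Python, assuming a server is already running locally; the port, prompt, and generation parameters here are assumptions.

```python
# Sketch: call TGI's /generate endpoint on an assumed local server.
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # assumed host/port
    json={
        "inputs": "What is Deep Learning?",    # placeholder prompt
        "parameters": {"max_new_tokens": 64},  # placeholder settings
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```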
-
Is there a data table for the benchmark?
-
Verifying the output of the glm4-9b-chat model as follows, the serving side reports an error:
curl --request POST \
--url http://127.0.0.1:8000/v1/chat/completions \
--header 'content-type: application/json' \
--data '{
"model": "glm-4-9…
-
Oobabooga/text-generation-webui - OpenAI API through plugin (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html)
ExLlamaV2 - OpenAI API (TabbyAPI)
GPT4ALL - OpenAI API
Llama.cpp -…
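Because every backend in this list speaks the OpenAI API, a single client can target any of them by swapping `base_url`; the ports below are illustrative defaults, not confirmed for each project.

```python
# Sketch: one OpenAI-compatible client, many backends (assumed ports).
from openai import OpenAI

BACKENDS = {
    "vllm": "http://localhost:8000/v1",       # vLLM OpenAI server default
    "tabbyapi": "http://localhost:5000/v1",   # assumed TabbyAPI port
    "llama.cpp": "http://localhost:8080/v1",  # assumed llama-server port
}

client = OpenAI(base_url=BACKENDS["vllm"], api_key="EMPTY")
resp = client.chat.completions.create(
    model="any-served-model",  # placeholder; use the backend's model id
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```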
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Where…
-
We are trying to launch codegeex4-all-9b using vLLM, following the CodeGeeX4 GitHub:
https://github.com/THUDM/CodeGeeX4?tab=readme-ov-file#vllm
The scripts are as follows:
codegeex_offline_examp…
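The referenced script is truncated above; as a stand-in, this is a minimal vLLM offline-inference sketch of the shape the CodeGeeX4 README points at. The prompt and sampling values are placeholders, not copied from the original script.

```python
# Sketch: vLLM offline inference for codegeex4-all-9b (placeholder settings).
from vllm import LLM, SamplingParams

llm = LLM(model="THUDM/codegeex4-all-9b", trust_remote_code=True)
params = SamplingParams(temperature=0.2, max_tokens=256)  # placeholder values
outputs = llm.generate(["# write a quicksort in python\n"], params)
print(outputs[0].outputs[0].text)
```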
-
I get a “422 Unprocessable Entity” when calling a local LLM service and I don't know what's causing it.
![image](https://github.com/user-attachments/assets/6e010870-772f-4f18-aeb6-861c554e8091)
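A 422 from a FastAPI-style server (vLLM's OpenAI server, TGI, etc.) usually carries a JSON `detail` field naming the invalid or missing request field; this sketch surfaces it. The URL and payload are placeholders, since the original request is not shown.

```python
# Sketch: print the validation detail behind a 422 response.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",  # placeholder endpoint
    json={"model": "my-model",                    # placeholder payload
          "messages": [{"role": "user", "content": "hi"}]},
    timeout=30,
)
if resp.status_code == 422:
    print(resp.json()["detail"])  # pydantic errors: loc / msg / type
else:
    print(resp.status_code, resp.text)
```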