-
### System Info
- `transformers` version: 4.44.0
- Platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.24.5
- Safetensors version: 0.4.…
-
I was benchmarking chatglm-6b with vllm 0.5.0. When first running vLLM I hit "AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'". After replacing tokenization_chatglm.py in chatglm-6b, running vLLM's benchmark_throughput.py raised the following error instead; how…
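For context, an error of this shape usually means the model's remote tokenizer code reads an attribute that the installed transformers version no longer initializes before use. A minimal, self-contained illustration of that failure mode (all names are hypothetical stand-ins, not the actual ChatGLM code):

```python
class ChatGLMTokenizerLike:
    """Toy stand-in for a tokenizer whose methods assume an inner
    backend was assigned in __init__ (hypothetical names)."""

    def __init__(self, init_inner: bool):
        if init_inner:
            # the real class builds an inner SentencePiece-style backend here
            self.tokenizer = object()

    def inner(self):
        # raises AttributeError when __init__ skipped the assignment,
        # mirroring "'ChatGLMTokenizer' object has no attribute 'tokenizer'"
        return self.tokenizer

broken = ChatGLMTokenizerLike(init_inner=False)
try:
    broken.inner()
except AttributeError as exc:
    print(exc)
```

Replacing tokenization_chatglm.py with a version that assigns the attribute (as the reporter did) removes this particular error, which is why the failure then moves to a later stage.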
-
### System Info
xinference v0.13.2
The bundled vLLM does not support batched inference; sending OpenAI-style batched prompts returns a 500 error.
Why not follow the approach in
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_…
bstr9 updated 1 month ago
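For reference, "batching prompts" here means the legacy OpenAI `/v1/completions` endpoint, whose `prompt` field may be a list of strings (one completion per string). A minimal sketch of such a request body, which a server lacking batch support would reject with an error like the 500 above (the model name is a placeholder):

```python
import json

# Request body for an OpenAI-compatible /v1/completions call with a
# batched prompt: the legacy endpoint accepts a list of strings.
payload = {
    "model": "my-model",            # placeholder model name
    "prompt": ["What is vLLM?",     # batched prompts: one completion
               "What is xinference?"],  # is returned per list entry
    "max_tokens": 32,
}
body = json.dumps(payload)
print(body)
```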
-
Hi, when I follow the default steps to set up the environment:
pip install vllm
it automatically installs vllm 0.5.0.post1, which requires transformers>=4.40.0.
When installing SPPO ( transformer…
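To see whether two requirements can coexist, it helps to compare the pinned version against the floor directly. A small stdlib-only sketch: the `4.40.0` floor comes from the text above, while the pinned version is hypothetical, since the actual SPPO pin is truncated in the snippet:

```python
def version_tuple(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

VLLM_FLOOR = "4.40.0"  # vllm 0.5.0.post1 requires transformers>=4.40.0
pinned = "4.36.2"      # hypothetical older pin another package might carry

# A pin below the floor means pip cannot satisfy both requirements at once.
conflict = version_tuple(pinned) < version_tuple(VLLM_FLOOR)
print(conflict)  # True: the hypothetical pin conflicts with vllm's floor
```

(This naive parser ignores suffixes like `.post1`; for real resolution, pip compares full PEP 440 versions.)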
-
### Motivation.
Nowadays, many new applications including multi-turn conversations, multi-modality and multi-agent, require a significant amount of KV cache. Such applications generally have a shared…
-
I have fine-tuned the model and am now trying to run inference on it with vLLM, but the results are very bad. Any idea why that is?
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
Solid idea and an ingenious code implementation; great work!
Have you considered implementing KV compression operations on the KV cache in the vLLM framework?
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
Linux
### P…
-
### Your current environment
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (U…