-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the [LangGraph](https://langchain-ai.github.io/langgraph/)/LangChain documentation with the integrat…
-
### System Info
- nvidia: 535.129.03
- cuda_version: 12.4
- GPU: L40S
- OS: Ubuntu 22.04.4 LTS (docker)
- tensorrt-llm: 0.11.0.dev2024060400
### Who can help?
_No response_
### Information
…
-
I have run into an error with the vLLM framework when trying to run inference on an Unsloth fine-tuned Llama3-8B model...
### Error:
(venv) ubuntu@ip-192-168-68-10:~/ans/vllm-server$ python -O -u -m vl…
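For reference, one workaround that has helped in similar reports is to merge the LoRA adapter into the base weights before serving. The sketch below is only an assumption about the setup: it presumes the fine-tune was saved as a LoRA adapter, uses placeholder paths, and goes through PEFT's `merge_and_unload()` rather than Unsloth's own export helpers.

```python
# A minimal sketch, assuming the fine-tune was saved as a LoRA adapter.
# BASE / ADAPTER / MERGED are placeholder paths, not from the report above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from vllm import LLM, SamplingParams

BASE = "meta-llama/Meta-Llama-3-8B-Instruct"   # base model the adapter was trained on
ADAPTER = "./llama3-8b-unsloth-lora"           # directory containing the LoRA adapter
MERGED = "./llama3-8b-merged"                  # output directory for the merged weights

# Merge the adapter into the base weights so vLLM can load a plain checkpoint.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()
merged.save_pretrained(MERGED)
AutoTokenizer.from_pretrained(BASE).save_pretrained(MERGED)

# Serve the merged checkpoint with vLLM.
llm = LLM(model=MERGED)
print(llm.generate(["Hello"], SamplingParams(max_tokens=32))[0].outputs[0].text)
```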
-
I'm getting the following import error:
```
sgl ➜ export CUDA_VISIBLE_DEVICES=4; python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
Traceback (most recent call…
-
### Description
Make the vLLM example work with the latest vLLM version (v0.4.3),
following the current example from https://docs.ray.io/en/master/serve/tutorials/vllm-example.html.
I got this exception:
``…
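In case it helps others hitting the same thing, here is a minimal sketch (not the official tutorial code) of a Ray Serve deployment wrapping vLLM's `AsyncLLMEngine`; the model name and request payload are placeholders, and the exact constructor arguments may differ across vLLM releases.

```python
# A minimal sketch (not the official tutorial): a Ray Serve deployment that
# wraps vLLM's AsyncLLMEngine. Model name and request payload are placeholders.
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid


@serve.deployment
class VLLMDeployment:
    def __init__(self, model: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
        self.engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model=model))

    async def __call__(self, request):
        payload = await request.json()
        params = SamplingParams(max_tokens=payload.get("max_tokens", 64))
        # generate() yields an async stream of RequestOutput; keep the last one.
        final = None
        async for output in self.engine.generate(payload["prompt"], params, random_uuid()):
            final = output
        return {"text": final.outputs[0].text}


app = VLLMDeployment.bind()
# serve.run(app)  # then POST {"prompt": "..."} to http://127.0.0.1:8000/
```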
-
I have seen that AutoFP8-quantized models from Hugging Face, especially Mixtral-8x7B-FP8, are supported by vLLM. I am wondering if models with both the kv_cache and the weights quantized by AutoFP8 are …
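For what it's worth, the way I would expect this to be wired up (a hedged sketch, not a confirmed configuration) is to load the FP8 checkpoint through the FP8 weight-quantization path and also request an FP8 KV cache; the checkpoint name below is illustrative, and whether AutoFP8's kv_cache scales are actually picked up depends on the vLLM version.

```python
# A hedged sketch: loading an FP8 checkpoint with an FP8 KV cache in vLLM.
# The checkpoint name is illustrative; support for AutoFP8 kv_cache scales
# depends on the installed vLLM version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8",  # assumed AutoFP8 checkpoint
    quantization="fp8",       # FP8 weight quantization path
    kv_cache_dtype="fp8",     # store the KV cache in FP8 as well
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```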
-
Using vLLM to run inference on the DeepSeek model, I encountered an error
```
[rank0]: self.mlp = DeepseekV2MoE(config=config, quant_config=quant_config)
[rank0]: File "/home/root/.local/lib/python3.10/s…
-
ModuleNotFoundError: No module named 'vllm.engine.ray_utils'
Please tell me which vLLM version is required, thanks.
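Since the module layout has changed between vLLM releases, the quickest way to narrow this down is to check which version is actually installed, e.g.:

```python
# Print the installed vLLM version; a missing vllm.engine.ray_utils usually
# means the calling code targets a different release than the one installed.
import vllm
print(vllm.__version__)
```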
-
### Describe the bug
python run_vllm.py
2024-07-05 15:25:04,647 WARNING utils.py:580 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Plea…
-
## Description
I would like to inquire if there are any plans to support more configuration settings for vLLM, specifically related to RoPE scaling and theta adjustments.
## Background
vLLM curre…
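As a point of reference, the sketch below shows how these overrides could be passed straight to the engine, assuming the `rope_scaling` and `rope_theta` arguments exposed by recent vLLM releases; all values are illustrative only, and older versions may not accept them.

```python
# A hedged sketch, assuming the rope_scaling / rope_theta engine arguments
# exposed by recent vLLM releases; all values here are illustrative only.
from vllm import LLM

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    rope_scaling={"type": "linear", "factor": 2.0},  # illustrative scaling config
    rope_theta=1000000.0,                            # illustrative base-frequency override
    max_model_len=16384,
)
```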