-
### 📚 The doc issue
I want to know how to run vLLM with a remote Ray cluster.
My code is:
```python
from llama_index.llms.vllm import Vllm
import ray

ray.init(address="ray://10.0.233.89:10001")
llm = Vll…
```
-
Loki reports a "no space left on device" error, but my `/data` disk still shows plenty of available space.
```
4|loki | level=error ts=2023-12-20T18:03:23.174574581Z caller=flush.go:221 org_id=fake msg="failed to flush u…
```
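"No space left on device" with free disk space often means the volume has run out of inodes rather than blocks: the kernel returns the same `ENOSPC` error for both. A minimal sketch for checking both kinds of exhaustion with the standard library (the path below is illustrative; point it at Loki's actual data directory):

```python
import os

def space_report(path: str) -> dict:
    """Report block and inode availability for the filesystem holding `path`.

    ENOSPC is raised when either data blocks or inodes are exhausted,
    so both should be checked before concluding the disk "has space".
    """
    st = os.statvfs(path)
    return {
        "blocks_free_pct": 100 * st.f_bavail / st.f_blocks,
        # Some filesystems (e.g. btrfs) report 0 total inodes; treat that as "not limited".
        "inodes_free_pct": 100 * st.f_favail / st.f_files if st.f_files else 100.0,
    }

# Illustrative path; replace with the Loki chunks/WAL directory, e.g. /data/loki.
print(space_report("/"))
```

If `inodes_free_pct` is near zero while `blocks_free_pct` is high, the flush failures are explained even though `df -h` looks healthy.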
-
### What happened + What you expected to happen
Currently Ray [logs](https://github.com/ray-project/ray/blob/c83c64c1434133cee48c3f85ad3aa12b5e62b0c3/src/ray/common/file_system_monitor.cc#L105) an …
-
### Reproduction steps
![bug2_](https://user-images.githubusercontent.com/60655830/156793098-711125d5-739e-40c3-bcc4-a29e6f3a8bb2.jpg)
The UX of this section can be improved.
Space utilization can be…
-
We use Postgres's insight into disk utilization in deciding when to prune beyond the target retention and potentially all the way up to the minimum retention. However the disk utilization info might b…
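The decision described above can be sketched as a simple interpolation: keep the target retention while disk utilization is low, and shrink toward the minimum retention as the disk fills. All names and thresholds below are hypothetical illustrations, not the actual implementation:

```python
def effective_retention_days(
    disk_used_fraction: float,
    target_days: int = 30,    # hypothetical target retention
    minimum_days: int = 7,    # hypothetical minimum retention
    low_water: float = 0.70,  # below this utilization, keep full target retention
    high_water: float = 0.95, # at or above this, fall back to minimum retention
) -> int:
    """Interpolate retention between target and minimum based on disk usage."""
    if disk_used_fraction <= low_water:
        return target_days
    if disk_used_fraction >= high_water:
        return minimum_days
    # Linear interpolation between the two watermarks.
    span = (disk_used_fraction - low_water) / (high_water - low_water)
    return round(target_days - span * (target_days - minimum_days))

print(effective_retention_days(0.50))  # utilization low: full target retention
print(effective_retention_days(0.99))  # utilization critical: minimum retention
```

Note that the whole scheme hinges on the utilization reading being trustworthy, which is exactly the caveat raised above.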
-
### Description
Update the mobile view to use a two-column layout for better space utilization in the footer section.
### Expected Behavior
### Screenshots
![Screenshot_2024-07-18-17-57-20-728_c…
-
### Proposal to improve performance
Currently, vLLM allocates all available GPU memory after loading model weights, regardless of the max_model_len setting. This can lead to inefficient memory usage,…
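The gap between allocated memory and what `max_model_len` actually requires can be seen with a back-of-the-envelope KV-cache estimate: the cache scales linearly with sequence length, so a shorter `max_model_len` needs proportionally less memory. The model dimensions below are illustrative (roughly 7B-class), not read from any real config:

```python
def kv_cache_bytes(
    max_model_len: int,
    max_num_seqs: int,
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int = 2,  # fp16/bf16
) -> int:
    """Upper bound on KV-cache size: 2 tensors (K and V) per layer per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return max_model_len * max_num_seqs * per_token

# Illustrative dimensions: 32 layers, 32 KV heads, head_dim 128, 16 concurrent sequences.
full = kv_cache_bytes(32768, 16, 32, 32, 128)
short = kv_cache_bytes(4096, 16, 32, 32, 128)
print(f"max_model_len=32768: {full / 2**30:.1f} GiB")
print(f"max_model_len=4096:  {short / 2**30:.1f} GiB")
```

Under these assumptions the worst-case KV cache shrinks 8x when `max_model_len` drops from 32768 to 4096, which is the headroom the proposal wants to reclaim instead of grabbing all free GPU memory unconditionally.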
-
### Search before asking
- [X] I have searched the Ultralytics [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar feature requests.
### Description
Absolutely …
-
I don't have 4x80G GPUs. On a 4x40G setup, Qwen2-VL-72B-Instruct runs out of GPU memory, so I want to deploy it across multiple nodes with model pipeline parallelism, but vLLM doesn't support this. Is there a chance it will be supported later?
```bash
python3 -m vllm.entrypoints.openai.api_server --port 8000 --model /llm_weights/Qwen2-VL-72B-Ins…
```
-
I feel like we were graphing this per device in the past, but I can't find it being collected anywhere.