-
### ⚠️ Search for existing issues first ⚠️
- [X] I have searched the existing issues, and there is no existing issue for my problem
### Which Operating System are you using?
Linux
### Which versio…
-
Is there a recommended way to run data parallel inference (i.e. a copy of the model on each GPU)? It's possible by hacking CUDA_VISIBLE_DEVICES, but I was wondering if there's a cleaner method.
```py…
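One common pattern (a sketch, not an official vLLM API — the `worker` body below stands in for real engine construction, which is shown only as a hypothetical comment) is to shard the prompts and launch one process per GPU, setting `CUDA_VISIBLE_DEVICES` in each child before any CUDA library gets imported:

```python
import multiprocessing as mp
import os

def worker(gpu_id, prompts, queue):
    # Pin this process to a single GPU. This must happen before any
    # CUDA-using library is imported in the child.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Hypothetical real engine call (not executed in this sketch):
    #   from vllm import LLM
    #   llm = LLM(model="...")
    #   outputs = [o.outputs[0].text for o in llm.generate(prompts)]
    # Stand-in so the sketch runs without GPUs:
    outputs = [f"gpu{gpu_id}: {p}" for p in prompts]
    queue.put((gpu_id, outputs))

def data_parallel_generate(prompts, num_gpus):
    # Round-robin shard: shard i gets prompts i, i+num_gpus, i+2*num_gpus, ...
    shards = [prompts[i::num_gpus] for i in range(num_gpus)]
    # "fork" keeps the sketch self-contained on Linux; with a real CUDA
    # stack prefer mp.get_context("spawn") so children start with a clean
    # CUDA state.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, shards[i], queue))
             for i in range(num_gpus)]
    for p in procs:
        p.start()
    # Drain the queue before joining to avoid blocking on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results
```

Each child only ever sees its own device as GPU 0, so no per-engine device arguments are needed; the trade-off is that results must be gathered and re-merged by shard index in the parent.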
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
-
### 📚 The doc issue
![image](https://github.com/InternLM/lmdeploy/assets/62475359/749182ac-fb3f-43d0-bbae-09c219ac0c40)
As the title says: using the llava-1.5-13b model, running the documentation's [api_server performance test] directly raises an error.
![image](https://github.com/I…
-
### System Info
```text
Name           Version   Build    Channel
langchain      0.0.350   pypi_0   pypi
langchain-cli  0.0.19    …
```
-
I have encountered an issue when attempting to run the `vllm_inference.py` script from the Modal Examples repository. Below are the steps I followed and the error I encountered:
### Steps to Reprod…
-
First of all, thanks for this amazing package!
**Context:**
We're experimenting with running some rather unruly LLMs (i.e. they love repeating themselves in some cases). Due to the nature of our t…
-
https://github.com/denoland/deno_core/issues/898
/bounty 200
definition of done:
- does not crash anymore
-
I've run some experiments with vLLM and read through the docs, but have not been able to achieve higher performance.
I have a couple of questions.
1) Will using vllm on linux with a 4090 get fas…
jtoy updated 2 months ago
-
**What would you like to be added**:
Support setting different commands/liveness probes for the leader pod and the other pods within a group.
**Why is this needed**:
There are multiple LLM frameworks that s…