-
Can instruction and response generation use multiple GPUs at the same time? For example, when using the llama3.1-8b-instruct model, setting device to "0,1,2,3,4,5,6,7" still seems to deploy the vLLM service for corpus generation on only one card. Does the code support deploying multiple services across multiple GPUs to generate data in parallel? Thanks!
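(For context, a minimal sketch of one way to get true multi-card generation: run one single-GPU vLLM engine per process instead of a single tensor-parallel engine. The model id, prompt sharding, and sampling settings below are illustrative assumptions, not the project's actual launch code.)
```
# Hedged sketch: one vLLM engine per GPU, each worker pinned to its own card.
import multiprocessing as mp
import os

def generate_on_gpu(gpu_id, prompts):
    # Pin this worker to a single GPU before vLLM initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")  # placeholder model id
    outputs = llm.generate(prompts, SamplingParams(max_tokens=256))
    for out in outputs:
        print(gpu_id, out.outputs[0].text[:80])

if __name__ == "__main__":
    mp.set_start_method("spawn")  # give each worker a clean CUDA context
    all_prompts = ["Write an instruction about topic %d" % i for i in range(64)]
    procs = [
        mp.Process(target=generate_on_gpu, args=(gpu, all_prompts[gpu::8]))
        for gpu in range(8)  # one round-robin shard per card 0-7
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```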
-
### Is your feature request related to a problem? Please describe.
First, congratulations on this app, it's terrific! :)
My problem: my OpenAI credits have run out, and I need to use my credits in …
-
If I set `prompt_logprobs`, I get `AssertionError: tensor model parallel group is already initialized`.
```
import time
from vllm import LLM, SamplingParams
prompts = [
    "write a 10000 wor…
```
-
### Description:
In the Haystack 2.0 framework, components currently require the `component.output_types` decorator to specify the output types. This proposal aims to enhance the framework's usabil…
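For reference, a minimal sketch of the decorator requirement being discussed, following Haystack 2.0's documented component pattern (the component itself is a made-up example):
```
from typing import List

from haystack import component

@component
class Capitalizer:
    # This explicit declaration is what the proposal would make optional,
    # e.g. by inferring output types from the return annotation.
    @component.output_types(capitalized=List[str])
    def run(self, texts: List[str]):
        return {"capitalized": [t.capitalize() for t in texts]}
```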
-
With tensor_parallel_size=1 or tensor_parallel_size=2, the response is OK.
My env info:
vllm==0.2.2
ray==2.8.0
transformers==4.34.0
torch==2.1.0
-
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.68 GiB (GPU 0; 79.35 GiB total capacity; 47.98 GiB already allocated; 13.28 GiB free; 64.89 GiB reserved in total by PyTorch) If r…
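(The gap between 64.89 GiB reserved and 47.98 GiB allocated suggests allocator fragmentation; a hedged sketch of the standard PyTorch mitigation, which must be set before the first CUDA allocation in the process:)
```
# Illustrative mitigation only: cap the caching allocator's split size to
# reduce fragmentation. Set this before torch touches the GPU.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
x = torch.empty(1024, 1024, device="cuda")  # allocations now use the tuned allocator
```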
-
How can I run it offline?
npm install
npm run dev
After this,
Local: http://localhost:3000 is ready,
but nothing is visible when I open this link. Could you please tell me if I missed som…
-
![image](https://github.com/Mintplex-Labs/anything-llm/assets/19237481/8edc14c9-0c59-426d-a545-5557baf5bbfe)
How can I start the processor? Thanks.
-