-
Hi akash-aky, first of all, thank you for creating `Exile`; it's an amazing library! I recently ran into some problems using it to execute this application in `parallel`. Here are my debug results:
`…
-
### Your current environment
vllm 0.5.2
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to b…
-
**Describe the bug**
Resource estimation for the vLLM backend is incorrect and ignores quantization (a rough illustration follows the steps below).
**Steps to reproduce**
1. On a GPU server with 4 L20 (48 GB VRAM) cards, without any model deploy…
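To make the impact concrete, here is a minimal, hypothetical sketch of a weights-only VRAM estimate; the function, the model size, and the bytes-per-parameter values are illustrative assumptions, not the project's actual estimator:

```python
# Hypothetical, weights-only VRAM estimate illustrating why ignoring
# quantization inflates the requirement; not the project's real estimator.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_weight_vram_gb(num_params_billion: float, dtype: str) -> float:
    """Rough weights-only footprint; a real estimate must also budget
    KV cache and activation memory."""
    return num_params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1024**3

# A 72B model needs ~134 GB at fp16 (more than two 48G cards for the
# weights alone) but only ~34 GB at int4 (fits on one card), so an
# estimator that assumes fp16 regardless of quantization over-allocates
# GPUs by roughly 4x.
for dtype in ("fp16", "int4"):
    print(f"{dtype}: {estimate_weight_vram_gb(72, dtype):.1f} GB")
```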
-
### System Info
```shell
accelerate 1.1.1
neuronx-cc 2.14.227.0+2d4f85be
neuronx-distributed 0.8.0
neuronx-distributed-training 1.0.0
optimum …
-
### Prerequisites
- [X] I have read the [ServerlessLLM documentation](https://serverlessllm.github.io/).
- [X] I have searched the [Issue Tracker](https://github.com/ServerlessLLM/ServerlessLLM/issue…
-
### UPDATE (11/23/2024)
Currently, @james-p-xu is removing rope, @yizhang2077 is removing distributed, and @HandH1998 is removing the weight loader. Optimistically, we can remove these dependencies by the…
-
### Your current environment
Running via Docker
```text
docker run --runtime nvidia --gpus "device=${CUDA_VISIBLE_DEVICES}" --shm-size 8g -v $volume:/root/.cache/huggingface …
-
### Your current environment
```text
The environment is the latest vllm-0.5.4 Docker environment, and the command to run is: python3 api_server.py --port 10195 --model /data/models/Mistral-Large-Ins…
-
My code:
from vllm import LLM, SamplingParams
from chatharuhi import ChatHaruhi  (merely importing ChatHaruhi here raises "Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'sp…
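The truncated message is the standard PyTorch hint to use the 'spawn' start method. A minimal sketch of that workaround, with a placeholder model path rather than the issue author's actual setup:

```python
# Force the 'spawn' start method before anything touches CUDA, so child
# processes do not inherit an already-initialized CUDA context via fork.
import multiprocessing as mp

def main():
    # Import CUDA-touching libraries inside the entry point.
    from vllm import LLM, SamplingParams
    llm = LLM(model="/path/to/model")  # placeholder path
    outputs = llm.generate(["Hello"], SamplingParams(temperature=0.8))
    print(outputs)

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # must run before CUDA init
    main()
```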
-
```
2024-11-09 21:39:44.994636: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already b…