-
### Your current environment
Python==3.10.14
vllm==0.5.0.post1
ray==2.24.0
Node status
---------------------------------------------------------------
Active:
1 node_37c2b26800cc853721ef351c…
-
GPU: 2 Arc cards
Running the following example:
[inference-ipex-llm](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Pipeline-Parallel-Inference)
**for mistral and codell…
-
I want to fine-tune the Pythia-6.9B language model on a dataset. Training requires about 90 GB of VRAM, so I need to use more than one GPU (I use three A100s, each with 40 GB of VRAM). I am trying to do th…
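As a back-of-envelope check on the ~90 GB figure: full fine-tuning with Adam in mixed precision holds weights, gradients, and two fp32 optimizer moments per parameter. A quick sketch (the byte multipliers are assumptions, not measured values):

```python
def training_bytes_per_param(weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Assumed layout: fp16 weights + fp16 gradients + two fp32 Adam moments."""
    return weight_bytes + grad_bytes + optim_bytes

n_params = 6.9e9  # Pythia-6.9B
state_gib = n_params * training_bytes_per_param() / 1024**3
print(f"~{state_gib:.0f} GiB of training state before activations")  # ≈ 77 GiB
```

With activations and framework buffers on top, this lands in the ~90 GB range reported above, which is why a single 40 GB A100 is not enough.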
-
Trying to do inference on an Arc GPU machine; I have followed these guidelines:
```
https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Pipeline-Parallel-Inference
and run_mi…
```
-
### System Info
- GPU name: L40s
- CUDA: 12.1
```
Wed Jun 5 16:27:21 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 …
```
-
```
TypeError: Too few parameters for ; actual 2, expected 3
[2024-04-12 07:26:48,924] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1263) of binary…
-
Deployed with the v0.12.0 Docker image; the launch command is as follows:
sudo docker run -d -v /home/tskj/MOD/:/home/MOD/ -e XINFERENCE_HOME=/home/MOD -p 9997:9997 --gpus all xprobe/xinference:v0.12.0 xinference-local -H 0.0.0.0 --log-level de…
-
Installation steps on the host
conda create -n llm python=3.11
conda activate llm
# the command below installs intel_extension_for_pytorch==2.1.10+xpu by default
pip install --pre --upgrade ipex-llm[xpu] --extra-index…
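After the steps above, a quick sanity check can confirm the environment imports cleanly. This is a sketch with one assumption: `torch.xpu.is_available()` is exposed once the `+xpu` build of intel_extension_for_pytorch is imported.

```python
def check_xpu() -> str:
    """Report whether the XPU backend is importable and sees a device."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  registers the 'xpu' device
        return f"XPU available: {torch.xpu.is_available()}"
    except ImportError as exc:
        # falls through here if the pip install above did not complete
        return f"environment not ready: {exc}"

print(check_xpu())
```

Running this inside the `llm` conda environment should print `XPU available: True` on a working Arc setup.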
-
When running `pretrain.py` with 1 or 4 GPUs and the DDPStrategy as described in the docs, I get the following error:
```bash
"PyTorch/1.12.0-foss-2022a-CUDA-11.7.0/lib/python3.10/.../torch/distribu…
```
-
### System Info
torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2
### Information
- [ ] The official example scripts
- [ ] My own…