-
When using tensor parallelism, the compute utilization of one of the GPUs drops to 0% while the other GPU's rises to 100%; the request does not respond, and the service cannot handle new …
-
Hi, I want to run one LLM model across multiple machines.
Within a single node, I want to use tensor parallelism to speed things up.
Across multiple nodes, I want to use pipeline parallelism.
Is this supported? If s…
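For illustration, here is a minimal sketch of that layout, assuming vLLM (the post does not name a framework); the model name and the parallel sizes are placeholders:

```python
# Hedged sketch, assuming vLLM: tensor parallel inside each node,
# pipeline parallel across nodes. Model and sizes are placeholders.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # placeholder model
    tensor_parallel_size=8,             # GPUs per node (intra-node TP)
    pipeline_parallel_size=2,           # number of nodes (inter-node PP)
)
print(llm.generate("Hello")[0].outputs[0].text)
```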
-
So basically I am trying to train Llama / Mistral.
I run the following command:
```bash
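# NEURON_RT_LOG_LEVEL=info enables verbose Neuron runtime logging;
# XLA_USE_BF16=1 tells torch-xla to run fp32 computations in bfloat16.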
NEURON_RT_LOG_LEVEL=info XLA_USE_BF16=1 ./train_mistral.sh
```
Here is the link to [train_mistral.sh](ht…
-
Scaling models requires that they be trained in data-parallel, pipeline-parallel, or tensor-parallel regimes. The last two, both being forms of "model parallelism", require a single model to be shared across GPUs. Thi…
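As a minimal illustration of the tensor-parallel case (a sketch with hypothetical names, not taken from the text above): each shard holds one column slice of a linear layer's weight and computes its own slice of the output, which an all-gather then reassembles.

```python
import torch

# Column-wise tensor parallelism for one linear layer (illustrative).
# In a real setup each shard lives on a different GPU; here they are
# plain tensors so the sketch runs anywhere.
def column_parallel_linear(x, weight_shards):
    # Each shard computes its own slice of the output features.
    partial = [x @ w.t() for w in weight_shards]
    # An all-gather (here: a simple concat) reassembles the full output.
    return torch.cat(partial, dim=-1)

w = torch.randn(16, 8)       # full weight: 16 output features, 8 inputs
shards = w.chunk(4, dim=0)   # 4 shards of 4 output features each
x = torch.randn(2, 8)
assert torch.allclose(column_parallel_linear(x, shards), x @ w.t())
```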
-
I have an inf2.24xlarge and I am running the Llama-2 inference example. All the packages installed are the latest versions.
Everything worked fine until the step where I load the model with tp_degree = 24, and it faile…
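For context, a minimal sketch of what that load step typically looks like in the transformers-neuronx Llama example (an assumption; the post does not show the code, and the checkpoint path below is a placeholder):

```python
from transformers_neuronx.llama.model import LlamaForSampling

# Shard the model across NeuronCores; tp_degree cannot exceed the
# NeuronCores available on the instance and must divide the head count.
model = LlamaForSampling.from_pretrained(
    "./llama-2-checkpoint",  # placeholder path
    tp_degree=24,
    amp="f16",
)
model.to_neuron()  # compile and load weights onto the Neuron devices
```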
-
Hello, I have some questions about using transformer_engine. There are some parallel operators in my model, such as RowParallelLinear and ColumnParallelLinear from flash_attn. How can I replace these o…
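For reference, a sketch of one possible mapping, assuming Transformer Engine's te.Linear with parallel_mode; whether this matches the exact semantics of the flash_attn operators is an assumption, and the feature sizes are placeholders:

```python
import torch.distributed as dist
import transformer_engine.pytorch as te

# Assumes torch.distributed is already initialized, one rank per GPU.
tp_group = dist.new_group()
tp_size = dist.get_world_size(tp_group)

# Possible stand-ins for ColumnParallelLinear / RowParallelLinear:
col_linear = te.Linear(4096, 16384, parallel_mode="column",
                       tp_group=tp_group, tp_size=tp_size)
row_linear = te.Linear(16384, 4096, parallel_mode="row",
                       tp_group=tp_group, tp_size=tp_size)
```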
-
WIP project roadmap for LoRAX. We'll continue to update this over time.
# v0.10
- [ ] Speculative decoding adapters
- [ ] AQLM
# v0.11
- [ ] Prefix caching
- [ ] BERT support
- [ ] Embe…
-
Using FastChat to serve the Baichuan2 LLM through the OpenAI-compatible API on two V100 32G GPUs. Inference is slower than running the model on a single GPU: nearly 3 tokens in 5 seconds.
```
python3 -m fastchat.serv…
-
Status: Draft
Updated: 09/10/2024
# Objective
In this doc we’ll talk about how different optimization techniques are structured in torchao and how to contribute to torchao.
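To ground the discussion, here is one example of applying a torchao technique end to end (a minimal sketch using the torchao.quantization API; the model is a placeholder):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# A placeholder model; quantize_ swaps its linear weights in place.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).eval()
quantize_(model, int8_weight_only())

out = model(torch.randn(1, 1024))
```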
# torchao Stack Ove…
-
Arraymancer has become a key piece of the Nim ecosystem. Unfortunately, I do not have the time to develop it further, for several reasons:
- family: the birth of a family member and the death of hobby time.
- competin…