-
Reproducing steps:
1. Clone the vllm repo and switch to [tag v0.3.1](https://github.com/vllm-project/vllm/tree/v0.3.1)
2. Build the image from Dockerfile.rocm with instructions from [Option 3: Bui…
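For anyone scripting the reproduction, here is a minimal sketch of steps 1–2 (the `vllm-rocm` image tag is a placeholder of mine, not from the original report):

```python
import subprocess

# Step 1: clone the vLLM repo directly at tag v0.3.1.
subprocess.run(
    ["git", "clone", "--branch", "v0.3.1", "--depth", "1",
     "https://github.com/vllm-project/vllm.git"],
    check=True,
)

# Step 2: build the ROCm image from Dockerfile.rocm
# ("vllm-rocm" is an arbitrary placeholder tag).
subprocess.run(
    ["docker", "build", "-f", "Dockerfile.rocm", "-t", "vllm-rocm", "."],
    cwd="vllm",
    check=True,
)
```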
-
Hi, I am running and profiling the code of the Mixtral implementation; however, neither in the code nor in the profiling trace did I find any AllToAll operations.
I built the TRT engine using the follo…
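If the profiling goes through PyTorch, a check along these lines can confirm whether any AllToAll collective was recorded; the linear layer below is just a stand-in for the real Mixtral forward pass:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in workload; substitute the actual Mixtral forward pass here.
model = torch.nn.Linear(64, 64)
inputs = torch.randn(8, 64)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    model(inputs)

# Scan the recorded ops for any all-to-all collective.
hits = [e.key for e in prof.key_averages() if "all_to_all" in e.key.lower()]
print(hits or "no AllToAll ops recorded")
```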
-
Hi,
I remember that vLLM support was on your TODO list. Have you achieved it now? Was the main challenge in this direction that tree verification with batch size > 1 is hard to make efficient? Thanks…
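For context, here is a toy illustration (entirely my own, not from this project) of why batching tree verification is awkward: each sequence's speculation tree induces its own attention mask, so a batch of different tree shapes cannot share one causal mask.

```python
import torch

def tree_attention_mask(parents):
    # parents[i] is the parent of draft token i (-1 marks the root).
    # Token i may attend to itself and all of its ancestors.
    n = len(parents)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:
            mask[i, j] = True
            j = parents[j]
    return mask

# Two sequences in a batch with different tree shapes -> different masks.
print(tree_attention_mask([-1, 0, 0, 1]).int())  # root, two children, one grandchild
print(tree_attention_mask([-1, 0, 1, 2]).int())  # a plain chain
```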
-
Currently Intel offers the A770 for $300 with 16 GB of RAM and much better FLOPS than a 4060 Ti ($500).
From what I hear over at tinygrad, Intel's drivers are much better than AMD's.
We should support Int…
-
I deployed WizardLM-70B, a fine-tuned variant of Llama-2-70B, on 4 A100s (80 GB) using the vLLM worker. I noticed much slower responses (more than a minute even for a simple prompt like "Hi") at a thro…
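For what it's worth, a bare-engine sketch like the one below can help separate engine latency from serving overhead (the model id is a placeholder; substitute the actual checkpoint path):

```python
import time
from vllm import LLM, SamplingParams

# Placeholder model id; swap in the real WizardLM checkpoint.
llm = LLM(model="WizardLM/WizardLM-70B-V1.0", tensor_parallel_size=4)

start = time.time()
outputs = llm.generate(["Hi"], SamplingParams(max_tokens=32))
print(f"{time.time() - start:.1f}s:", outputs[0].outputs[0].text)
```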
-
I am trying to deploy a Baichuan2-7B model on a machine with 2 Tesla V100 GPUs. Unfortunately, each V100 has only 16 GB of memory.
I have applied INT8 weight-only quantization, so the size of the engine I…
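Rough arithmetic (illustrative only, not measured) suggests the quantized weights themselves should fit comfortably with 2-way tensor parallelism, leaving the KV cache and activations as the tight part:

```python
# Back-of-the-envelope budget for Baichuan2-7B on 2x 16 GB V100s.
# INT8 weight-only quantization stores ~1 byte per parameter.
params = 7e9
weights_gb = params / 1e9                # ~7 GB total for INT8 weights
per_gpu_gb = weights_gb / 2              # ~3.5 GB per V100 with 2-way TP
headroom_gb = 16 - per_gpu_gb            # left for KV cache, activations, runtime
print(f"per-GPU weights ~{per_gpu_gb:.1f} GB, headroom ~{headroom_gb:.1f} GB")
```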
-
**Describe the bug**
When the provided example script is configured to use pipeline parallelism, two different behaviours are observed.
1. When tensor parallelism (tp) = 1 and pipeline parallelism (…
-
Hey vLLM team,
Hope you're all doing great! I'm focusing on pipeline-parallel inference, and I hope it can be supported in vLLM.
I noticed that pipeline parallelism was on the old roadmap (#244), b…
-
The example should show tensor parallelism. I am not sure if Serve + vLLM + tensor parallelism works at the moment because the Serve deployment will request N GPUs, then each vLLM worker will request …
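To make the concern concrete, here is a sketch of the pattern in question (the deployment name, model id, and GPU counts are all mine, purely for illustration): the Serve replica reserves GPUs up front, and vLLM's Ray workers then try to reserve GPUs again for tensor parallelism.

```python
from ray import serve
from vllm import LLM

# Illustrative only: the replica reserves 2 GPUs here...
@serve.deployment(ray_actor_options={"num_gpus": 2})
class VLLMDeployment:
    def __init__(self):
        # ...and with tensor_parallel_size=2, vLLM spawns Ray workers
        # that try to reserve GPUs of their own, on top of the
        # replica's reservation: the double request described above.
        self.llm = LLM(model="facebook/opt-1.3b", tensor_parallel_size=2)

    async def __call__(self, request):
        prompt = (await request.json())["prompt"]
        return self.llm.generate([prompt])[0].outputs[0].text

app = VLLMDeployment.bind()
```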
-
## 🚀 Description
Pipeline parallelism is a technique used in deep learning to improve efficiency and reduce the training time of large neural networks by splitting the model into sequential stages that run on different devices. Here we propose a pipeline paral…
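As a toy sketch of the basic idea (stage sizes and boundaries here are arbitrary, and a real implementation would place each stage on its own GPU or host):

```python
import torch
import torch.nn as nn

# Two pipeline stages; in a real setup each would live on its own device.
stage0 = nn.Linear(16, 32)
stage1 = nn.Linear(32, 4)

def pipeline_forward(batch, n_microbatches=4):
    # Split the batch into micro-batches so that, with real devices,
    # stage1 can process micro-batch k while stage0 handles k+1.
    outs = [stage1(stage0(mb)) for mb in batch.chunk(n_microbatches)]
    return torch.cat(outs)

print(pipeline_forward(torch.randn(8, 16)).shape)  # torch.Size([8, 4])
```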