-
### Your current environment
I have a server with only one NVLink connection, so I need to combine pipeline parallelism and tensor parallelism within a single node to improve performance. I would lik…
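To make the request concrete, here is a minimal pure-Python sketch of how such a hybrid layout could be planned on one node: contiguous layer ranges go to pipeline stages, and within each stage the attention heads are split across tensor-parallel ranks. The function and parameter names (`plan_hybrid_parallel`, `pp_size`, `tp_size`) are illustrative, not any framework's real API.

```python
# Illustrative only: plan a pipeline-parallel x tensor-parallel layout
# for a transformer on a single node. Not a real framework API.

def plan_hybrid_parallel(num_layers, num_heads, pp_size, tp_size):
    """Return {(stage, tp_rank): (layer_range, head_range)}."""
    assert num_layers % pp_size == 0, "layers must divide evenly into stages"
    assert num_heads % tp_size == 0, "heads must divide evenly across TP ranks"
    layers_per_stage = num_layers // pp_size
    heads_per_rank = num_heads // tp_size
    plan = {}
    for stage in range(pp_size):
        # Each pipeline stage owns a contiguous block of layers.
        layer_range = range(stage * layers_per_stage,
                            (stage + 1) * layers_per_stage)
        for tp_rank in range(tp_size):
            # Within a stage, attention heads are sharded across TP ranks
            # (the part that benefits from the fast NVLink interconnect).
            head_range = range(tp_rank * heads_per_rank,
                               (tp_rank + 1) * heads_per_rank)
            plan[(stage, tp_rank)] = (layer_range, head_range)
    return plan

# Example: a 32-layer, 32-head model on 4 GPUs as 2 stages x 2 TP ranks.
plan = plan_hybrid_parallel(num_layers=32, num_heads=32, pp_size=2, tp_size=2)
```

In this layout each of the 4 GPUs holds half the layers and half the heads of those layers, which is the usual split when inter-GPU bandwidth within a stage is high but bandwidth between stages is limited.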
-
I run this model on a cloud server, so I can choose to rent multiple GPUs. It takes more than an hour to generate a video on 1 GPU, and it doesn't get faster when I use more GPUs. So I hope that you can support m…
-
Does this natively support parallelism across GPUs?
Also, a feature request: please support flash attention natively.
-
-
Hello Hydra Team,
I am exploring the possibility of integrating the Optuna Sweeper for hyperparameter tuning in a multi-process setup using GridSearch. My objective is to utilize multiple GPUs on …
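As a rough sketch of the intent (the names here are hypothetical; the actual Hydra/Optuna sweeper API differs), grid-search trials can be enumerated once and then fanned out round-robin across N worker processes, one per GPU:

```python
# Illustrative only: enumerate a parameter grid and balance trials
# across workers. Not the Hydra or Optuna API.
from itertools import product

def make_grid(search_space):
    """search_space: {param: [values]} -> list of {param: value} trials."""
    keys = sorted(search_space)
    return [dict(zip(keys, combo))
            for combo in product(*(search_space[k] for k in keys))]

def assign_round_robin(trials, num_workers):
    """Return one list of trials per worker, balanced round-robin."""
    buckets = [[] for _ in range(num_workers)]
    for i, trial in enumerate(trials):
        buckets[i % num_workers].append(trial)
    return buckets

space = {"lr": [1e-3, 1e-4], "batch_size": [16, 32, 64]}
trials = make_grid(space)                     # 2 * 3 = 6 trials
buckets = assign_round_robin(trials, num_workers=4)
```

Each worker would then run its bucket of trials with `CUDA_VISIBLE_DEVICES` pinned to its own GPU.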
-
The current `local` implementation means no sharding, tensor parallelism, etc., and it refuses to work on my dual-4090 setup. How do I enable multi-GPU, or how do I enable a proper system like vLLM to run…
-
### System Info
2× NVIDIA L20
Launching the Triton server with the TensorRT-LLM backend v0.12.0 in a container
### Who can help?
_No response_
### Information
- [ ] The official example scripts
-…
-
### 🚀 The feature, motivation and pitch
Google Gemini can support up to a million tokens. To serve longer context lengths, we have to do context parallelism, which means splitting the i…
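The first step of context parallelism can be sketched in a few lines: one long token sequence is split into contiguous shards, one per GPU, so that each device only holds (and attends over) its slice. The helper name below is hypothetical.

```python
# Illustrative only: shard one long sequence across GPUs for
# context parallelism.

def shard_sequence(tokens, world_size):
    """Split tokens into world_size contiguous shards of near-equal length."""
    n = len(tokens)
    base, extra = divmod(n, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks get one token more to absorb the remainder.
        size = base + (1 if rank < extra else 0)
        shards.append(tokens[start:start + size])
        start += size
    return shards

seq = list(range(10))             # stand-in for a 10-token sequence
shards = shard_sequence(seq, 4)   # e.g. 4-way context parallelism
```

The hard part, of course, is the attention step afterwards, where each shard's queries must still see every other shard's keys and values (e.g. via a ring-style exchange); the split above is only the data layout.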
-
I propose integrating the GPU version of [Trixi.jl](https://github.com/trixi-framework/Trixi.jl) with [Enzyme.jl](https://enzyme.mit.edu) for differentiable programming.
**Benefits:**
- **Diffe…
-
With the recent advent of large models (take Llama 3.1 405B, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a combination of…
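For readers unfamiliar with the term, naive device mapping roughly amounts to the following sketch: walk the layers in order and place each on the first device with enough remaining memory, spilling to the next device (or host memory) when one fills up. The sizes and device names below are made up for illustration.

```python
# Illustrative only: greedy layer-to-device placement by memory budget.

def naive_device_map(layer_sizes, device_budgets):
    """Greedily map layer index -> device name, in declaration order."""
    mapping, remaining = {}, dict(device_budgets)
    devices = list(device_budgets)
    cursor = 0
    for idx, size in enumerate(layer_sizes):
        while cursor < len(devices) and remaining[devices[cursor]] < size:
            cursor += 1           # this device is full; move to the next
        if cursor == len(devices):
            mapping[idx] = "cpu"  # nothing left; offload to host memory
        else:
            mapping[idx] = devices[cursor]
            remaining[devices[cursor]] -= size
    return mapping

# Example: six 4 GB layers on two 10 GB GPUs -> 2 layers per GPU, 2 on CPU.
mapping = naive_device_map([4] * 6, {"cuda:0": 10, "cuda:1": 10})
```

The limitation this issue points at is that such a mapping only partitions the model; at any moment a single device is doing the work, so it adds capacity but not speed, which is exactly what proper tensor/pipeline parallelism would fix.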