-
### Your current environment
I have a server with only one NVLink connection, so I need to combine pipeline parallelism and tensor parallelism within a single node to improve performance. I would lik…
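To make the request concrete, here is a minimal pure-Python sketch of how such a hybrid layout could be planned on one node: contiguous layer ranges go to pipeline stages, and within each stage the attention heads are split across tensor-parallel ranks. The function and parameter names (`plan_hybrid_parallel`, `pp_size`, `tp_size`) are illustrative, not any framework's real API.

```python
# Illustrative only: plan a pipeline-parallel x tensor-parallel layout
# for a transformer on a single node. Not a real framework API.

def plan_hybrid_parallel(num_layers, num_heads, pp_size, tp_size):
    """Return {(stage, tp_rank): (layer_range, head_range)}."""
    assert num_layers % pp_size == 0, "layers must divide evenly into stages"
    assert num_heads % tp_size == 0, "heads must divide evenly across TP ranks"
    layers_per_stage = num_layers // pp_size
    heads_per_rank = num_heads // tp_size
    plan = {}
    for stage in range(pp_size):
        # Each pipeline stage owns a contiguous block of layers.
        layer_range = range(stage * layers_per_stage,
                            (stage + 1) * layers_per_stage)
        for tp_rank in range(tp_size):
            # Within a stage, attention heads are sharded across TP ranks
            # (the part that benefits from the fast NVLink interconnect).
            head_range = range(tp_rank * heads_per_rank,
                               (tp_rank + 1) * heads_per_rank)
            plan[(stage, tp_rank)] = (layer_range, head_range)
    return plan

# Example: a 32-layer, 32-head model on 4 GPUs as 2 stages x 2 TP ranks.
plan = plan_hybrid_parallel(num_layers=32, num_heads=32, pp_size=2, tp_size=2)
```

In this layout each of the 4 GPUs holds half the layers and half the heads of those layers, which is the usual split when inter-GPU bandwidth within a stage is high but bandwidth between stages is limited.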
-
I run this model on a cloud server, so I can choose to rent multiple GPUs. It takes more than an hour to generate a video on 1 GPU, and it doesn't get faster when I use more GPUs. So I hope that you can support m…
-
Does this natively support parallelism across GPUs?
Also, a feature request: please support flash attention natively.
-
-
Hello Hydra Team,
I am exploring the possibility of integrating the Optuna Sweeper for hyperparameter tuning in a multi-process setup using GridSearch. My objective is to utilize multiple GPUs on …
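As a rough sketch of the intent (the names here are hypothetical; the actual Hydra/Optuna sweeper API differs), grid-search trials can be enumerated once and then fanned out round-robin across N worker processes, one per GPU:

```python
# Illustrative only: enumerate a parameter grid and balance trials
# across workers. Not the Hydra or Optuna API.
from itertools import product

def make_grid(search_space):
    """search_space: {param: [values]} -> list of {param: value} trials."""
    keys = sorted(search_space)
    return [dict(zip(keys, combo))
            for combo in product(*(search_space[k] for k in keys))]

def assign_round_robin(trials, num_workers):
    """Return one list of trials per worker, balanced round-robin."""
    buckets = [[] for _ in range(num_workers)]
    for i, trial in enumerate(trials):
        buckets[i % num_workers].append(trial)
    return buckets

space = {"lr": [1e-3, 1e-4], "batch_size": [16, 32, 64]}
trials = make_grid(space)                     # 2 * 3 = 6 trials
buckets = assign_round_robin(trials, num_workers=4)
```

Each worker would then run its bucket of trials with `CUDA_VISIBLE_DEVICES` pinned to its own GPU.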
-
The current `local` implementation means no sharding, tensor parallelism, etc., and it refuses to work on my dual-4090 setup. How do I enable multi-GPU, or how do I enable a proper system like vLLM to run…
-
### System Info
2× NVIDIA L20
Launching the Triton server with the TensorRT-LLM backend v0.12.0 in a container
### Who can help?
_No response_
### Information
- [ ] The official example scripts
-…
-
### 🚀 The feature, motivation and pitch
Google Gemini can support up to a million tokens. To serve longer context lengths, we have to do context parallelism, which means splitting the i…
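The first step of context parallelism can be sketched in a few lines: one long token sequence is split into contiguous shards, one per GPU, so that each device only holds (and attends over) its slice. The helper name below is hypothetical.

```python
# Illustrative only: shard one long sequence across GPUs for
# context parallelism.

def shard_sequence(tokens, world_size):
    """Split tokens into world_size contiguous shards of near-equal length."""
    n = len(tokens)
    base, extra = divmod(n, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks get one token more to absorb the remainder.
        size = base + (1 if rank < extra else 0)
        shards.append(tokens[start:start + size])
        start += size
    return shards

seq = list(range(10))             # stand-in for a 10-token sequence
shards = shard_sequence(seq, 4)   # e.g. 4-way context parallelism
```

The hard part, of course, is the attention step afterwards, where each shard's queries must still see every other shard's keys and values (e.g. via a ring-style exchange); the split above is only the data layout.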
-
I propose integrating the GPU version of [Trixi.jl](https://github.com/trixi-framework/Trixi.jl) with [Enzyme.jl](https://enzyme.mit.edu) for differentiable programming.
**Benefits:**
- **Diffe…
-
With the recent advent of large models (take Llama 3.1 405B, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a combination of…
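For readers unfamiliar with the term, naive device mapping roughly amounts to the following sketch: walk the layers in order and place each on the first device with enough remaining memory, spilling to the next device (or host memory) when one fills up. The sizes and device names below are made up for illustration.

```python
# Illustrative only: greedy layer-to-device placement by memory budget.

def naive_device_map(layer_sizes, device_budgets):
    """Greedily map layer index -> device name, in declaration order."""
    mapping, remaining = {}, dict(device_budgets)
    devices = list(device_budgets)
    cursor = 0
    for idx, size in enumerate(layer_sizes):
        while cursor < len(devices) and remaining[devices[cursor]] < size:
            cursor += 1           # this device is full; move to the next
        if cursor == len(devices):
            mapping[idx] = "cpu"  # nothing left; offload to host memory
        else:
            mapping[idx] = devices[cursor]
            remaining[devices[cursor]] -= size
    return mapping

# Example: six 4 GB layers on two 10 GB GPUs -> 2 layers per GPU, 2 on CPU.
mapping = naive_device_map([4] * 6, {"cuda:0": 10, "cuda:1": 10})
```

The limitation this issue points at is that such a mapping only partitions the model; at any moment a single device is doing the work, so it adds capacity but not speed, which is exactly what proper tensor/pipeline parallelism would fix.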