-
Hello,
We observed an issue where a broadcast tensor created from a single-dimension parameter is marked as sharded by the XLA sharding propagator. This sharded tensor, while doing computation with another tensor which ha…
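For illustration only, here is a minimal JAX analogue of the described pattern (the names `scale` and `f` are made up, and this assumes the `jax.sharding` API rather than whatever framework the original report used): it broadcasts a sharded 1-D parameter against a 2-D tensor and prints the sharding the propagator assigns to the result.
```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D mesh over all devices; the 1-D parameter is sharded along it.
# Assumes the parameter length (8) is divisible by the device count.
mesh = Mesh(np.array(jax.devices()), ('x',))
scale = jax.device_put(jnp.ones((8,)), NamedSharding(mesh, P('x')))

@jax.jit
def f(s, a):
    # Broadcast the sharded 1-D parameter against a replicated 2-D
    # tensor; the compiler's sharding propagator picks the output layout.
    return s[None, :] * a

out = f(scale, jnp.ones((4, 8)))
print(out.sharding)  # inspect what the propagator chose
```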
-
### Proposal to improve performance
On the current main:
```shell
$ python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100 --num-prompts 100 --model facebook/opt-125m -tp 2 …
```
-
Imagine the following scenario:
* I'm running an MPI-based SPMD programming model that owns main(), was launched by `mpirun`, has called MPI_Init(), etc.
* I want to create a Chapel library that ca…
-
I would like to ask for an example of initializing and updating a model using nnx.jit with SPMD.
Is there any relevant example?
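Not an official snippet, but a minimal data-parallel sketch of the pattern, assuming the Flax NNX API (`nnx.jit`, `nnx.value_and_grad`, `nnx.state`/`nnx.update`) together with `jax.sharding`; the MLP, the mesh axis name `data`, and the plain-SGD update are illustrative choices, not taken from any referenced code:
```python
import numpy as np
import jax
import jax.numpy as jnp
from flax import nnx
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

class MLP(nnx.Module):
    def __init__(self, din, dmid, dout, *, rngs: nnx.Rngs):
        self.fc1 = nnx.Linear(din, dmid, rngs=rngs)
        self.fc2 = nnx.Linear(dmid, dout, rngs=rngs)

    def __call__(self, x):
        return self.fc2(nnx.relu(self.fc1(x)))

# 1-D device mesh for data-parallel SPMD; params stay replicated.
mesh = Mesh(np.array(jax.devices()), ('data',))
batch_sharding = NamedSharding(mesh, P('data'))

model = MLP(8, 32, 1, rngs=nnx.Rngs(0))

@nnx.jit
def train_step(model, x, y):
    def loss_fn(m):
        return jnp.mean((m(x) - y) ** 2)

    loss, grads = nnx.value_and_grad(loss_fn)(model)
    # Plain SGD, applied in place to the module's Param state.
    params = nnx.state(model, nnx.Param)
    params = jax.tree.map(lambda p, g: p - 1e-2 * g, params, grads)
    nnx.update(model, params)
    return loss

# Shard the batch across devices; batch size must divide evenly.
x = jax.device_put(jnp.ones((16, 8)), batch_sharding)
y = jax.device_put(jnp.zeros((16, 1)), batch_sharding)
print(train_step(model, x, y))
```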
-
### Motivation.
**TL;DR**: Introduce an SPMD-style control plane to simplify the control-plane architecture and improve performance.
For distributed inference, vLLM currently leverages a “driver-worker”,…
-
I found that XLA's auto-sharding is based on the Alpa paper (https://arxiv.org/abs/2201.12023), which proposed an algorithm for both inter-operator and intra-operator parallelism. However, it appears that it implements onl…
-
### Description
### **Concept introduction**
Because SPMD has no scheduling overhead, it gives the best performance, but it is often not easy to develop complex training tasks with it. For exa…
-
## ❓ Questions and Help
In SPMD mode, if we run the training command for a model on all the VMs together (single program, multiple machines), each VM has its own dataloader using its CPU cores.
Then, wh…
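As an illustration of the per-host pattern the question describes, here is a minimal sketch assuming JAX's multi-process API (`jax.process_index`/`jax.process_count`); the toy array stands in for a real dataset:
```python
import numpy as np
import jax

# In SPMD every VM runs the same program; give each host a disjoint
# slice of the dataset, indexed by its process id, so no two hosts
# load the same examples.
full = np.arange(1024)                 # stand-in for the real dataset
num_hosts = jax.process_count()
host_id = jax.process_index()
shard = np.array_split(full, num_hosts)[host_id]
print(f"host {host_id}/{num_hosts} loads {len(shard)} examples")
```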
-
### Description
By combining sufficiently exciting capture + sharding + mapping behavior, it is possible to induce JAX's batching to witness inconsistent sizes for the batch axis. The following code s…
-
**Describe the bug**
Describe the bug and how to reproduce it, preferably with screenshots.
**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torc…