-
## 🐛 Bug
I'm working with nightly versions of torch/xla on TPU. When moving from torch==2.6.0.dev20241106+cpu to torch==2.6.0.dev20241107, I see significantly increased TPU memory usage for SP…
-
Hello,
We observed an issue where a tensor broadcast from a single-dimension parameter is marked as sharded by the XLA sharding propagator. This sharded tensor, while doing computation with another tensor which ha…
-
### Proposal to improve performance
On the current main:
```shell
$ python benchmarks/benchmark_throughput.py --input-len 100 --output-len 100 --num-prompts 100 --model facebook/opt-125m -tp 2 …
-
### Motivation.
**TL;DR**: Introduce an SPMD-style control plane to improve the control-plane architecture and optimize performance.
For distributed inference, vLLM currently leverages a “driver-worker”,…
-
Imagine the following scenario:
* I'm running an MPI-based SPMD programming model that owns main(), was launched by `mpirun`, has called MPI_Init(), etc.
* I want to create a Chapel library that ca…
-
I would like to ask for an example of initializing and updating a model using nnx.jit with SPMD.
Is there any relevant example?
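A minimal sketch of the pattern being asked about might look like the following, assuming a recent flax/jax install and eight devices (e.g. a single TPU slice); the `MLP` model, the 2x4 mesh shape, and the `'data'`/`'model'` axis names are illustrative placeholders, not part of the original question.
```python
import jax
import jax.numpy as jnp
import numpy as np
from flax import nnx
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumes 8 devices; reshape them into a 2x4 (data, model) mesh.
mesh = Mesh(np.array(jax.devices()).reshape(2, 4), ('data', 'model'))

class MLP(nnx.Module):
    def __init__(self, din, dhidden, dout, *, rngs: nnx.Rngs):
        # Declare how each kernel should be sharded along the 'model' axis.
        self.fc1 = nnx.Linear(
            din, dhidden,
            kernel_init=nnx.with_partitioning(
                nnx.initializers.lecun_normal(), (None, 'model')),
            rngs=rngs)
        self.fc2 = nnx.Linear(
            dhidden, dout,
            kernel_init=nnx.with_partitioning(
                nnx.initializers.lecun_normal(), ('model', None)),
            rngs=rngs)

    def __call__(self, x):
        return self.fc2(nnx.relu(self.fc1(x)))

@nnx.jit
def create_sharded_model():
    # Initialize inside nnx.jit so parameters are created directly with the
    # sharding declared via with_partitioning, rather than on one device.
    model = MLP(64, 256, 8, rngs=nnx.Rngs(0))
    state = nnx.state(model)
    pspecs = nnx.get_partition_spec(state)
    nnx.update(model, jax.lax.with_sharding_constraint(state, pspecs))
    return model

@nnx.jit
def train_step(model, x, y):
    def loss_fn(m):
        return jnp.mean((m(x) - y) ** 2)
    loss, grads = nnx.value_and_grad(loss_fn)(model)
    # Plain SGD applied in place, to stay independent of any particular
    # nnx.Optimizer API version.
    params = nnx.state(model, nnx.Param)
    nnx.update(model, jax.tree.map(lambda p, g: p - 1e-3 * g, params, grads))
    return loss

with mesh:
    model = create_sharded_model()
    data_sharding = NamedSharding(mesh, P('data', None))
    x = jax.device_put(jnp.ones((32, 64)), data_sharding)
    y = jax.device_put(jnp.ones((32, 8)), data_sharding)
    loss = train_step(model, x, y)
```
The key idea is that the parameter shardings are attached at construction time with `nnx.with_partitioning`, and both initialization and the update step are wrapped in `nnx.jit` so the compiler keeps everything distributed according to those annotations.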
-
### Description
### **Concept introduction**
Because SPMD has no scheduling overhead, it gives the best performance, but it is often not easy to develop complex training tasks with it. For exa…
-
I encounter an error when using the ILP solver while compiling Llama3-Mini with nnscaler. Can you help me understand why this is happening?
```
2024-11-03 08:56:37 | INFO | nnscaler.autodist.spm…
-
### Feature Idea
Support For TPU/XLA Devices
### Existing Solutions
_No response_
### Other
_No response_
-
## ❓ Questions and Help
In SPMD mode, if we run the training command for a model on all the VMs together (single program, multiple machines), each VM has its own dataloader using CPU cores.
Then, wh…