-
## ❓ Questions and Help
I printed all the thunks that were executed and found that many of them didn't appear in my TensorBoard trace. The execution order shown is also wrong.
I …
-
### 🚀 The feature, motivation and pitch
The current implementation provides no coordination around completion or failure.
A call to `load_state_dict` or `save_state_dict` should complete either wh…
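To illustrate the coordination being requested, here is a minimal, hypothetical sketch of all-succeed-or-all-fail semantics for a distributed checkpoint save. Threads stand in for ranks, a `threading.Barrier` stands in for an all-gather of per-rank statuses, and the names (`coordinated_save`, `CheckpointError`, `fail_rank`) are invented for this sketch, not part of any real API:

```python
import threading

class CheckpointError(RuntimeError):
    pass

def coordinated_save(rank, world_size, results, barrier, fail_rank=None):
    # Each "rank" writes its local shard, then all ranks agree on one outcome.
    try:
        if rank == fail_rank:
            raise IOError(f"rank {rank}: disk full")  # simulated local failure
        results[rank] = True                          # local shard written
    except Exception:
        results[rank] = False
    barrier.wait()          # stand-in for an all-gather of statuses
    if not all(results):    # any single failure poisons the whole save
        raise CheckpointError("checkpoint save failed on at least one rank")

def run(world_size, fail_rank=None):
    results = [False] * world_size
    barrier = threading.Barrier(world_size)
    failed_ranks = []
    def worker(rank):
        try:
            coordinated_save(rank, world_size, results, barrier, fail_rank)
        except CheckpointError:
            failed_ranks.append(rank)
    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failed_ranks

print(run(4))                        # no failures: no rank raises -> []
print(sorted(run(4, fail_rank=2)))   # one rank fails: every rank raises
```

With this shape, no rank can believe a checkpoint completed while another rank's shard is missing, which is the invariant the feature request is after.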
-
### 🚀 The feature, motivation and pitch
### Motivation
SPMD sharding in pytorch/XLA offers model parallelism by sharding tensors within an operator. However, we need a mechanism to integrate thi…
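As a self-contained illustration of what "sharding tensors within an operator" means, here is a numpy sketch (not pytorch/XLA API) of a matmul whose left operand is sharded row-wise across simulated devices; each shard computes independently and the concatenated result matches the unsharded computation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))   # activations, sharded along rows
w = rng.standard_normal((16, 4))   # weights, replicated on every "device"

devices = 4
shards = np.split(x, devices, axis=0)        # each device holds 2 rows of x
partial = [s @ w for s in shards]            # per-device local matmul
y_sharded = np.concatenate(partial, axis=0)  # logical (global) result

assert np.allclose(y_sharded, x @ w)         # identical to unsharded matmul
```

In a real SPMD runtime the split, local compute, and reassembly are handled by the compiler from a sharding annotation rather than written out by hand.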
-
# 🚀 Feature & Motivation
PyTorch/XLA recently launched PyTorch/XLA SPMD ([RFC](https://github.com/pytorch/xla/issues/3871), [blog](https://pytorch.org/blog/pytorch-xla-spmd/), [docs/spmd.md](https:…
-
CTSM went through a similar transformation, and we can likely use it as a guide to make this happen. But the uses of MCT modules in these core SLIM modules need to be removed. This is something that…
-
### Request description
In XLA there are the [sharding propagation pass](https://github.com/openxla/xla/blob/2eba54a187e03ccd0f65669234b80966bdbcda5e/xla/service/sharding_propagation.h#L66) and [SP…
-
In https://github.com/google/jax/issues/13081 we found that XLA doesn't support SPMD sharding of fast-fourier transform ops. It should!
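A small numpy sketch of why SPMD sharding of FFT ops is natural: an FFT applied along the last axis is independent per row, so sharding the batch axis across devices and concatenating the per-shard FFTs reproduces the global result exactly (this uses numpy rather than JAX/XLA, purely to show the mathematical decomposition):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))

devices = 4
shards = np.split(x, devices, axis=0)                  # shard the batch axis
per_shard = [np.fft.fft(s, axis=-1) for s in shards]   # local FFT per "device"
y = np.concatenate(per_shard, axis=0)

assert np.allclose(y, np.fft.fft(x, axis=-1))          # matches global FFT
```

Sharding along the transformed axis itself is harder (it requires a distributed FFT with communication), but the batch-sharded case above needs no communication at all.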
-
Here's an overview of the features we intend to work on in the near future.
## Core Keras
### Saving & export
- Implement saving support for sharded models (sharded weights files).
- Improve…
-
## ❓ Questions and Help
When starting GPU SPMD training with `torchrun`, why does the graph need to be compiled once per machine, even though the resulting graph is the same on each? Is there any way to avoid this?
-
Ported this issue from https://github.com/google/jax/issues/21562
This code
```python
import jax
import numpy as np
import jax.numpy as jnp
from jax.sharding import PartitionSpec as PS, Name…