-
The following JAX test crashes when compiled on a GCP c4a Axion ARM VM:
```
$ python tests/lax_test.py LaxTest.testConvGeneralDilatedLocal8
Running tests under Python 3.12.3: /home/phawkins/myenv…
-
I don't think we should let every MPI rank try to read the same P3 lookup text file, as it can cause problems on the filesystem, especially at scale (and it slows us down).
I implemented a read-on-rank…
-
## ❓ Questions and Help
In PyTorch we can use `fsdp meta init` to restore a big model (e.g. 80B parameters) shard by shard. In torch_xla I can only find sharded saving, such as this: https://github.com/pytorch/xla/bl…
-
## 🐛 Bug
```
File "/home/kojoe/EasyAnimate/easyanimate/pipeline/pipeline_easyanimate_inpaint.py", line 1369, in __call__
latent_model_input = xs.mark_sharding(
File "/home/kojoe/.local/lib…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
git HEAD
### Custom code
No
### OS platform and distribution
Ubuntu …
-
Does torch_xla SPMD support expert parallelism?
For an MoE model, how should it be computed in XLA?
## ❓ Questions and Help
-
This does not appear to cause any errors, and has been this way for a long time, but ...
A) in components/cam/cime_config/buildnml:
my $spmd = '-spmd';
if ($MPILIB eq 'mpi-serial') {$sp…
-
```
2023-07-18 23:43:01.804619: F external/xla/xla/hlo/ir/hlo_sharding.cc:961] Check failed: !IsManual()
Thread 1 "python" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid…
-
## ❓ Questions and Help
FSDP can be expressed well in SPMD, but HSDP seems impossible to express. Is there any way to express HSDP in SPMD?
-
**What the problem is:**
Both the single-node and sharded `TensorParallelMultiheadAttention` (#477) modules diverge (the forward output becomes `-inf` after fewer than 10 iterations). Also they produce d…