-
When compiling with C++23, the following errors are reported in cppspmd_sse.h:
```
In file included from /media/dezlow/Drive/Dev/C++/Oneiro/ThirdParty/KTX/lib/basisu/encoder/basisu_kernels_sse.cpp:…
-
In the SPMD partitioner's HandleReshape method, I have found a couple of cases where the sharding of a tensor is split. This happens in the gradient accumulation case in JAX.
Example
The p…
-
Hi,
I am trying to combine GSPMD and PyTorch Compile, but it doesn't work.
I took a copy of the test script "test_train_spmd_imagenet.py" and tested it in Colab, and it started normally. However,…
-
Hello
I have models trained using the multi-core TPU option.
I have saved their checkpoints using
```
import torch_xla.core.xla_model as xm
xm.save(checkpoint, path_checkpoints_file, master_only…
-
These 14 warnings are new with clang 12. Many of the projects I am importing have or had this problem, including `vulkan.hpp` (now fixed).
```
In file included from /Users/mark/Projects/khronos/gith…
-
## 🚀 [RFC] A high-level GSPMD API in PT/XLA (based on `xs.mark_sharding`)
This RFC proposes a high-level API for GSPMD through a wrapper class and a partitioning rule function, based on `xs.mark_sh…
-
## ❓ Questions and Help
Generally, we feel that since most of the work in SPMD happens under the hood, it's hard to understand what is required from us when using it in order to sync between TPUs on a pod …
-
Next steps for the MPI module.
## Features
- [ ] Extend functionality
- Extend to MPI-2 and 3 routines
- We currently support most of MPI-1, and a random subset of MPI-2 and MPI-3.…
-
Hi Drew,
Hope you are doing well! I am using task_tools for our submission, but when I set `preschedule=FALSE` and used 25 nodes with more than 300K tasks to execute, I got the following error:
``…
-
We are seeing a compilation crash with a custom partitioning defined by the user. I'm attaching the repro script and instructions to repro. The error happens when running with TransformerEngine in ou…
Tixxx updated
3 months ago