-
## ❓ Questions and Help
In SPMD mode, if we run the training command for a model on all the VMs together (single program, multiple machines), each VM has its own dataloader using CPU cores.
Then, wh…
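For context, a minimal sketch of what this per-VM setup looks like under torch_xla's SPMD mode, assuming the `torch_xla.distributed.spmd` API and a synthetic `TensorDataset` as a stand-in for the real data; every VM runs this same script and builds its own CPU-side DataLoader:

```python
# Sketch only: every VM runs this same program (single program, multiple machines).
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
import torch_xla.distributed.parallel_loader as pl

xr.use_spmd()                                        # enable SPMD mode
device = xm.xla_device()
n_dev = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n_dev), (n_dev,), ('data',))

# Each VM builds its own DataLoader; CPU workers on that host feed the devices.
dataset = torch.utils.data.TensorDataset(torch.randn(10_000, 512))
loader = torch.utils.data.DataLoader(dataset, batch_size=128, num_workers=8)
loader = pl.MpDeviceLoader(
    loader, device,
    input_sharding=xs.ShardingSpec(mesh, ('data', None)))  # shard batches along 'data'
```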
-
1. **Prerequisite:** Make sure the LLM inference framework can be launched in the SPMD style. For example, the LLM inference script can be launched with `torchrun --standalone --nproc=8 offline_i…
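For illustration, a minimal sketch of what an SPMD-style entrypoint looks like (the file name `offline_inference.py` and its contents are assumptions, not the framework's actual script); `torchrun` starts one identical process per device and sets `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` in the environment:

```python
# offline_inference.py (hypothetical): every rank runs the same program on its own shard.
import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # ... each rank builds the same model/engine and processes its share of the requests ...
    print(f"rank {rank}/{world_size} ready on GPU {local_rank}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```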
-
### Description
By combining sufficiently exciting capture + sharding + mapping behavior, it is possible to induce JAX's batching to witness inconsistent sizes for the batch axis. The following code s…
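The original snippet is cut off above; as a stand-in (not the reporter's repro), a minimal sketch of the invariant at stake: `jax.vmap` expects every mapped input to agree on the batch-axis size and raises when it does not:

```python
import jax
import jax.numpy as jnp

def f(a, b):
    return a + b

a = jnp.ones((4, 3))
b_ok = jnp.ones((4, 3))
b_bad = jnp.ones((5, 3))

print(jax.vmap(f)(a, b_ok).shape)   # (4, 3): batch axes agree

try:
    jax.vmap(f)(a, b_bad)           # batch axes of size 4 and 5
except ValueError as err:
    print("vmap rejected inconsistent batch sizes:", err)
```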
-
### System Info
- `transformers` version: 4.45.0.dev0
- Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.31
- Python version: 3.10.14
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4…
-
**Describe the bug**
What the bug is and how to reproduce it, preferably with screenshots.
**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torc…
-
## ❓ Questions and Help
When running on a vp-128 TPU pod (even when sharding only along the batch dimension), we are seeing very low performance compared to the same pod without SPMD.
Do you have any…
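For reference, a minimal sketch of the batch-dimension-only sharding referred to above, under torch_xla SPMD, assuming a 1-D `data` mesh over all devices and a hypothetical input tensor `batch`:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
device = xm.xla_device()
n_dev = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n_dev), (n_dev,), ('data',))

batch = torch.randn(1024, 512).to(device)      # hypothetical input
xs.mark_sharding(batch, mesh, ('data', None))  # shard only the leading (batch) dimension
```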
-
This feature of PoCL makes it possible to offload to multiple servers, see https://github.com/pocl/pocl/pull/1621#issuecomment-2415865032. That could be an interesting approach to distributing code, a…
-
## 🐛 Bug Report
When using [dynamo sharding](https://github.com/pytorch/xla/blob/88bcb45fda546e5c1fb4f12de75251bfa5fd332e/torch_xla/core/custom_kernel.py#L17) inside `torch.compile`, I encounter th…
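For context, a minimal sketch of the general pattern (not the exact repro from this report): mark a sharding on an input and run the step through `torch.compile` with torch_xla's `openxla` dynamo backend:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
device = xm.xla_device()
n_dev = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n_dev), (n_dev,), ('data',))

def step(x, w):
    return torch.relu(x @ w)

compiled_step = torch.compile(step, backend="openxla")

x = torch.randn(64, 128).to(device)
w = torch.randn(128, 256).to(device)
xs.mark_sharding(x, mesh, ('data', None))   # shard the batch dimension of the input
out = compiled_step(x, w)
```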
-
Hello!
As per https://github.com/google/jax/discussions/23427, I'm noticing that XLA on CPU isn't doing a **fused** reduction sum for a very simple function if the input tensor is > 32 elements:
…
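A minimal sketch for inspecting the post-optimization HLO that XLA:CPU emits, which is where the missing fusion is visible; the squared-sum reduction below is an assumed stand-in for the "very simple function":

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x * x)                # elementwise op followed by a reduction

x = jnp.ones((64,), dtype=jnp.float32)   # more than 32 elements

compiled = jax.jit(f).lower(x).compile()
print(compiled.as_text())                # optimized HLO; check whether a single fusion op appears
```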
-
### Review Mojo's priorities
- [X] I have read the [roadmap and priorities](https://docs.modular.com/mojo/roadmap.html#overall-priorities) and I believe this request falls within the priorities.
…