-
## ❓ Questions and Help
In SPMD mode, if we run the training command for a model on all the VMs together (single program, multiple machines), each VM has its own dataloader using CPU cores.
Then, wh…
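For context, a minimal sketch of what this per-VM setup looks like under torch_xla's SPMD mode, assuming the `torch_xla.distributed.spmd` API and a synthetic `TensorDataset` as a stand-in for the real data; every VM runs this same script and builds its own CPU-side DataLoader:

```python
# Sketch only: every VM runs this same program (single program, multiple machines).
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
import torch_xla.distributed.parallel_loader as pl

xr.use_spmd()                                        # enable SPMD mode
device = xm.xla_device()
n_dev = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n_dev), (n_dev,), ('data',))

# Each VM builds its own DataLoader; CPU workers on that host feed the devices.
dataset = torch.utils.data.TensorDataset(torch.randn(10_000, 512))
loader = torch.utils.data.DataLoader(dataset, batch_size=128, num_workers=8)
loader = pl.MpDeviceLoader(
    loader, device,
    input_sharding=xs.ShardingSpec(mesh, ('data', None)))  # shard batches along 'data'
```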
-
1. **Prerequisite:** Make sure the LLM inference framework can be launched in the SPMD style. For example, the LLM inference script can be launched with `torchrun --standalone --nproc=8 offline_i…
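For illustration, a minimal sketch of what an SPMD-style entrypoint looks like (the file name `offline_inference.py` and its contents are assumptions, not the framework's actual script); `torchrun` starts one identical process per device and sets `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` in the environment:

```python
# offline_inference.py (hypothetical): every rank runs the same program on its own shard.
import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])

    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # ... each rank builds the same model/engine and processes its share of the requests ...
    print(f"rank {rank}/{world_size} ready on GPU {local_rank}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```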
-
### Description
By combining sufficiently exciting capture + sharding + mapping behavior, it is possible to induce JAX's batching to witness inconsistent sizes for the batch axis. The following code s…
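The original snippet is cut off above; as a stand-in (not the reporter's repro), a minimal sketch of the invariant at stake: `jax.vmap` expects every mapped input to agree on the batch-axis size and raises when it does not:

```python
import jax
import jax.numpy as jnp

def f(a, b):
    return a + b

a = jnp.ones((4, 3))
b_ok = jnp.ones((4, 3))
b_bad = jnp.ones((5, 3))

print(jax.vmap(f)(a, b_ok).shape)   # (4, 3): batch axes agree

try:
    jax.vmap(f)(a, b_bad)           # batch axes of size 4 and 5
except ValueError as err:
    print("vmap rejected inconsistent batch sizes:", err)
```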
-
### System Info
- `transformers` version: 4.45.0.dev0
- Platform: Linux-5.4.0-1043-gcp-x86_64-with-glibc2.31
- Python version: 3.10.14
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.4…
-
**Describe the bug**
What the bug is and how to reproduce it, preferably with screenshots.
**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torc…
-
## ❓ Questions and Help
When running on a vp-128 TPU pod (even when sharding only along the batch dimension), we are seeing very low performance compared to the same pod without SPMD.
Do you have any…
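For reference, a minimal sketch of the batch-dimension-only sharding referred to above, under torch_xla SPMD, assuming a 1-D `data` mesh over all devices and a hypothetical input tensor `batch`:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
device = xm.xla_device()
n_dev = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n_dev), (n_dev,), ('data',))

batch = torch.randn(1024, 512).to(device)      # hypothetical input
xs.mark_sharding(batch, mesh, ('data', None))  # shard only the leading (batch) dimension
```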
-
This feature of PoCL makes it possible to offload to multiple servers, see https://github.com/pocl/pocl/pull/1621#issuecomment-2415865032. That could be an interesting approach to distributing code, a…
-
## 🐛 Bug Report
When using [dynamo sharding](https://github.com/pytorch/xla/blob/88bcb45fda546e5c1fb4f12de75251bfa5fd332e/torch_xla/core/custom_kernel.py#L17) inside `torch.compile`, I encounter th…
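For context, a minimal sketch of the general pattern (not the exact repro from this report): mark a sharding on an input and run the step through `torch.compile` with torch_xla's `openxla` dynamo backend:

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()
device = xm.xla_device()
n_dev = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(n_dev), (n_dev,), ('data',))

def step(x, w):
    return torch.relu(x @ w)

compiled_step = torch.compile(step, backend="openxla")

x = torch.randn(64, 128).to(device)
w = torch.randn(128, 256).to(device)
xs.mark_sharding(x, mesh, ('data', None))   # shard the batch dimension of the input
out = compiled_step(x, w)
```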
-
Hello!
As per https://github.com/google/jax/discussions/23427, I'm noticing that XLA on CPU isn't doing a **fused** reduction sum for a very simple function if the input tensor is > 32 elements:
…
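A minimal sketch for inspecting the post-optimization HLO that XLA:CPU emits, which is where the missing fusion is visible; the squared-sum reduction below is an assumed stand-in for the "very simple function":

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sum(x * x)                # elementwise op followed by a reduction

x = jnp.ones((64,), dtype=jnp.float32)   # more than 32 elements

compiled = jax.jit(f).lower(x).compile()
print(compiled.as_text())                # optimized HLO; check whether a single fusion op appears
```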
-
### Review Mojo's priorities
- [X] I have read the [roadmap and priorities](https://docs.modular.com/mojo/roadmap.html#overall-priorities) and I believe this request falls within the priorities.
…