spmd Search Results - Githubissues

1000+ results for spmd

openxla/xla #19105

[cpu] LLVM error during compilation of bf16 convert on aarch…

The following JAX test crashes when compiled on a GCP c4a Axion ARM VM: ``` $ python tests/lax_test.py LaxTest.testConvGeneralDilatedLocal8 Running tests under Python 3.12.3: /home/phawkins/myenv…

hawkinsp updated 2 weeks ago
1
E3SM-Project/scream #3023

P3 lookup text file being read by all MPI ranks

I don't think we should be letting all MPI rank try to read the same P3 lookup text file as it can cause some issues on the filesystems esp at scale (and slow us down). I implemented a read-on-rank…

ndkeen updated 1 month ago
1
pytorch/xla #7832

80B model how to shard restore in spmd training

## ❓ Questions and Help In pytorch we can use `fsdp meta init` shard restore my big model(like have 80B parameters),in torch_xla i only find shard save like use this.https://github.com/pytorch/xla/bl…

mars1248 updated 3 months ago
2
pytorch/xla #7827

xm.mark_step() RuntimeError: Bad StatusOr access: RESOURCE_…

## 🐛 Bug ``` File "/home/kojoe/EasyAnimate/easyanimate/pipeline/pipeline_easyanimate_inpaint.py", line 1369, in __call__ latent_model_input = xs.mark_sharding( File "/home/kojoe/.local/lib…

radna0 updated 2 months ago
14
tensorflow/tensorflow #61164

//tensorflow/dtensor/mlir/tests:spmd_expansion.mlir.test is …

### Issue type Bug ### Have you reproduced the bug with TensorFlow Nightly? Yes ### Source source ### TensorFlow version git HEAD ### Custom code No ### OS platform and distribution Ubuntu …

elfringham updated 1 year ago
1
pytorch/xla #7049

Spmd whether expert parallelism is supported？

torchxla spmd whether expert parallelism is supported？ If it is a moe model, how should it be computed in xla？ ## ❓ Questions and Help

mars1248 updated 6 months ago
3
E3SM-Project/E3SM #2325

configure not called correctly in CAM

This does not appear to cause any errors, and has been this way for a long time, but ... A) in components/cam/cime_config/buildnml: my $spmd = '-spmd'; if ($MPILIB eq 'mpi-serial') {$sp…

worleyph updated 5 years ago
1
openxla/xla #4340

JAX manual sharding triggers assertions in SPMD partitioner

``` 2023-07-18 23:43:01.804619: F external/xla/xla/hlo/ir/hlo_sharding.cc:961] Check failed: !IsManual() Thread 1 "python" received signal SIGABRT, Aborted. __pthread_kill_implementation (no_tid…

joker-eph updated 9 months ago
1
pytorch/xla #7607

How to use spmd to support hybrid shard data parallelism？

## ❓ Questions and Help Fsdp can be well expressed by spmd, but hsdp seems to be unable to be expressed. Is there any way to express hsdp in spmd?

mars1248 updated 4 months ago
3
pytorch/PiPPy #540

[spmd] self-attention not converging

**What the problem is:** Both single-node and sharded `TensorParallelMultiheadAttention`(#477) modules diverge (the forward output becomes `-inf` after less than 10 iterations). Also they produce d…

XilunWu updated 2 years ago
1
