-
https://www.anyscale.com/blog/training-175b-parameter-language-models-at-1000-gpu-scale-with-alpa-and-ray?utm_content=242470147&utm_medium=social&utm_source=linkedin&hss_channel=lcp-28464994
-
**Please describe the bug**
Hi,
Using bfloat16, whether by initializing an embedding layer with that dtype or by casting a float32 value to bfloat16, causes a double-free error and a crash.
Sometimes it just prints out…
-
**Please describe the bug**
When I use Alpa to parallelize the llama-7b model on a Ray cluster (one node with 8 GPUs), disk usage keeps growing without bound because of Ray object spilling. F…
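For context, Ray lets the spill location be set through an `object_spilling_config` entry in `_system_config`. The following is only a minimal sketch, assuming a filesystem spill target: the helper name `build_spilling_system_config` and the path `/tmp/ray_spill` are hypothetical, and this setting only redirects where spilled objects are written. It does not limit how much is spilled, so it is not a fix for the growth described above.

```python
import json


def build_spilling_system_config(directory_path: str) -> dict:
    """Build a `_system_config` dict that points Ray object spilling at a directory.

    Hypothetical helper for illustration; Ray expects the spilling
    configuration as a JSON-encoded string under `object_spilling_config`.
    """
    spilling = {
        "type": "filesystem",
        "params": {"directory_path": directory_path},
    }
    return {"object_spilling_config": json.dumps(spilling)}


# Assumed spill directory; pick a disk with enough free space.
config = build_spilling_system_config("/tmp/ray_spill")

# On the head node this dict would be passed when starting Ray, e.g.:
#   import ray
#   ray.init(_system_config=config)
```

Pointing the spill directory at a dedicated volume at least keeps the growth away from the root filesystem while the underlying cause is investigated.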
-
I want to build a debuggable version of the Alpa-modified jaxlib from source so that I can step into the underlying C++ code with a debugger. It seems that it cannot be done according to the officia…
-
**System information**
- Alpa version: 1.0.0.dev0
- Are you willing to contribute it (Yes/No): Not sure, will submit PR if I have the bandwidth
**Describe the new feature and the current behavior…
-
# 🚀 Feature & Motivation
PyTorch/XLA recently launched PyTorch/XLA SPMD ([RFC](https://github.com/pytorch/xla/issues/3871), [blog](https://pytorch.org/blog/pytorch-xla-spmd/), [docs/spmd.md](https:…
-
**Please describe the bug**
The installation check fails when I run the following commands:
ray start --head
python3 -m alpa.test_install
**Please describe the expected behavior**
The test faile…
-
**Describe the bug**
I am running the Pythia-7B fine-tuning script on 4 x A10 GPUs (24 GB each).
This looks like an issue with the sequence length:
_```
Token indices sequence length is longer than the specified maximum seque…
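A warning of this kind usually means that a tokenized example is longer than the maximum sequence length the model accepts. As a minimal sketch, assuming the inputs are plain lists of token IDs, overlong examples can be either truncated or split into fixed-size chunks before they reach the model. The function names and `max_len` parameter below are hypothetical and are not taken from the fine-tuning script.

```python
from typing import List


def truncate_token_ids(token_ids: List[int], max_len: int) -> List[int]:
    """Keep only the first `max_len` token IDs."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return token_ids[:max_len]


def chunk_token_ids(token_ids: List[int], max_len: int) -> List[List[int]]:
    """Split token IDs into consecutive chunks of at most `max_len` items."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


# Example with made-up token IDs and a small limit.
ids = list(range(10))
short = truncate_token_ids(ids, 4)
chunks = chunk_token_ids(ids, 4)
```

Truncation discards the tail of each example, while chunking keeps all tokens at the cost of producing more training examples; which one is appropriate depends on the dataset.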
-
Hi!
**Describe the bug:**
I am trying to build `torchdistx` from source, following the instructions in the [readme](https://github.com/pytorch/torchdistx#from-source). Basically, I am running:
``…