-
### 🐛 Describe the bug
PyTorch does not allow getting a `FULL_STATE_DICT` from a model sharded with 2D FSDP + TP. However, if I remove the checks here:
https://github.com/pytorch/pytorch/blob/3f62b05d31d4b29d60874b05adc0e5aedbad3722/to…
-
## 🔨Work Item
**IMPORTANT:**
* This template is only for the dev team to track project progress. For feature requests or bug reports, please use the corresponding issue templates.
* DO NOT create a new…
-
## I'm submitting a...
[ ] Regression (a behavior that used to work and stopped working in a new release)
[x] Bug report
[ ] Performance issue
[ ] Feature request
[ ] Documentation is…
-
**Is your feature request related to a problem? Please describe.**
I would like to be able to use multiple GPUs to generate multiple images at a time when using the `diffusers` backend.
**Desc…
-
**Describe the bug**
**BUG:** Unevenly distributing the dataset across GPUs can cause the error `[../third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:81] Timed out waiting xxxx ms for recv oper…`
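
One way an uneven split can lead to this kind of timeout, sketched under assumptions: if samples are dealt round-robin to ranks without padding, some ranks get one more batch than others. Those ranks then block in a collective (for example, a gradient all-reduce) that the other ranks never enter, until the backend times out. The two helpers below are hypothetical and only count batches per rank; they do not call `torch.distributed`. By default (`drop_last=False`), PyTorch's `DistributedSampler` pads the index list with repeated samples so that every rank receives `ceil(N / world_size)` samples, and `padded_batches_per_rank` models that behavior.

```python
import math


def batches_per_rank(num_samples: int, world_size: int, batch_size: int) -> list[int]:
    """Hypothetical helper: batches per rank when samples are dealt
    round-robin (rank r gets r, r + world_size, ...) with no padding."""
    counts = []
    for rank in range(world_size):
        n = len(range(rank, num_samples, world_size))
        counts.append(math.ceil(n / batch_size))
    return counts


def padded_batches_per_rank(num_samples: int, world_size: int, batch_size: int) -> list[int]:
    """Hypothetical helper: batches per rank when the index list is padded
    (as DistributedSampler does with drop_last=False) so that every rank
    receives ceil(num_samples / world_size) samples."""
    per_rank = math.ceil(num_samples / world_size)
    return [math.ceil(per_rank / batch_size)] * world_size


# 10 samples on 4 GPUs, batch size 1: ranks 0 and 1 run a third step
# (and a third collective) that ranks 2 and 3 never join.
print(batches_per_rank(10, 4, 1))         # [3, 3, 2, 2]
print(padded_batches_per_rank(10, 4, 1))  # [3, 3, 3, 3]
```

Padding trades a few duplicated samples for identical step counts on every rank. The alternative, `drop_last=True`, discards the tail instead.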
-
[rank1]:[W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank0]:[W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL wor…
-
I tried an sccache distributed build today, and the build failed with errors like this:
```
1:38.92 /home/botond/dev/mozilla/central/js/src/jit/MIR.h:8337:218: error: result of comparison of unsi…
```
-
How is ZeRO-2 implemented in veScale? Is Megatron's distributed optimizer ZeRO-2?
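For reference, the ZeRO paper (Rajbhandari et al.) defines stage 2 as partitioning both optimizer states and gradients across the $N_d$ data-parallel ranks while keeping the fp16 parameters replicated. Its per-device memory accounting for mixed-precision Adam uses $\Psi$ parameters and $K = 12$ bytes of optimizer state per parameter. The expressions below restate that accounting and say nothing about how veScale or Megatron implement it:

```math
\begin{aligned}
\text{baseline DP:} &\quad (2 + 2 + K)\,\Psi \\
\text{ZeRO-1 } (P_{os}):&\quad 2\Psi + 2\Psi + \frac{K\Psi}{N_d} \\
\text{ZeRO-2 } (P_{os+g}):&\quad 2\Psi + \frac{(2 + K)\,\Psi}{N_d} \\
\text{ZeRO-3 } (P_{os+g+p}):&\quad \frac{(2 + 2 + K)\,\Psi}{N_d}
\end{aligned}
```

So a distributed optimizer qualifies as ZeRO-2 in this sense only if each rank keeps just its own $1/N_d$ shard of the gradients in addition to its optimizer-state shard.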
-
I was able to use git-bug with Keybase repositories. The git URL is usually something like `keybase://team/teamname/repository`, and older versions of git-bug were able to recognize it. Similarly if…
-
While trying to render this notebook, I had the following issues:
1. The packages are not installed by the notebook itself (I had to install them in the `make.jl` file). Probably this is a featur…