-
@carmocca This is the stack trace I get when I import PyTorch Lightning with the following `environment.yml`:
```yaml
name: pl_error
channels:
- defaults
- pytorch
- conda-forge
depend…
-
The team is developing a special event as a prelude to the Goblin War.
We are seeking 2 pieces of art to add to the game:
1. Ancient Goblin Weapon
2. Ancient Human Weapon
We cannot reveal mu…
-
NCCL_AVG is not supported in NCCL versions below 2.10.
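
A minimal sketch of how a caller might guard against this, assuming the NCCL version is available as a `(major, minor, patch)` tuple. The helper name `supports_nccl_avg` and the constant `NCCL_AVG_MIN_VERSION` are hypothetical and not part of Bagua or PyTorch; only the 2.10 threshold comes from the note above.

```python
from typing import Tuple

# Threshold taken from the issue text: NCCL_AVG needs NCCL 2.10 or newer.
NCCL_AVG_MIN_VERSION = (2, 10)


def supports_nccl_avg(version: Tuple[int, ...]) -> bool:
    """Return True if the given NCCL version tuple is at least 2.10.

    Hypothetical helper for illustration only. Only the major and minor
    components are compared; the patch level is ignored.
    """
    return tuple(version[:2]) >= NCCL_AVG_MIN_VERSION


if __name__ == "__main__":
    # Example version tuples, hard-coded for illustration.
    for v in [(2, 9, 9), (2, 10, 3), (2, 12, 7)]:
        print(v, "->", "AVG supported" if supports_nccl_avg(v) else "fall back to SUM and divide")
```

In recent PyTorch releases the tuple can likely be obtained from `torch.cuda.nccl.version()`, though older releases return a single integer instead, so treat that call as an assumption to verify against the installed version.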
-
I ran a very simple example and got an error:
```
WARNING:root:Bagua cannot detect bundled NCCL library, Bagua will try to use system NCCL instead. If you encounter any error, please run `import bagua_…
-
**Describe the bug**
A clear and concise description of what the bug is.
**Environment**
- Your operating system and version: ubuntu focal
- Your python version: 3.8
- Your PyTorch version…
-
We can do this in a similar way as we add nccl to bagua-core: https://github.com/BaguaSys/bagua-core/blob/master/setup.py and https://github.com/BaguaSys/bagua-core/blob/master/python/bagua_core/_envi…
-
**Describe the bug**
A clear and concise description of what the bug is.
![image](https://user-images.githubusercontent.com/78945582/138982691-b514295d-2eeb-4d65-b9f8-920bb1c0f4d5.png)
**Environm…
-
## 🚀 Feature
### Motivation
Blog: https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/
We currently use fairscale's implementation.
_______________________…
-
For example https://github.com/BaguaSys/bagua/runs/3670056809, https://github.com/BaguaSys/bagua/runs/3723931716, https://github.com/BaguaSys/bagua/runs/3745518831, https://github.com/BaguaSys/bagua/r…
-
I used `DecentralizedAlgorithm` with `peer_selection_mode` set to `shift_one` on 8 GPUs. The Bagua backend says I have an odd number of ranks (only one), but you can see from the NCCL log that this job does have 8 GPUs. Doe…