-
### Bug description
I am trying to run a training module with CUDA using PyTorch Lightning, but Lightning keeps trying to use NCCL. I have tried every solution I have found online, from specifying …
-
### Bug description
After disabling automatic optimization, the Trainer behaves inconsistently between `precision='32'` and `precision='16-mixed'`.
### What version are you seeing the problem on?
v…
-
Hi,
when running training of BART-base on a `trn1.2xlarge` using the command
```
torchrun --nproc_per_node=2 run_summarization.py
```
I receive the compilation error below. I am wondering…
-
## Reproduction
`python evaluate.py scenarios/intersections/4lane -f agents/maddpg/baseline-lane-control.yaml --checkpoint ./log/results/run/4lane-4/MADDPG2_EarlyDone_c3c16_00000_0_2020-11-13_17-32-2…
rsuwa updated 11 months ago
-
### Is your feature request related to a problem? Please describe.
https://github.com/ChuangLee/ChatGLM-6B-multiGPU
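It automatically distributes GPU memory evenly across cards. Previously a single card needed 13 GB, which is just beyond what many GPUs offer, so they could not run the model.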
### Solutions
https://github.com/ChuangLee/…
-
### Bug description
Access denied to save model checkpoint on AWS S3.
### What version are you seeing the problem on?
v2.0
### How to reproduce the bug
```python
Run a PL project with CLOUD-BASED…
```
-
### Describe the bug
Arrow errors occur when running the example notebook locally on a Mac laptop with an M1 chip. It works on Google Colab.
**Notebook**: https://github.com/NielsRogge/Transformers-Tutorials/blob/m…
-
**Describe the bug**
I'm integrating RMM as a replacement for the default PyTorch Allocator. Everything works fine in simpler scenarios. However, with my project involving mixed precision training,…
-
### Bug description
I recently adapted a network architecture to a `LightningModule` and found that, when resuming training from a checkpoint file, the state of the `OneCycleLR` schedule…
-
### Current Behavior
Z-Moves do not exist in Pokeclicker, although Z-Crystals are used as Alola badges to some extent.
### Improved Behavior
Z-moves should be unlocked in Alola as part of gam…