-
**Describe the bug**
When I use FusedAdam from deepspeed as an optimizer, without any other DeepSpeed features, it seems to crash due to a missing attribute. This is clear from the error trace. This …
-
Hi [victoresque](https://github.com/victoresque),
Thanks for your hero repo! I used `hydra_DDP` branch to build my application, but got some problems in [get_logger](https://github.com/victoresque/py…
-
First of all, thank you for your great contributions to this community.
Unfortunately, I ran into a problem while training with DDP on multiple GPUs :(
Training on a single GPU works well,…
-
(lmflow_train) root@duxact:/data/projects/lmflow/LMFlow# ./scripts/run_finetune.sh \
--model_name_or_path /data/guihunmodel8.8B \
--dataset_path /data/projects/lmflow/case_report_data \
--out…
-
Hello! Really appreciate your outstanding work!
However, when I try to retrain `geo2mat`, I encounter this problem:
```python
Time stamp: #5 save blend and glbs
524 0.06638479232788086 0.0918…
```
-
## 🐛 Bug
Hi,
I am running an mBART model to summarize Turkish news on Google Colab. I have mostly followed the instructions at [the official example](https://github.com/pytorch/fairseq/tree/master…
-
Hi there,
I noticed that the GPU memory consumption during training is unbalanced. To be more specific, I used 8 GPUs for training. It seems that GPU 0 uses 13449 MB of GPU memory while the other 7 GPUs us…
-
### 🐛 Describe the bug
For some reason, I need to discard part of the data in the `collate_fn` of the dataloader, which changes my batch size. My program gets stuck in the loss function when the …
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
Running on Google Colab:
> %cd /content/ChatGLM-6B/ptuning
> !bash train.sh

I tried checking the JSON, and it seems to be fine.
…
-
## ❓Question
I am setting up a PyTorch Lightning experiment and using the `AimLogger` object to log training/validation losses, as well as test results.
However, while tracking during `trainer.fit` …