-
Hi,
I am using ZeRO-3 with the latest version, 0.9.2, to train on 8 GPUs in one node, and from the stats I see memory grow a lot after each checkpoint save.
After checking the code, I found one suspected memory le…
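One way to confirm that memory keeps growing with each save, rather than just settling at a higher plateau, is to record memory after several consecutive saves and compare the readings. Below is a minimal sketch using Python's standard `tracemalloc`.

The names `measure_growth`, `leaky_save` and `clean_save` are hypothetical stand-ins, not DeepSpeed code. `tracemalloc` only sees allocations made through Python's allocators on the host. It will not capture GPU memory or buffers that native extensions allocate directly.

```python
import io
import pickle
import tracemalloc
from typing import Callable, List


def measure_growth(save_fn: Callable[[], None], repeats: int = 5) -> List[int]:
    """Call save_fn `repeats` times; return traced Python-heap bytes after each call."""
    tracemalloc.start()
    try:
        sizes = []
        for _ in range(repeats):
            save_fn()
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
        return sizes
    finally:
        tracemalloc.stop()


# Hypothetical stand-ins for a checkpoint save.
_leaked: list = []


def leaky_save() -> None:
    # Keeps a reference to every serialized buffer, so it is never freed.
    _leaked.append(pickle.dumps(bytearray(1_000_000)))


def clean_save() -> None:
    # Serializes into a temporary buffer that is dropped when the function returns.
    buf = io.BytesIO()
    pickle.dump(bytearray(1_000_000), buf)


if __name__ == "__main__":
    for fn in (leaky_save, clean_save):
        sizes = measure_growth(fn)
        per_save = (sizes[-1] - sizes[0]) / (len(sizes) - 1)
        print(f"{fn.__name__}: ~{per_save:,.0f} bytes retained per save")
```

Steady growth per save, as with `leaky_save`, suggests something is holding references to old checkpoint buffers. A flat series, as with `clean_save`, means the memory is released between saves.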
-
@stas00, @tjruwase - Tagging you here since I have seen you working extensively on ZeRO-3. Apologies if I shouldn't have.
**Describe the bug**
I am fine-tuning a LoRA model on top of BioBART-V2-B…
-
A common feature of training frameworks is saving the optimizer state along with the network state.
There's already a way to convert a `D::Vec` into a Rust `Vec`, which should make i…
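The request itself targets a Rust library, but the reason optimizer state belongs in the checkpoint does not depend on the language. As an illustration only, here is a Python sketch built from a hypothetical `SGDMomentum` optimizer and hypothetical `save_checkpoint`/`load_checkpoint` helpers. None of these are the API of the library in question.

The network parameters and the momentum buffer are serialized together in one blob. A run resumed from that checkpoint therefore continues exactly where it stopped.

```python
import pickle
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SGDMomentum:
    """Hypothetical minimal optimizer whose state is the momentum buffer `velocity`."""

    lr: float = 0.1
    momentum: float = 0.9
    velocity: List[float] = field(default_factory=list)

    def step(self, params: List[float], grads: List[float]) -> None:
        if not self.velocity:
            self.velocity = [0.0] * len(params)
        for i, g in enumerate(grads):
            self.velocity[i] = self.momentum * self.velocity[i] + g
            params[i] -= self.lr * self.velocity[i]


def grad(params: List[float]) -> List[float]:
    # Gradient of the toy loss sum(p**2).
    return [2.0 * p for p in params]


def train(params: List[float], opt: SGDMomentum, steps: int) -> List[float]:
    for _ in range(steps):
        opt.step(params, grad(params))
    return params


def save_checkpoint(params: List[float], opt: SGDMomentum) -> bytes:
    # Network parameters and optimizer state are stored in one blob.
    return pickle.dumps({
        "params": list(params),
        "optimizer": {"lr": opt.lr, "momentum": opt.momentum, "velocity": list(opt.velocity)},
    })


def load_checkpoint(blob: bytes) -> Tuple[List[float], SGDMomentum]:
    state = pickle.loads(blob)
    o = state["optimizer"]
    return state["params"], SGDMomentum(o["lr"], o["momentum"], o["velocity"])


if __name__ == "__main__":
    full = train([1.0, -2.0], SGDMomentum(), 10)

    opt = SGDMomentum()
    params = train([1.0, -2.0], opt, 5)
    blob = save_checkpoint(params, opt)

    p, restored_opt = load_checkpoint(blob)
    resumed = train(p, restored_opt, 5)  # optimizer state restored
    p, _ = load_checkpoint(blob)
    fresh = train(p, SGDMomentum(), 5)  # momentum buffer reset to zero

    print("resumed matches uninterrupted run:", resumed == full)
    print("fresh optimizer matches:", fresh == full)
```

Suppose you reload only the parameters and start with a fresh optimizer. The momentum buffer is then reset, so the resumed trajectory diverges from the uninterrupted one. That is why frameworks save both.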
-
# Trending repositories for C#
1. [**Unity-Technologies / megacity-metro**](https://github.com/Unity-Technologies/megacity-metro)
__Megacity-Metro: a thrilling shooter game, using…__
-
Dear professor,
I'd like to train a 32 kHz stylemelgan vocoder. Here is my config for training a multi-speaker model; currently, the speaker similarity on the VC task is worse than with fregan/hifigan.
Can you give m…
-
### Description
### 1. Checkpoint Loading
- Many examples in our docs load checkpoints from in-memory Python objects.
- Examples like [this](https://docs.ray.io/en/latest/ray-air/key-concepts.h…
-
### 🚀 The feature, motivation and pitch
The nonlinear conjugate gradient (CG) method is a good alternative to the L-BFGS optimizer. Features of nonlinear CG:
1. It theoretically converges faster…
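To make the proposal concrete, here is a minimal pure-Python sketch of nonlinear CG. It rests on three assumptions:
- the Polak-Ribière+ update for β;
- a simple Armijo backtracking line search;
- a restart to steepest descent every n iterations, or whenever the direction stops being a descent direction.

`nonlinear_cg` is a hypothetical helper, not an existing library API. A production version would normally use a line search that satisfies the strong Wolfe conditions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def axpy(alpha: float, d: Vector, x: Vector) -> Vector:
    """Return x + alpha * d."""
    return [xi + alpha * di for xi, di in zip(x, d)]


def nonlinear_cg(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    max_iter: int = 1000,
    tol: float = 1e-6,
) -> Tuple[Vector, int]:
    """Minimize f with Polak-Ribiere+ nonlinear CG and Armijo backtracking.

    Returns (x, iterations_used).
    """
    x = list(x0)
    n = len(x)
    g = grad(x)
    d = [-gi for gi in g]  # first direction: steepest descent
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            return x, k
        slope = dot(g, d)
        if slope >= 0.0:
            # Not a descent direction: restart from steepest descent.
            d = [-gi for gi in g]
            slope = -dot(g, g)
        # Armijo backtracking: halve alpha until sufficient decrease holds.
        fx = f(x)
        alpha = 1.0
        while alpha > 1e-16 and f(axpy(alpha, d, x)) > fx + 1e-4 * alpha * slope:
            alpha *= 0.5
        x_new = axpy(alpha, d, x)
        g_new = grad(x_new)
        if (k + 1) % n == 0:
            beta = 0.0  # periodic restart every n iterations
        else:
            y = [gn - go for gn, go in zip(g_new, g)]
            beta = max(0.0, dot(g_new, y) / dot(g, g))  # Polak-Ribiere+, clipped at 0
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x, max_iter


# Example problem: f(x) = 0.5 x^T A x - b^T x with SPD A; the minimizer solves A x = b.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]


def f_quad(x: Vector) -> float:
    Ax = [dot(row, x) for row in A]
    return 0.5 * dot(x, Ax) - dot(b, x)


def grad_quad(x: Vector) -> Vector:
    return [dot(row, x) - bi for row, bi in zip(A, b)]


if __name__ == "__main__":
    x, iters = nonlinear_cg(f_quad, grad_quad, [0.0, 0.0])
    print(f"x = {x}, iterations = {iters}")  # exact minimizer is [0.2, 0.4]
```

Between iterations, nonlinear CG carries only the previous gradient and search direction. L-BFGS, by contrast, keeps a history of m correction pairs. This is part of nonlinear CG's appeal when memory is tight.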
-
**Task Description:**
Training a simple classifier using Keras with Horovod on Spark and getting the error below:
**Error:**
```
[3]:Error in sys.excepthook:
[3]:
[3]:Original exception was:
[3]:#
…
```
-
### Recommended solution
[Bump optimizer version from 0.12.6 to 0.14.0 or latest.](https://github.com/search?q=repo%3Aterra-money%2Fterrain%200.12.6&type=code)
### Issue description
see more de…
-
Hello,
I'm encountering some issues when using an optimizer on my platform, a Raspberry Pi 4 running a 64-bit operating system.
When optimizing the pose, I'm getting inconsistent optimization…