-
### 🚀 The feature, motivation and pitch
The nonlinear conjugate gradient (CG) method is a good alternative to the L-BFGS optimizer. Features of nonlinear CG:
1. It theoretically converges faster…
-
Hi, I tried the baseline model but ran into some issues and would like to ask for help.
Firstly, there are some mismatched dataset names from the [download script](https://github.com/mrqa/MRQA-Shared-Task-2…
-
### Describe the bug
Hi,
I am training the KGE model [QuatE](https://pykeen.readthedocs.io/en/stable/api/pykeen.models.QuatE.html) on a CUDA device and I am running into a `Cuda Out of Memory Error…
-
A common feature to use when building training frameworks is to save the optimizer state along with network state.
There's already a way to convert a `D::Vec` into a Rust `Vec`, which should make i…
-
Hi,
I am using ZeRO3 with the latest release (0.9.2) to train on 8 GPUs in one node, and from the stats I see memory grow a lot after each checkpoint save.
Checking the code, I found one suspected memory le…
-
### 🐛 Describe the bug
When trying to fine-tune the Teyvat example on 2 GPUs, training gets stuck right after the first epoch starts.
The error looks like:
`Epoch 0: 0%| …
-
@stas00, @tjruwase - Tagging you here since I have seen you working on ZeRO3 extensively. Apologies if I shouldn't do this.
**Describe the bug**
I am fine-tuning a LoRA model on top of BioBART-V2-B…
-
Dear professor,
I'd like to train a StyleMelGAN vocoder at 32 kHz. Here is my config for training a multi-speaker model; currently the speaker similarity on the VC task is worse than with Fre-GAN/HiFi-GAN.
Can you give m…
-
### The problem
Since the last database migration, which focused on PostgreSQL, the recorder stops recording data almost every day. Restarting HA brings the recording back, but the data in…
-
## 🐛 Bug
Calling `optimizer.state_dict()` raises a `KeyError`:
```
/root/module/class.py in save_model_parameters(self, model_parameters_path, epoch, optimizer)
254 …