-
The recent addition of optimizer CPU offload in torchao can be useful for single-GPU, low-memory configurations.
https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload…
-
### Search before asking
- [x] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
Code to reproduce:
```python
import random
import numpy as np
import torch
from torch import optim
from torch.utils.data import DataLoader
from torchvision import transforms, datasets
fr…
-
Hi,
I'm training fairseq with the following script and get the error `ValueError: offset must be non-negative and no greater than buffer length`.
fairseq-train data-bin --arch transformer \
…
-
Hello,
I get the following error when trying to run the Adam optimizer with a float16 graph. Please note that changing the learner to another one (SGD, for example) makes the code work correctly, so this …
-
Hello, after integrating Adam-mini into the trainer, training with DeepSpeed runs out of GPU memory.
The loading code is as follows:
```
class CustomSeq2SeqTrainer(Seq2SeqTrainer):
r"""
Inherits Seq2SeqTrainer to compute generative metrics such as BLEU and ROUGE.
…
-
If you have a question or would like help and support, please ask at our
[forums](https://discuss.pytorch.org/).
If you are submitting a feature request, please preface the title with [feature req…
-
From the paper and your implementation, your examples only use the SGD optimizer. I am wondering whether I can use CLR with Adam or other optimizers. Many thanks.
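
For reference, the triangular policy described in the CLR paper depends only on the iteration count, so nothing in the schedule itself ties it to SGD. The sketch below is a minimal illustration under that assumption; `triangular_clr` and its parameter names are hypothetical and not taken from the repository. Whether cycling the learning rate helps with Adam is an empirical question this sketch does not answer.

```python
import math


def triangular_clr(iteration: int, base_lr: float, max_lr: float, step_size: int) -> float:
    """Triangular cyclical learning rate for a 0-based iteration.

    step_size is the number of iterations in half a cycle (hypothetical name).
    """
    # Index of the current cycle, starting at 1.
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle, scaled to [0, 1].
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# One full cycle: rises from base_lr to max_lr, then falls back to base_lr.
schedule = [triangular_clr(i, base_lr=1e-4, max_lr=1e-3, step_size=4) for i in range(9)]
```

With a PyTorch optimizer, a value computed this way could be written to each entry of `optimizer.param_groups` under the `"lr"` key before every step, regardless of the optimizer class.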
-
From lines 84-85 and 97-98 of optimizer.py, we can see that b1 and b2 here correspond to '1-b1' and '1-b2', respectively, in the original Adam paper, i.e., 'Adam: A Method for Stochastic O…
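
To make the two conventions concrete, here is a minimal sketch of the first-moment update in both forms. The exact update in optimizer.py is not shown in this excerpt, so the "flipped" form below is an assumption about what such a parameterization looks like; `ema_paper` and `ema_flipped` are illustrative names only.

```python
import math


def ema_paper(m: float, g: float, beta: float) -> float:
    # Convention of the Adam paper: m <- beta * m + (1 - beta) * g
    return beta * m + (1.0 - beta) * g


def ema_flipped(m: float, g: float, b: float) -> float:
    # Assumed flipped convention: m <- (1 - b) * m + b * g, i.e. b = 1 - beta
    return (1.0 - b) * m + b * g


grads = [0.5, -1.0, 2.0]
m_paper = 0.0
m_flipped = 0.0
for g in grads:
    m_paper = ema_paper(m_paper, g, beta=0.9)
    m_flipped = ema_flipped(m_flipped, g, b=0.1)

# Both parameterizations track the same moving average up to rounding.
same = math.isclose(m_paper, m_flipped, rel_tol=1e-9)
```

Under this assumption, a paper value of beta1 = 0.9 corresponds to b1 = 0.1 in the flipped form, and beta2 = 0.999 to b2 = 0.001.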
-
Hello, I'm trying to apply the OGM-GE strategy to a multimodal fusion network with text, video, and audio modalities (e.g. MISA, MAG). However, when I use the SGD optimizer, the model training process moves on wi…