-
### ❓ The question
```
System/Peak GPU Memory (MB)=6,784
2024-08-06 09:59:26.181 intern-studio-160750:0 olmo.train:908 INFO [step=1/739328,epoch=0]
optim/total_grad_norm=231.7
train/C…
```
-
At the moment, in the C bindings, the points are represented as a struct with 3 fields. It creates 3 indirections when calling the C code. When performing many operations on the same point, the same m…
-
The recently added optimizer CPU offload in torchao could be useful for single-GPU, low-memory configurations.
https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim#optimizer-cpu-offload…
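To gauge how much GPU memory offloading could save, here is a back-of-envelope sketch. It assumes plain fp32 AdamW training (4 bytes/param for weights, 4 for gradients, 8 for the two AdamW state tensors) and ignores activations. It does not model torchao's actual buffers, and `fp32_adamw_gpu_gib` is a hypothetical helper.

```python
def fp32_adamw_gpu_gib(n_params: int,
                       offload_optim_state: bool = False,
                       offload_grads: bool = False) -> float:
    """Rough GPU memory (GiB) for fp32 AdamW training, excluding activations.

    Assumption: 4 bytes/param weights, 4 bytes/param gradients and
    8 bytes/param AdamW state (exp_avg + exp_avg_sq). Offloaded parts
    are counted as living in CPU RAM instead of on the GPU.
    """
    bytes_per_param = 4  # weights always stay on the GPU
    if not offload_grads:
        bytes_per_param += 4  # gradients
    if not offload_optim_state:
        bytes_per_param += 8  # two fp32 AdamW state tensors
    return n_params * bytes_per_param / 2**30
```

For a 1B-parameter model this gives about 14.9 GiB without offload, 7.45 GiB with the optimizer state offloaded, and 3.73 GiB if gradients are offloaded as well.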
-
I tried to run `train.py`, but I encountered this error: "no module named optims".
I thought `optims` was a library, so I tried to install it with pip, but that turned out not to be the case. So can …
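If `optims` is a folder inside the repository rather than a PyPI package (an assumption, since pip could not find it), the error usually means the repo root is not on `sys.path`. When you run `python some/dir/train.py`, Python puts the script's own directory (not the current working directory) at `sys.path[0]`. A minimal diagnostic sketch; `find_local_module` is a hypothetical helper:

```python
import importlib
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def find_local_module(name: str, repo_root: str) -> Optional[str]:
    """Return the file `name` would be imported from, or None if not found.

    If the module is not importable as-is, the repository root is prepended
    to sys.path and the lookup is retried.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        root = str(Path(repo_root).resolve())
        if root not in sys.path:
            sys.path.insert(0, root)
        importlib.invalidate_caches()  # make newly added paths visible to finders
        spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin
```

For example, `find_local_module("optims", "/path/to/repo")` (hypothetical path) returns the path of `optims/__init__.py` if the folder exists there, and `None` otherwise.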
-
For many applications, one needs to pass a function that evaluates the cost (or the log-posterior) from the control vector. For instance:
1. MCMC sampling: samples=MCMC(logpost_function, starting_poi…
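The pattern of passing a log-posterior callable can be sketched with a hand-written random-walk Metropolis sampler. `metropolis` and `logpost_gaussian` are hypothetical names for illustration, not the `MCMC` function mentioned above; the sampler only ever calls the function on a control vector and never needs its internals.

```python
import math
import random
from typing import Callable, List, Sequence


def metropolis(logpost: Callable[[Sequence[float]], float],
               start: Sequence[float],
               n_samples: int,
               step: float = 0.5,
               seed: int = 0) -> List[List[float]]:
    """Random-walk Metropolis: only needs logpost(x) evaluated on vectors x."""
    rng = random.Random(seed)
    x = list(start)
    lp = logpost(x)
    samples = []
    for _ in range(n_samples):
        # Gaussian proposal around the current point
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        lp_prop = logpost(proposal)
        # accept with probability min(1, exp(lp_prop - lp))
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = proposal, lp_prop
        samples.append(list(x))
    return samples


def logpost_gaussian(x: Sequence[float]) -> float:
    """Unnormalized log-density of a standard normal in len(x) dimensions."""
    return -0.5 * sum(xi * xi for xi in x)


samples = metropolis(logpost_gaussian, [3.0, -3.0], 20000, step=1.0, seed=1)
```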
-
Hi! I found the MeZO-adam code in the medium-size folder, but it uses Adam from `torch.optim`. That is not the case in `large_models`, where the author rewrote the `inner_loop`. Can you please explain this? Thank yo…
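The zeroth-order part of MeZO (a two-point SPSA gradient estimate from a shared random direction) is separate from the update rule that consumes it. A minimal pure-Python sketch, assuming nothing about either folder's actual code: `spsa_grad` and `zo_adam` are hypothetical names, and Adam is written out by hand only to show that it needs per-parameter `m`/`v` buffers, which a plain SGD update does not.

```python
import math
import random
from typing import Callable, List, Sequence

Loss = Callable[[Sequence[float]], float]


def spsa_grad(loss: Loss, theta: Sequence[float], eps: float,
              rng: random.Random) -> List[float]:
    """Two-point (SPSA) gradient estimate along one random Gaussian direction z."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = loss([t + eps * zi for t, zi in zip(theta, z)])
    minus = loss([t - eps * zi for t, zi in zip(theta, z)])
    projected = (plus - minus) / (2.0 * eps)  # approx. directional derivative along z
    return [projected * zi for zi in z]


def zo_adam(loss: Loss, theta: Sequence[float], steps: int = 3000,
            lr: float = 0.01, eps: float = 1e-3, beta1: float = 0.9,
            beta2: float = 0.999, adam_eps: float = 1e-8,
            seed: int = 0) -> List[float]:
    """Feed SPSA gradient estimates into a hand-written Adam update."""
    rng = random.Random(seed)
    theta = list(theta)
    m = [0.0] * len(theta)  # first-moment buffer (extra state vs. plain SGD)
    v = [0.0] * len(theta)  # second-moment buffer (extra state vs. plain SGD)
    for t in range(1, steps + 1):
        g = spsa_grad(loss, theta, eps, rng)
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + adam_eps)
    return theta


result = zo_adam(lambda x: sum((xi - 1.0) ** 2 for xi in x), [0.0, 0.0, 0.0])
```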
-
optim 2
-
optim 3
-
I'd like to try out this optimizer if it's not too hard to implement; it should use less memory than AdamW8bit.
https://github.com/yangluo7/CAME
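Whether CAME uses less memory than AdamW8bit can be estimated per weight matrix. This back-of-envelope sketch assumes that CAME keeps, as in its paper's algorithm, a full fp32 first-moment buffer plus row/column factored second-moment and confidence statistics, and that AdamW8bit keeps two 1-byte states per parameter (block-wise quantization constants ignored). `optimizer_state_bytes` is a hypothetical helper.

```python
from typing import Dict


def optimizer_state_bytes(rows: int, cols: int) -> Dict[str, int]:
    """Approximate optimizer-state bytes for one rows x cols weight matrix."""
    n = rows * cols
    return {
        # exp_avg + exp_avg_sq, 4 bytes each
        "AdamW (fp32)": 2 * n * 4,
        # two 8-bit states; quantization block statistics ignored
        "AdamW8bit": 2 * n * 1,
        # full fp32 momentum + factored 2nd moment (rows + cols)
        # + factored confidence (rows + cols), all fp32 (assumption)
        "CAME (fp32)": n * 4 + 2 * (rows + cols) * 4,
    }


sizes = optimizer_state_bytes(4096, 4096)
```

For a 4096×4096 matrix this gives 128 MiB for fp32 AdamW, 32 MiB for AdamW8bit and about 64 MiB for CAME, so under these assumptions CAME comes out ahead of AdamW8bit only if its first-moment buffer is stored in lower precision.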
-
Ran this from the demo code:
```python
import os
# Check if we're in Colab
try:
    import google.colab  # noqa: F401 # type: ignore
    in_colab = True
except ImportError:
    in_colab = False
…