Hi all,
I found that Adam-mini 1.0.1 cannot run with 4 shards; it throws an exception related to tensor reshaping:
```
File "/opt/conda/lib/python3.10/site-packages/adam_mini/adam_m…
```
-
At the moment, the Adam optimizer keeps the exponential moving averages decoupled from the bias correction, as in the original paper. However, it is possible to combine these operations into a sing…
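A minimal sketch of the idea (not the actual implementation from this issue): the two explicit bias-correction divisions can be folded into a single scalar step size, `alpha_t = lr * sqrt(1 - beta2**t) / (1 - beta1**t)`, so no corrected `m_hat`/`v_hat` tensors need to be materialized. Note that `eps` then sits in a slightly different place, so the two forms agree only approximately.

```python
import math

def adam_step_decoupled(m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam: EMA updates, then explicit bias correction."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    return m, v, lr * m_hat / (math.sqrt(v_hat) + eps)

def adam_step_fused(m, v, g, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Both corrections folded into one scalar step size alpha_t.

    Equivalent up to where eps enters: here eps is effectively scaled
    by sqrt(1 - b2**t), so the update differs from the decoupled form
    by a term of order eps.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    alpha_t = lr * math.sqrt(1 - b2 ** t) / (1 - b1 ** t)
    return m, v, alpha_t * m / (math.sqrt(v) + eps)
```

For a scalar parameter the two variants produce updates that agree to within roughly `eps`, while the fused form does one fewer division per tensor.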
-
As the title says.
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_Adam_cu…
-
Here is the error:
```
File "/home/workspace/x-flux-main/src/flux/modules/layers.py", line 499, in __call__
output = attn.linear2(torch.cat((attn_1, attn.mlp_act(mlp)), 2))
torch.OutOfMemoryError…
```
-
The embedded workflow is set on the MiqRequestTask in `MiqProvisionRequestTemplate#service_options`, called from `MiqProvisionRequestTemplate#create_tasks_for_service`:
```
[----] D, [2024-07-12T13:16…
```
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_Adam_c…
-
### 🐛 Describe the bug
When running with Adam in eager mode, roughly one third of our benchmark models fail the accuracy check, spread roughly uniformly across suites.
[list of failing models](https://github.com/pytorch/pytorch/…
-
According to the [docs](https://github.com/tensorflow/kfac/blob/master/kfac/python/keras/README.md), this optimizer is supposed to `converge much faster (>3.5x) and with fewer iterations (>14x) than S…