-
## 🐛 Bug
The [torch/benchmarks/dynamo](https://github.com/pytorch/pytorch/blob/97ff6cfd9c86c5c09d7ce775ab64ec5c99230f5d/benchmarks/dynamo/common.py#L2028) testing suite sets SGD as the optimizer and set …
-
- GD needs the full sample set to update the parameters, which is computationally expensive (see the sketch after this list)
- SGD tends to oscillate around the minimum, so the learning-rate schedule needs careful tuning
- How large the batch size should be is an open question
- On modern GPU architectures, a large batch size favors data parallelism, and a large batch size generally calls for a large learning rate
- A large batch size generally trains to lower accuracy than a small batch size…
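A minimal NumPy sketch of the GD-vs-SGD distinction above (the quadratic loss, toy data, and hyperparameters are illustrative assumptions, not from the original post):

```
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))            # toy design matrix (assumption)
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1024)

def grad(w, Xb, yb):
    # gradient of the mean-squared error over the given batch
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(10)
lr, batch_size = 0.1, 32                   # larger batches usually pair with larger lr
for epoch in range(20):
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch_size):
        idx = perm[i:i + batch_size]
        w -= lr * grad(w, X[idx], y[idx])  # SGD: one minibatch per step
# full-batch GD would instead use grad(w, X, y) once per update
```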
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_SGD_cud…
-
Asynchronous SGD also has many options; one option that looks promising right now is a ring approach.
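One way to make the ring idea concrete (purely a toy sketch under my own assumptions about the topology; the original comment does not specify one): each worker takes independent SGD steps and periodically averages parameters with its successor on the ring, so no global synchronization barrier is needed.

```
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim, lr = 4, 8, 0.05
params = [rng.normal(size=dim) for _ in range(n_workers)]

def local_grad(w):
    # stand-in for a stochastic gradient on each worker's data shard (assumption):
    # gradient of 0.5 * ||w||^2 plus noise
    return w + 0.1 * rng.normal(size=dim)

for step in range(100):
    # each worker takes an independent SGD step (no global barrier)
    for k in range(n_workers):
        params[k] -= lr * local_grad(params[k])
    # ring mixing: worker k averages parameters with its successor (k+1) mod n
    params = [(params[k] + params[(k + 1) % n_workers]) / 2.0
              for k in range(n_workers)]
```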
-
I've been working on MRI segmentation tasks using nnU-Net, and I've noticed that the standard configurations often utilize SGD as the optimizer. While I understand that the choice of optimizer and n…
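For reference, a minimal PyTorch sketch of an nnU-Net-style SGD setup (the hyperparameters follow nnU-Net's commonly published defaults, but treat them as assumptions; `model` is a placeholder):

```
import torch

model = torch.nn.Conv3d(1, 32, kernel_size=3)   # placeholder for a U-Net

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-2,
    momentum=0.99,
    nesterov=True,
    weight_decay=3e-5,
)
# polynomial ("poly") learning-rate decay over the training run,
# stepped once per epoch
max_epochs = 1000
scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer, total_iters=max_epochs, power=0.9
)
```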
-
Hi --
I'm wondering about the implementation of SGD w/ momentum in `autograd.optimizers.sgd`:
```
velocity = momentum * velocity - (1.0 - momentum) * g
x = x + learning_rate * velocity
```
O…
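For comparison, here is the quoted update side by side with the classical heavy-ball form (a sketch using the snippet's variable names; the `(1.0 - momentum)` factor means the quoted variant keeps the velocity an exponential moving average of gradients, so the two differ only by a rescaling of the effective step size):

```
def sgd_ema_momentum(x, g, velocity, learning_rate, momentum):
    # quoted variant: velocity is an exponential moving average of gradients,
    # so the effective step size stays ~learning_rate for any momentum value
    velocity = momentum * velocity - (1.0 - momentum) * g
    return x + learning_rate * velocity, velocity

def sgd_classical_momentum(x, g, velocity, learning_rate, momentum):
    # heavy-ball form: the effective step size grows toward
    # learning_rate / (1 - momentum) as velocity accumulates
    velocity = momentum * velocity - learning_rate * g
    return x + velocity, velocity
```

Up to rescaling `learning_rate` by `1 - momentum`, the two rules trace the same trajectory.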
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_SGD_cu…
-
Hi! I am trying to use accelerate to boost the performance of a model using the riemannian_sgd optimiser. However, I am not able to figure out how to set it up properly. I tried to follow the tutorials…
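Assuming the optimizer in question is geoopt's `RiemannianSGD` (a guess; the post does not say where `riemannian_sgd` comes from), a minimal sketch of wiring it through accelerate would look like this, since accelerate accepts any `torch.optim.Optimizer` subclass:

```
import torch
from accelerate import Accelerator
from geoopt.optim import RiemannianSGD   # assumed source of riemannian_sgd

accelerator = Accelerator()
model = torch.nn.Linear(16, 4)            # placeholder model
optimizer = RiemannianSGD(model.parameters(), lr=1e-2, momentum=0.9)

# hand both to accelerate before the training loop
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
accelerator.backward(loss)                # replaces loss.backward()
optimizer.step()
optimizer.zero_grad()
```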
-
### Describe the issue
Unexpectedly, training with the SGD optimizer is slower than training with the AdamW optimizer. By profiling with Nsight Systems, I found that the SGD optimizer copies appr…
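One thing worth checking for an issue like this is which kernel path `torch.optim.SGD` takes; sufficiently recent PyTorch versions expose `foreach` and `fused` implementations that batch the per-parameter work (whether either avoids the copies seen in the profile is not something I can confirm):

```
import torch

model = torch.nn.Linear(1024, 1024, device="cuda")  # placeholder model

# multi-tensor path: one kernel over many parameters instead of a Python loop
opt_foreach = torch.optim.SGD(model.parameters(), lr=1e-2, foreach=True)

# fused path (recent PyTorch; requires CUDA floating-point parameters):
# the whole update is fused into a single kernel
opt_fused = torch.optim.SGD(model.parameters(), lr=1e-2, fused=True)
```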
-
Thanks for providing such great implementations of various gradient descent algorithms. For the example notebook, could you also provide the sgd_data.txt file?