-
## ❓ Questions and Help
It is my understanding that Adam should use more memory than SGD because it keeps track of more per-parameter state. However, when I look at my profiles between Adam and SGD optim…
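
For context, here is a rough sketch of where the expected difference comes from. It assumes every state buffer has the same shape and dtype as the parameter it belongs to. The buffer names follow the usual PyTorch naming, but the helper `optimizer_state_bytes` and the model size are hypothetical.

```python
# Extra per-parameter buffers each optimizer typically keeps.
# The helper below is a hypothetical illustration, not a library API.
STATE_BUFFERS = {
    "sgd": [],                           # plain SGD keeps no per-parameter state
    "sgd_momentum": ["momentum_buffer"],  # one buffer per parameter
    "adam": ["exp_avg", "exp_avg_sq"],    # first and second moment estimates
}


def optimizer_state_bytes(num_params, optimizer, bytes_per_element=4):
    """Estimate optimizer state size, assuming each buffer matches the parameter shape."""
    return len(STATE_BUFFERS[optimizer]) * num_params * bytes_per_element


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of float32 parameters
    for name in STATE_BUFFERS:
        print(name, optimizer_state_bytes(n, name) / 1e6, "MB")
```

This only counts optimizer state. If activations or gradients dominate the profile, the gap between the two optimizers may look much smaller than this estimate suggests.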
-
In `train_vqe` in `main.py`, the optimizer options are given by argument `optimizer_options`. However, the description in the `help` documentation is unclear (without example code, a general user woul…
-
### 🐛 Describe the bug
[The doc](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) of `optim.SGD()` doesn't say that the type of `dampening`, `maximize`, `foreach`, `differentiable` a…
-
Hi,
I wanted to know whether there is any specific reason you are using SGD with momentum instead of more recent variants like Adam and AdaGrad.
How will the model perform if I use Adam? …
-
### Search before asking
- [X] I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and [discussions](https://github.com/ultralytics/yolov5/discussions) and found no simi…
-
Hi!
I want to search for the best optimizer for the given "mnist_example", choosing between SGD and Adam.
However, for SGD, I also want to know which momentum value is the best (which Adam doesn't need), but for …
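
To make the question concrete, here is a minimal sketch of the kind of conditional search space I mean, written as a plain random search. It is not tied to any particular tuning library. The names `sample_config`, `random_search`, and `toy_evaluate` are hypothetical, and the toy objective is a stand-in for actually training "mnist_example".

```python
import math
import random


def sample_config(rng):
    """Draw one configuration; momentum is sampled only when the optimizer is SGD."""
    optimizer = rng.choice(["sgd", "adam"])
    config = {"optimizer": optimizer, "lr": 10 ** rng.uniform(-4, -1)}
    if optimizer == "sgd":
        config["momentum"] = rng.uniform(0.0, 0.99)
    return config


def toy_evaluate(config):
    """Placeholder objective (lower is better); replace with a real validation loss."""
    score = abs(math.log10(config["lr"]) + 2)
    if "momentum" in config:
        score += abs(config["momentum"] - 0.9)
    else:
        score += 0.1
    return score


def random_search(evaluate, n_trials=20, seed=0):
    """Return the best (config, score) pair found over n_trials random draws."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    print(random_search(toy_evaluate, n_trials=50, seed=1))
```

The point is that `momentum` only exists in configurations where `optimizer` is `"sgd"`, so Adam trials never carry an unused momentum value.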
-
Hi,
Is there a way to penalize the magnitude of the constants (e.g., via L2 regularization)? I am trying to fit a `SymbolicRegressor` to some noisy data and sometimes I get very large values for…
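
To illustrate the kind of penalty I have in mind, here is a minimal sketch of an L2-regularized loss in plain Python. It only shows the arithmetic: `penalized_loss` and `l2_weight` are hypothetical names, and I am assuming the fitted constants are available as a list, which may not be how `SymbolicRegressor` exposes them.

```python
def penalized_loss(y_true, y_pred, constants, l2_weight=1e-3):
    """Mean squared error plus an L2 penalty on the expression's constants."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    penalty = l2_weight * sum(c ** 2 for c in constants)
    return mse + penalty


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0]
    y_pred = [1.1, 1.9, 3.2]
    # Same fit quality, but the second candidate uses a much larger constant.
    print(penalized_loss(y_true, y_pred, constants=[0.5, 2.0]))
    print(penalized_loss(y_true, y_pred, constants=[0.5, 2000.0]))
```

With a loss like this, two expressions that fit equally well would be ranked by the size of their constants, so the one with very large values would score worse.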
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
0.9.0
### Reproduction
With Opacus, you only need to wrap the training setup with the `privacy_engine.make_private` function. Where should I make this change for SFT?
model = Ne…
-
The code in the 11th cell from the top (counting only code cells, starting from 1) does not run; it raises a runtime error:
Training with SGD optimizer
----------------------------------------…
-
**Describe the bug**
Broadcasting over the batch dimension makes ops run much slower.
We use this in the optimizer step for each layer: https://github.com/tenstorrent/TT-Tron/blob/main/sources/ttml/op…