-
Hi, first of all, thank you very much for sharing the code for AdaBelief; it looks like a very promising optimizer! :) Have you considered comparing it to [AdaHessian](https://arxiv.org/abs/2006.00719…
-
Hi,
I recently came across this paper on an improved-accuracy Hutchinson method, but I am not well-versed enough in the discipline to know whether it can be used with AdaHessian. Do you think it can be us…
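For context, my understanding is that AdaHessian builds on the classic Hutchinson estimator, diag(H) ≈ E[z ⊙ (Hz)] with Rademacher z; here is a minimal PyTorch sketch of that baseline (my own illustration, not this repo's code):

```python
import torch

def hutchinson_diag(loss, params, n_samples=1):
    # Classic Hutchinson estimate of the Hessian diagonal:
    # diag(H) ~= E[z * (H z)] with Rademacher z in {-1, +1}.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    est = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        zs = [2.0 * torch.randint_like(p, high=2) - 1.0 for p in params]
        # Hessian-vector product H z via double backward:
        hvs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for e, z, hv in zip(est, zs, hvs):
            e.add_(z * hv / n_samples)
    return est

# Tiny check on a quadratic, where the true Hessian is 2*I:
w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
print(hutchinson_diag(loss, [w]))  # exactly [2., 2.] here, since H is diagonal and z*z = 1
```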
-
## 🚀 Feature
Implement the AdaHessian optimizer in torch.optim. The optimizer was proposed in the paper [ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning](https://arxiv.org/abs/200…
-
Hi,
I recently started using the version of AdaHessian from https://github.com/jettify/pytorch-optimizer in the facebookresearch ParlAI system to see how it works for training chatbots. I am not ver…
-
It seems that quasi-Newton methods have second-order convergence, but in the loss figures you show in the README.md, AdaHessian behaves like a first-order optimizer (it performs better than SGD, but…
-
For example, how would I use (if possible) the [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer) library?
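If it helps, here is a minimal sketch of how that could look, assuming the `Adahessian` class that pytorch-optimizer ships (`torch_optimizer.Adahessian`); argument names and defaults may differ between versions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_optimizer  # pip install torch_optimizer

model = nn.Linear(10, 2)
optimizer = torch_optimizer.Adahessian(model.parameters(), lr=0.15)

x, y = torch.randn(32, 10), torch.randn(32, 2)
optimizer.zero_grad()
loss = F.mse_loss(model(x), y)
loss.backward(create_graph=True)  # AdaHessian needs second derivatives
optimizer.step()
```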
-
The AdaHessian optimizer example code is missing a comma after the betas tuple.
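For anyone copy-pasting, a hedged reconstruction of what the fixed line would look like (the argument values here are illustrative, not quoted from the example):

```python
import torch.nn as nn
from torch_optimizer import Adahessian  # or the repo's own Adahessian class

model = nn.Linear(10, 2)
optimizer = Adahessian(
    model.parameters(),
    lr=0.15,
    betas=(0.9, 0.999),  # <- the comma after this tuple was missing
    eps=1e-4,
)
```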
-
https://github.com/amirgholami/adahessian/blob/bacccecc7a078c3e9e72aa55b17d8e46d21dc9c9/adahessian_tf/adahessian.py#L384
Do not use NumPy functions within a tf.function decorator. Use TensorFlow im…
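A minimal illustration of the general point (hypothetical functions, not the repo's code):

```python
import tensorflow as tf
import numpy as np

x = tf.constant([3.0, 4.0])

@tf.function
def bad_norm(t):
    # Calling bad_norm(x) fails at trace time: NumPy cannot consume the
    # symbolic tensors tf.function traces with, and np.* calls never
    # become part of the TF graph.
    return np.sqrt(np.sum(t ** 2))

@tf.function
def good_norm(t):
    # TensorFlow ops trace into graph nodes and run on every call.
    return tf.sqrt(tf.reduce_sum(tf.square(t)))

print(good_norm(x))  # tf.Tensor(5.0, shape=(), dtype=float32)
```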
-
Hi, congrats on the nice work, but I have a problem reproducing the claimed result of 35.85 in the paper. My run of AdaBelief on my machine with CUDA 9.0 and PyTorch 1.1 gives 35.69.
![image](https://use…
-
Hi,
I've tried using AdaHessian as a drop-in replacement for Adadelta in the PyTorch MNIST example (with loss.backward(create_graph=True)), but this produces the error:
NameError: name 'gradsH' …
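For reference, here is a self-contained reduction of the calling pattern I'm describing, with synthetic stand-ins for the MNIST example's model and data, and the pytorch-optimizer port (`torch_optimizer.Adahessian`) standing in for this repo's class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_optimizer import Adahessian  # stand-in AdaHessian port

# Stand-ins for the MNIST example's Net() and train_loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
optimizer = Adahessian(model.parameters(), lr=0.15)
loader = [(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))) for _ in range(3)]

for data, target in loader:
    optimizer.zero_grad()
    loss = F.nll_loss(model(data), target)
    loss.backward(create_graph=True)  # keep the graph for the Hessian estimate
    optimizer.step()
```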