amirgholami / adahessian

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Help using adahessian in TensorFlow #16

Open Cyberface opened 3 years ago

Cyberface commented 3 years ago

Hi, I'm trying to use adahessian in TensorFlow for a simple regression experiment, but I'm having trouble.

I have a simple example in this google colab notebook: https://colab.research.google.com/drive/1EbKZ0YHhyu6g8chFlJD74dzWrbo82mbV?usp=sharing

I am getting the following error

ValueError: Variable <tf.Variable 'dense_12/kernel:0' shape=(1, 100) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

In the notebook I first write a little training loop that works with standard optimisers such as Adam. See "example training with Adam"

Then in the next section "example training with Adahessian" I basically copy the previous code and make a few modifications to try and get Adahessian to work.

Specifically, I only changed

from

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

to

optimizer = AdaHessian(learning_rate=0.01)

and from

grads = tape.gradient(current_loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

to

grads, Hessian = optimizer.get_gradients_hessian(current_loss, model.trainable_weights)
optimizer.apply_gradients_hessian(zip(grads, Hessian, model.trainable_weights))
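
For reference, the full modified training step I'm describing is roughly the following (a sketch with a placeholder model and loss; the notebook has the exact versions):

import tensorflow as tf
from adahessian import AdaHessian  # adahessian_tf/adahessian.py in this repo

# Placeholder 1-D regression model and loss; the notebook defines the real ones.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation='relu', input_shape=(1,)),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = AdaHessian(learning_rate=0.01)

def train_step(x, y):
    current_loss = loss_fn(y, model(x, training=True))
    # The two lines changed from the Adam version:
    grads, Hessian = optimizer.get_gradients_hessian(current_loss, model.trainable_weights)
    optimizer.apply_gradients_hessian(zip(grads, Hessian, model.trainable_weights))
    return current_loss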

Can anyone see what I'm doing wrong? Thanks!

KimiHsieh commented 3 years ago

I have the same issue

KimiHsieh commented 3 years ago

Environment: adahessian_tf/environment.yml

I think the issue is caused by grads = gradients.gradients(loss, params) in get_gradients_hessian(self, loss, params). If you check the return of grads = gradients.gradients(loss, params), it will be None, but I don't know how to fix this issue.

https://github.com/amirgholami/adahessian/blob/935a0476aeb8f76b397d9ef4f04d59d7783abfec/adahessian_tf/cifar_training_tools.py#L56-L71

https://github.com/amirgholami/adahessian/blob/935a0476aeb8f76b397d9ef4f04d59d7783abfec/adahessian_tf/adahessian.py#L160-L198
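
The failing call path can also be reproduced outside the optimizer (my own sketch, reusing the placeholder names from the first post, not code from the repo): an eagerly computed loss passed to tf.gradients does not yield usable gradients, which matches the `None` gradient error above.

# Reproduce the call the optimizer makes internally, on an eagerly
# computed loss, i.e. outside any graph context.
x = tf.random.normal((32, 1))
y = tf.random.normal((32, 1))
current_loss = loss_fn(y, model(x, training=True))
raw_grads = tf.gradients(current_loss, model.trainable_weights)  # fails outside a graph context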

lpupp commented 3 months ago

I have the same issue. Has this been solved?

lpupp commented 3 months ago

In the original post:

I have a simple example in this google colab notebook: https://colab.research.google.com/drive/1EbKZ0YHhyu6g8chFlJD74dzWrbo82mbV?usp=sharing

I am getting the following error ...

Wrapping the train function in a @tf.function decorator solves it for me.

tf.gradients is only valid in a graph context (see the official docs), which I guess is what was missing.
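
Concretely, something like the following works for me (a sketch using the same placeholder names as the first post; the only real change is the decorator):

import tensorflow as tf

@tf.function  # trace the step into a graph so tf.gradients is valid
def train_step(x, y):
    current_loss = loss_fn(y, model(x, training=True))
    grads, Hessian = optimizer.get_gradients_hessian(current_loss, model.trainable_weights)
    optimizer.apply_gradients_hessian(zip(grads, Hessian, model.trainable_weights))
    return current_loss

# Usage, e.g. one step on a batch:
# loss = train_step(x_batch, y_batch)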