cvxgrp / cvxpylayers

Differentiable convex optimization layers

None gradients for variables modified before being fed to cvxpylayer #92

Open marsolo93 opened 3 years ago

marsolo93 commented 3 years ago

Hello!

First of all, thank you for the opportunity to handle convex optimization problems via backpropagation in TensorFlow and PyTorch. It is a great way to combine machine learning and deep learning.

For my first tests with cvxpylayers I took the TensorFlow example and modified it a bit: I constructed a neural network that runs before the cvxpylayer in the forward pass (and therefore after it in the backward pass). Unfortunately, I got the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute MatMul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:MatMul]

So, during backpropagation the cvxpylayer apparently outputs a double tensor (tf.float64), which the MatMul operation cannot consume, since it only processes float tensors (tf.float32).
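The dtype clash itself can be reproduced in isolation, independent of cvxpylayers:

import tensorflow as tf

# mixing dtypes in MatMul raises the same kind of error as above
x32 = tf.ones((2, 2), dtype=tf.float32)
x64 = tf.ones((2, 2), dtype=tf.float64)
tf.matmul(x32, x64)  # InvalidArgumentError: expected a float tensor but is a double tensor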

I simplified the problem in the following code:


import cvxpy as cp
import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tf = tf.Variable(tf.random.normal((m, n)))
b_tf = tf.Variable(tf.random.normal((m,)))

with tf.GradientTape() as tape:
    # "modify" b_tf: b_iden is the product of the identity matrix [3, 3]
    # with the vector b_tf [3,], so b_iden should be equal to b_tf
    b_iden = tf.squeeze(tf.matmul(tf.eye(3, 3), tf.expand_dims(b_tf, axis=1)))
    # solve the problem, setting the values of A, b to A_tf, b_iden
    solution, = cvxpylayer(A_tf, tf.cast(b_iden, dtype=tf.float32))
    summed_solution = tf.math.reduce_sum(solution)

# compute the gradient of the summed solution with respect to A, b
gradA, gradb = tape.gradient(summed_solution, [A_tf, b_tf])
print(gradA)
print(gradb)

In that code I want to calculate the gradients for A_tf and b_tf. In contrast to your example, b_tf is not fed directly to the cvxpylayer, but is "modified" first by a dot product with an identity matrix of matching dimensions. So b_iden should be numerically equal to b_tf, but it makes a difference to the backpropagation algorithm: the gradient now has to flow back through the MatMul operation, which fails with the error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute MatMul as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:MatMul]

Is there some workaround? Placing a neural network after the cvxpylayer is not a problem, as you demonstrated with the ReLU example; in that case no MatMul operation is ever fed a tf.float64 gradient.
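One thing that might work (a sketch, I have not verified it against the library internals): cast up to tf.float64 right before the layer, since the gradient of tf.cast converts the incoming gradient back to the input's dtype, so the MatMul backward pass would see tf.float32 again:

import cvxpy as cp
import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1)), [x >= 0])
cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

A_tf = tf.Variable(tf.random.normal((m, n)))
b_tf = tf.Variable(tf.random.normal((m,)))

with tf.GradientTape() as tape:
    b_iden = tf.squeeze(tf.matmul(tf.eye(3, 3), tf.expand_dims(b_tf, axis=1)))
    # cast up to float64 just before the layer; the Cast op's gradient
    # should convert the layer's float64 gradient back to float32
    solution, = cvxpylayer(tf.cast(A_tf, tf.float64), tf.cast(b_iden, tf.float64))
    summed_solution = tf.math.reduce_sum(solution)

gradA, gradb = tape.gradient(summed_solution, [A_tf, b_tf])
print(gradA)
print(gradb)

Alternatively, constructing A_tf and b_tf in tf.float64 from the start should make all dtypes consistent, at the cost of doing the whole network in double precision.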

Thank you for your response! Marcel

marsolo93 commented 3 years ago

EDIT: It is easily achievable with PyTorch.
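For reference, a sketch of the PyTorch equivalent (mirroring the TensorFlow snippet above; the exact code may differ):

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
problem = cp.Problem(cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1)), [x >= 0])
cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])

A_t = torch.randn(m, n, requires_grad=True)
b_t = torch.randn(m, requires_grad=True)

# the same "modification": multiply b_t by the identity before the layer
b_iden = torch.eye(m) @ b_t
solution, = cvxpylayer(A_t, b_iden)
solution.sum().backward()
print(A_t.grad)
print(b_t.grad)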

sbarratt commented 3 years ago

I'll defer to @akshayka, who is the resident TensorFlow expert.