cvxgrp / cvxpylayers

Differentiable convex optimization layers
Apache License 2.0

sparsemax function is not translation invariant? #132

Closed · kbkartik closed this issue 2 years ago

kbkartik commented 2 years ago

As per the original sparsemax paper, the activation function should be translation invariant. However, when I implement the function using this library, I see numerical differences. Here is my code:

import torch
import numpy as np
import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

n = 4
x = cp.Parameter(n)
y = cp.Variable(n)
# sparsemax(x): Euclidean projection of x onto the probability simplex
obj = cp.sum_squares(x - y)
cons = [cp.sum(y) == 1, 0. <= y, y <= 1.]
prob = cp.Problem(cp.Minimize(obj), cons)
layer = CvxpyLayer(prob, [x], [y])

for i in [496, 375, 452, 485, 982, 420, 510, 833, 878, 434, 147, 551, 537, 884, 913,  86,  11, 342, 577, 198]:
#for i in list(torch.randint(1000, size=(20,))):
    torch.manual_seed(i)
    torch.random.manual_seed(i)
    X = torch.randn(2, 4, dtype=torch.float, requires_grad=False)
    Z = torch.randn(2, 4, dtype=torch.float, requires_grad=False) + 2  # translation
    max_val1, _ = X.max(dim=1, keepdim=True)
    max_val2, _ = Z.max(dim=1, keepdim=True)
    with torch.no_grad():
        X -= max_val1  # subtract the row-wise max for numerical stability
        Z -= max_val2
    out1, = layer(X)
    out2, = layer(Z)
    print((out1-out2).sum(dim=1))

Below is my output:

tensor([-8.9407e-08,  0.0000e+00])
tensor([-7.4506e-08, -1.1921e-07])
tensor([ 2.0862e-07, -5.9605e-08])
tensor([5.9605e-08, 3.1304e-07])
tensor([-1.1921e-07, -5.9605e-08])
tensor([-1.0210e-07, -1.1676e-07])
tensor([-5.9605e-08,  8.9407e-07])
tensor([-5.9605e-08, -6.5565e-07])
tensor([4.9331e-09, 3.2783e-07])
tensor([-4.1573e-07,  1.7157e-07])
tensor([ 5.9605e-08, -8.9407e-08])
tensor([2.2352e-08, 1.7881e-07])
tensor([5.9605e-08, 1.4901e-07])
tensor([4.7684e-07, 0.0000e+00])
tensor([-5.9605e-08,  0.0000e+00])
tensor([ 3.5052e-09, -1.8448e-07])
tensor([-1.7233e-07,  2.2352e-07])
tensor([ 9.6197e-08, -5.7774e-08])
tensor([ 7.3373e-09, -3.5763e-07])
tensor([-2.9802e-08, -1.1921e-07])
bamos commented 2 years ago

Are you referring to Prop 2.2 on the invariance from adding a constant to the coordinates?

(Screenshot of Proposition 2.2 from the sparsemax paper, stating invariance to adding a constant to every coordinate: sparsemax(z + c1) = sparsemax(z) for any constant c.)

If so, the Z in your code is not X plus a constant; it is a freshly sampled vector. The CVXPY implementation is invariant if we set Z to X plus a constant, for example like this:

    X = torch.randn(2, 4, dtype=torch.float, requires_grad=False)
    Z = X + 2 # translation
    out1, = layer(X)
    out2, = layer(Z)
    print('===')
    print(torch.stack((out1, out2)))

Output

(The remaining differences between the two outputs are numerical noise, effectively zero; a quick check is sketched after the output.)

===
tensor([[[-2.7666e-11,  8.3062e-01,  3.0246e-11,  1.6938e-01],
         [ 7.4694e-01,  5.9583e-08,  9.5113e-02,  1.5795e-01]],

        [[ 6.3620e-09,  8.3062e-01, -8.4918e-10,  1.6938e-01],
         [ 7.4694e-01,  2.1262e-09,  9.5113e-02,  1.5795e-01]]])
===
tensor([[[ 1.3046e-01,  2.9990e-11,  8.6954e-01, -4.4236e-11],
         [-3.6677e-06,  9.1214e-01,  8.7866e-02, -6.2878e-07]],

        [[ 1.3046e-01, -1.0309e-08,  8.6954e-01, -8.8253e-08],
         [ 1.8545e-07,  9.1214e-01,  8.7864e-02, -6.7285e-07]]])
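
As a quick numerical check (a minimal sketch, not part of the original reply), the layer's output can also be compared against the closed-form sort-and-threshold solution of sparsemax from the paper, which is translation invariant up to floating-point rounding. This reuses the layer defined in the original post; the sparsemax_closed_form helper below is only for illustration:

    import torch

    def sparsemax_closed_form(z):
        # Sparsemax via the sort-and-threshold algorithm
        # (Martins & Astudillo, 2016). z has shape (batch, n); each row is
        # projected onto the probability simplex.
        z_sorted, _ = torch.sort(z, dim=1, descending=True)
        k = torch.arange(1, z.shape[1] + 1, dtype=z.dtype, device=z.device)
        cumsum = z_sorted.cumsum(dim=1)
        # support size k(z): largest k with 1 + k * z_(k) > sum of the top-k entries
        k_z = (1 + k * z_sorted > cumsum).sum(dim=1, keepdim=True)
        tau = (cumsum.gather(1, k_z - 1) - 1) / k_z.to(z.dtype)
        return torch.clamp(z - tau, min=0)

    X = torch.randn(2, 4)
    out_layer, = layer(X)               # CvxpyLayer solution from above
    out_ref = sparsemax_closed_form(X)  # closed-form reference
    print(torch.allclose(out_layer, out_ref, atol=1e-5))                      # True, up to solver tolerance
    print(torch.allclose(out_ref, sparsemax_closed_form(X + 2.), atol=1e-6))  # invariance, up to rounding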
kbkartik commented 2 years ago

My bad, the mistake was on my side. Thanks for your quick input!