HIPS / autograd

Efficiently computes derivatives of NumPy code.

Khatri product of matrices to Tensor #210

Closed: nipunbatra closed this issue 7 years ago

nipunbatra commented 7 years ago

Hi, I'm trying to decompose a tensor of shape (m, n, o) into matrices A (m, r), B (n, r), and C (o, r), so that the reconstruction is t[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]. This is known as the PARAFAC decomposition. Tensorly already does this kind of decomposition. I'm trying to use autograd for such a decomposition.

The process is simple:

  1. start with random A, B, and C (a minimal initialization sketch follows this list)
  2. compute the gradient of a reconstruction cost with respect to each factor and perform gradient descent (sketched after the cost definition below)
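
(For concreteness, a minimal sketch of step 1; the sizes m, n, o and rank r here are illustrative, not from the original post:)

import autograd.numpy as np

# Illustrative sizes: target tensor of shape (m, n, o), decomposition rank r.
m, n, o, r = 10, 8, 6, 3

A = np.random.randn(m, r)
B = np.random.randn(n, r)
C = np.random.randn(o, r)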

To define the cost function, we need to reconstruct the (m, n, o) tensor from A, B, C; this reconstruction is commonly expressed through the Khatri-Rao product. Tensorly defines it as:

from functools import reduce
import autograd.numpy as np

def kt_to_tensor(A, B, C):
    factors = [A, B, C]
    for r in range(factors[0].shape[1]):
        # Open-grid broadcast of the r-th columns; reducing with np.multiply
        # yields the outer product of A[:, r], B[:, r], and C[:, r].
        vecs = np.ix_(*[u[:, r] for u in factors])
        if r:
            res += reduce(np.multiply, vecs)
        else:
            res = reduce(np.multiply, vecs)
    return res

I can define my cost as:

def cost(A, B, C):
    # t is the (m, n, o) target tensor being decomposed.
    pred = kt_to_tensor(A, B, C)
    error = (pred - t).flatten()
    return (error ** 2).mean()
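
(A minimal sketch of what I mean by the gradient step, assuming autograd.grad with argnum; the learning rate and iteration count are illustrative:)

from autograd import grad

# One gradient function per factor: argnum selects which argument of cost
# to differentiate with respect to.
grad_A = grad(cost, argnum=0)
grad_B = grad(cost, argnum=1)
grad_C = grad(cost, argnum=2)

lr = 0.01
for _ in range(1000):
    A = A - lr * grad_A(A, B, C)
    B = B - lr * grad_B(A, B, C)
    C = C - lr * grad_C(A, B, C)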

However, computing the gradient (the multigradient over all three factors) gives the following error:

NotImplementedError: Gradient of ix_ not yet implemented.

I then tried to make the function definition simpler, as follows:

def new_kt_to_tensor(A, B, C):
    m, n, o = A.shape[0], B.shape[0], C.shape[0]
    out = np.zeros((m, n, o))
    k_max = A.shape[1]
    for alpha in range(m):
        for beta in range(n):
            for delta in range(o):
                for k in range(k_max):
                    out[alpha, beta, delta] = out[alpha, beta, delta] + A[alpha, k] * B[beta, k] * C[delta, k]
    return out

However, the gradient could not be computed over this either. I modified cost to use new_kt_to_tensor instead of kt_to_tensor and got this error:

AutogradHint: This error *might* be caused by assigning into arrays, which autograd doesn't support.
Sub-exception:
ValueError: setting an array element with a sequence.

Of course, I had checked that both of these function definitions return exactly the same result:

np.allclose(new_kt_to_tensor(A, B, C), kt_to_tensor(A, B, C))
True

I was wondering if you could let me know the best way to proceed with this use case.

I have also asked this StackOverflow question to try and get an np.tensordot-based solution for this multiplication.

mattjj commented 7 years ago

Thanks for the question.

new_kt_to_tensor doesn't work because, as the hint suggests, indexed assignment into arrays isn't supported in autograd. (kt_to_tensor probably also won't work because it uses in-place updating via +=, though it wasn't getting to that error.) See the tutorial for more information, particularly the "don't use" list.
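
(To illustrate, a minimal sketch, not from this thread, contrasting the indexed-assignment pattern autograd rejects with a functional construction it can trace:)

from autograd import grad
import autograd.numpy as np

def f_inplace(x):
    out = np.zeros(2)
    out[0] = x ** 2  # indexed assignment: autograd cannot trace this
    return out.sum()

def f_functional(x):
    out = np.array([x ** 2, 0.0])  # build the array functionally instead
    return out.sum()

grad(f_functional)(1.0)  # works: returns 2.0
grad(f_inplace)(1.0)     # raises the "setting an array element" error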

I think the original code is using np.ix_ to implement broadcasting, and it's probably better to use more standard broadcasting constructs here, as the StackOverflow answer suggests, or to use np.einsum.
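
For example, the whole reconstruction collapses to a single np.einsum call, which autograd can differentiate (a minimal sketch):

import autograd.numpy as np

def kt_to_tensor(A, B, C):
    # out[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    # Equivalent broadcasting form:
    #   (A[:, None, None, :] * B[None, :, None, :] * C[None, None, :, :]).sum(-1)
    return np.einsum('ir,jr,kr->ijk', A, B, C)

With a definition like this, the gradients of the cost should go through without ever hitting the ix_ primitive.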

I'm going to close this issue because it seems that your problem is solved. A PR to implement np.ix_ support is welcome, though!