HIPS / autograd

Efficiently computes derivatives of NumPy code.

Efficient jacobians per residual #544

tasptz opened this issue 4 years ago

tasptz commented 4 years ago

How can I get a Jacobian per residual? E.g. n x m input data and n outputs, each computed from an m-vector; that gives n Jacobians, each of shape 1 x m, so n x m in total.

Is there a way to do this with the current autograd module?

momchilmm commented 4 years ago

Don't know if there's a better way, but you could just stitch the Jacobian together using grad. That is to say, define n objective functions that each take a vector and return a scalar; then, with fs the list of those functions, your Jacobian is something like np.stack([grad(fs[i])(vec[i, :]) for i in range(vec.shape[0])]).

It's not very elegant, but I don't know of any other way to tell autograd that you have a "sparse" dependence.
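
A minimal runnable sketch of that stitching approach (the objective functions here are hypothetical placeholders):

```python
import autograd.numpy as np
from autograd import grad

n, m = 5, 3
vec = np.random.randn(n, m)

# hypothetical objectives: one scalar-valued function per residual
fs = [lambda v, w=float(i + 1): w * np.sum(v ** 2) for i in range(n)]

# stitch the n gradients (each of shape (m,)) into an (n, m) Jacobian
J = np.stack([grad(fs[i])(vec[i, :]) for i in range(n)])
```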

tasptz commented 4 years ago

To go into a bit more detail: in a lot of optimization problems you compute every residual with the same function, just on different data. Naturally, in numpy you batch along the first dimension and work mostly with matrix multiplication, broadcasting, and so on, avoiding Python loops. So your line (I'll write jac instead of grad and simplify it) uses the same function for every data point:

```python
np.stack([jac(v) for v in vec])
```

or what I suggested

```python
np.apply_along_axis(jac, 1, vec)
```

However, with large n this is slow; I guess it's the per-call overhead. It would be perfect if autograd could apply jac to whole batches, relying on numpy's broadcasting, matrix multiplication, etc.
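
For reference, both variants in runnable form, with a hypothetical shared residual function; they produce the same (n, m) result and both pay the per-row call overhead:

```python
import autograd.numpy as np
from autograd import grad

def residual(v):                 # hypothetical: one scalar per m-vector
    return np.sum(np.sin(v))

jac = grad(residual)
vec = np.random.randn(1000, 3)   # large n makes the overhead visible

J_loop = np.stack([jac(v) for v in vec])   # one grad call per row
J_axis = np.apply_along_axis(jac, 1, vec)  # same result, same overhead
```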

momchilmm commented 4 years ago

I see, yeah, I don't know how this can be done more efficiently. elementwise_grad might actually do what you want: it performs a single vector-Jacobian product with a ones covector, so when each output depends only on its corresponding input row it returns the n x m "diagonal" Jacobian directly, without ever building the full n x n x m one. There's also this other issue that I found, with a solution proposed at the end, but again I'm not sure it will avoid the overhead in your case.
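
A minimal sketch of that, assuming a shared residual function that maps the whole batch to one scalar per row (names are hypothetical):

```python
import autograd.numpy as np
from autograd import elementwise_grad

def residuals(vec):                    # hypothetical: (n, m) -> (n,)
    return np.sum(np.sin(vec), axis=1)

vec = np.random.randn(1000, 3)

# one vectorized backward pass, no Python loop over the n rows
J = elementwise_grad(residuals)(vec)   # shape (n, m)

assert np.allclose(J, np.cos(vec))     # d/dx sin(x) = cos(x), row by row
```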

tasptz commented 4 years ago

The other issue looks like a possible way to go, but the example doesn't work for me:


  File "tmp.py", line 18, in <module>
    E_field_grid = -grad(potential)(xy_grid)  # <--- evaluating the electric field
  File "/home/t/miniconda3/envs/x/lib/python3.6/site-packages/autograd/wrap_util.py", line 20, in nary_f
    return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
  File "/home/t/miniconda3/envs/x/lib/python3.6/site-packages/autograd/differential_operators.py", line 27, in grad
    raise TypeError("Grad only applies to real scalar-output functions. "
TypeError: Grad only applies to real scalar-output functions. Try jacobian, elementwise_grad or holomorphic_grad.```
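
The error message itself points at the problem: grad requires a scalar output, while the potential in that example returns one value per grid point. Its own suggestion, elementwise_grad, handles the vector output. A minimal sketch, with a hypothetical stand-in for the potential from the linked issue:

```python
import autograd.numpy as np
from autograd import elementwise_grad

# hypothetical stand-in for the potential in the linked example:
# maps an (n, 2) array of points to n scalar potentials (1/r)
def potential(xy):
    return 1.0 / np.sqrt(np.sum(xy ** 2, axis=1))

xy_grid = np.random.uniform(0.5, 1.5, size=(16, 2))  # avoid r = 0

# grad(potential) raises the TypeError above (vector output);
# elementwise_grad does a single vector-Jacobian product instead
E_field_grid = -elementwise_grad(potential)(xy_grid)  # shape (16, 2)
```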