PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

[bug] Declaring provides_jacobian=True capability breaks device with current master #2576

Closed cvjjm closed 2 years ago

cvjjm commented 2 years ago

I have observed (with great delight!) that tape mode now allows observables returning non-standard types beyond floats, e.g., dicts or classes.

I am also pleasantly surprised, this was not something we intended/expected when coding it 😆 Glad to know it works!

We should probably add tests for this behaviour, to make sure we don't accidentally break it.

Originally posted by @josh146 in https://github.com/PennyLaneAI/pennylane/issues/1109#issuecomment-785616046

It seems that this "feature" is now broken in the latest master.

More precisely, the line here https://github.com/PennyLaneAI/pennylane/blob/1a13bef8adfa85a86a7567f41294b3f73285a294/pennylane/_qubit_device.py#L299 makes it impossible to have an observable that returns something that is not castable to float.

Would it be possible to put this line in a try/except block and return either the plain result or a dtype=object array in case the result cannot be cast to float?
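
Roughly something like the following, as a sketch only; the helper name and the exact variables around that line in _qubit_device.py are assumptions, not the actual PennyLane code:

import numpy as np

def _asarray_or_object(results):
    # Sketch of the suggested fallback: try the usual float cast first,
    # and fall back to a dtype=object array for observables that return
    # non-numeric types (dicts, custom classes, ...).
    try:
        return np.asarray(results, dtype=np.float64)
    except (TypeError, ValueError):
        return np.asarray(results, dtype=object)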

josh146 commented 2 years ago

Thanks @cvjjm for catching this! Actually, I wonder why our test added in https://github.com/PennyLaneAI/pennylane/pull/1291 didn't catch this.

@chaeyeunpark I'm guessing this was introduced in #2448?

cvjjm commented 2 years ago

It is indeed surprising that the test from #1291 doesn't fail. I must admit that I didn't have time to check this carefully; I just noticed that some of my code no longer works with master and that commenting out this line fixes things.

josh146 commented 2 years ago

I went hunting for the test; since we recently removed old unused tape subclasses, it seems that this test was moved from test_jacobian_tape.py to test_parameter_shift.py:

https://github.com/PennyLaneAI/pennylane/blob/1a13bef8adfa85a86a7567f41294b3f73285a294/tests/gradients/test_parameter_shift.py#L1032-L1034

cvjjm commented 2 years ago

Thanks @josh146! I didn't have time to look into this further, but I can report that differentiating such non-standard observables is also broken in the current master:

  File "/.../pennylane/pennylane/_grad.py", line 328, in _jacobian_function
    jac = tuple(_jacobian(func, arg)(*args, **kwargs) for arg in _argnum)
  File "/.../pennylane/pennylane/_grad.py", line 328, in <genexpr>
    jac = tuple(_jacobian(func, arg)(*args, **kwargs) for arg in _argnum)
  File "/.../autograd/autograd/wrap_util.py", line 20, in nary_f
    return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
  File "/.../autograd/autograd/differential_operators.py", line 61, in jacobian
    return np.reshape(np.stack(grads), jacobian_shape)
  File "/.../autograd/autograd/numpy/numpy_wrapper.py", line 88, in stack
    arrays = [array(arr) for arr in arrays]
  File "/.../autograd/autograd/numpy/numpy_wrapper.py", line 88, in <listcomp>
    arrays = [array(arr) for arr in arrays]
  File "/.../autograd/autograd/core.py", line 14, in vjp
    def vjp(g): return backward_pass(g, end_node)
  File "/.../autograd/autograd/core.py", line 22, in backward_pass
    for parent, ingrad in zip(node.parents, ingrads):
  File "/.../autograd/autograd/core.py", line 49, in <genexpr>
    return lambda g: (vjp(g) for vjp in vjps)
  File "/.../autograd/autograd/builtins.py", line 82, in <lambda>
    defvjp_argnum(make_sequence, lambda argnum, *args: lambda g: g[argnum - 1])
IndexError: list index out of range

Both were still working with v0.22.2.

cvjjm commented 2 years ago

The root cause of this turned out to be something entirely different :-) The device with which I discovered the problem also provides a device gradient, and this is what is broken now. Surprisingly, just declaring that a device provides a gradient now makes it unusable, even for evaluating, say, parameter-shift gradients of QNodes, as this minimal example shows (notice how I explicitly ask for diff_method='parameter-shift', mode="backward", and indeed the device .jacobian() is never called):

import pennylane as qml
from pennylane.devices import DefaultQubit
from pennylane import numpy as np

class MyQubit(DefaultQubit):
    @classmethod
    def capabilities(cls):
        capabilities = super().capabilities().copy()
        capabilities.update(
            provides_jacobian=True, # with this commented out everything works
        )
        return capabilities

    def jacobian(self, *args, **kwargs):
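        # never actually called in this minimal example; only the capability flag matters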
        raise NotImplementedError()

dev = MyQubit(wires=2)

@qml.qnode(dev, diff_method='parameter-shift', mode="backward")
def qnode(params):
    qml.RY(params[0], wires=[0])
    return qml.expval(qml.PauliZ(0))

params = np.array([0.2])

print(qnode(params))
print(qml.jacobian(qnode)(params))

which results in:

0.9800665778412417
asefaefafeawef [0] <QNode: wires=2, device='default.qubit', interface='autograd', diff_method='parameter-shift'> (tensor([0.2], requires_grad=True),) {}
Traceback (most recent call last):
  File "pl_test9.py", line 27, in <module>
    print(qml.jacobian(qnode)(params))
  File "/.../pennylane/pennylane/_grad.py", line 329, in _jacobian_function
    jac = tuple(_jacobian(func, arg)(*args, **kwargs) for arg in _argnum)
  File "/.../pennylane/pennylane/_grad.py", line 329, in <genexpr>
    jac = tuple(_jacobian(func, arg)(*args, **kwargs) for arg in _argnum)
  File "/.../autograd/autograd/wrap_util.py", line 20, in nary_f
    return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
  File "/.../autograd/autograd/differential_operators.py", line 61, in jacobian
    return np.reshape(np.stack(grads), jacobian_shape)
  File "/.../autograd/autograd/numpy/numpy_wrapper.py", line 88, in stack
    arrays = [array(arr) for arr in arrays]
  File "/.../autograd/autograd/numpy/numpy_wrapper.py", line 88, in <listcomp>
    arrays = [array(arr) for arr in arrays]
  File "/.../autograd/autograd/core.py", line 14, in vjp
    def vjp(g): return backward_pass(g, end_node)
  File "/.../autograd/autograd/core.py", line 23, in backward_pass
    outgrads[parent] = add_outgrads(outgrads.get(parent), ingrad)
  File "/.../autograd/autograd/core.py", line 176, in add_outgrads
    return sparse_add(vspace(g), None, g), True
  File "/.../autograd/autograd/tracer.py", line 48, in f_wrapped
    return f_raw(*args, **kwargs)
  File "/.../autograd/autograd/core.py", line 186, in sparse_add
    return x_new.mut_add(x_prev)
  File "/.../autograd/autograd/numpy/numpy_vjps.py", line 698, in mut_add
    onp.add.at(A, idx, x)
ValueError: array is not broadcastable to correct shape

josh146 commented 2 years ago

Thanks @cvjjm, this is helpful, but also troubling 😬

  File "/.../autograd/autograd/numpy/numpy_vjps.py", line 698, in mut_add
    onp.add.at(A, idx, x)
ValueError: array is not broadcastable to correct shape

I am very familiar with this traceback, and it always scares me since it is very hard to debug; it's an indication that a broadcasting rule we are using, one that is permitted on the forward pass, is breaking on the autograd backward pass 😬 And the traceback doesn't indicate where in the forward pass the issue is!

cvjjm commented 2 years ago

Luckily it is still working with a fairly recent version, so it should be possible to find the problem by bisecting.

Maybe the above example can become a test to catch such problems in the future.

josh146 commented 2 years ago

@mlxd tracked this down to https://github.com/PennyLaneAI/pennylane/commit/6d1ddf664531507fa35973c0ebb3d709ada7afc5 (bug fixes we made during the last release-candidate cycle!)

antalszava commented 2 years ago

Hi @cvjjm, this seems to boil down to the change we chatted about around the release of v0.23.0. It seems like the logic added there did start causing issues after all :crying_cat_face:

Specifically, a new if statement was added here:

        if device.capabilities().get("provides_jacobian", False):
            # in the case where the device provides the jacobian,
            # the output of grad_fn must be wrapped in a tuple in
            # order to match the input parameters to _execute.
            return (return_vjps,)
        return return_vjps

This was required to help with a device we have in the PennyLane-SF plugin. It seems, however, that as you suggest, it breaks other cases.

Removing the if statement makes the listed example execute correctly.

antalszava commented 2 years ago

We could definitely fix this. I'm curious: how severe is this bug in its current form? I.e., would it be worth making a bug-fix release so that the fix is pip installable, or would v0.24.0 be a good target (~21st of June)?

cvjjm commented 2 years ago

Oh! I guess my takeaway from the above is then:

1) Checking whether the device declares provides_jacobian=True is not a good indicator of whether the jacobian actually came from the device. I guess the wrapping in a tuple should at least be moved inside this else statement: https://github.com/PennyLaneAI/pennylane/blob/bcd837b1ed1187895c327abfe62aea71fbeba02f/pennylane/interfaces/autograd.py#L220-L232

2) Is there a more robust way of checking whether the jacobian actually came from a device and thus needs to be wrapped? Maybe the shape or type of the jacobian can be used to decide whether wrapping is needed? This would even allow some room for error on the plugin developer side, which would be nice.

3) There needs to be a test in core PL with at least a "fake" device that implements a "fake" device jacobian (like returning a fixed number) so that such things can be caught; a minimal sketch of such a device is given after this list. Currently there is no device that ships with PL that implements the device gradient, is there?
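
For point 3, a minimal sketch of what such a test-only device could look like, assuming the same *args/**kwargs signature as in my reproduction above; the class name and the fixed return value are placeholders, not actual PennyLane test code:

import numpy as np
from pennylane.devices import DefaultQubit

class FakeDeviceJacobianQubit(DefaultQubit):
    """Test-only device that declares and implements a trivial device
    jacobian, so core tests exercise the provides_jacobian code path."""

    @classmethod
    def capabilities(cls):
        capabilities = super().capabilities().copy()
        capabilities.update(provides_jacobian=True)
        return capabilities

    def jacobian(self, *args, **kwargs):
        # Fixed, recognisable value: a test can assert that this exact
        # array comes back when (and only when) the device jacobian
        # is supposed to be used.
        return np.array([[0.123]])

A test could then check for that sentinel value when the device gradient is requested, and also run with diff_method='parameter-shift' (as in the example above) to make sure that merely declaring the capability does not break other gradient methods.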

I would highly appreciate a 0.23.X bugfix release, not because I cannot wait until the 21st, but because it would be nice to have at least some v0.23.X version that I do not need to blacklist because of this :-)

cvjjm commented 2 years ago

Just as a reminder: The original issue from https://github.com/PennyLaneAI/pennylane/issues/2576#issue-1237311914 is also still open.

Thanks @josh146 for hunting down the test that was meant to preserve the feature of non-standard return types!

Looking at the test, it is fairly clear that the reason this was not triggered is that someone simply re-declared the R_DTYPE of the device to be SpecialObservable: https://github.com/PennyLaneAI/pennylane/blob/1a13bef8adfa85a86a7567f41294b3f73285a294/tests/gradients/test_parameter_shift.py#L1072-L1074 This makes the test pass, but it also means that the device can only ever compute QNodes that return a SpecialObservable.

What I need, though, is a device that supports both standard observables and my special observable. The test can/should probably be expanded to also have the device evaluate a QNode returning a normal observable; a sketch of what that could look like is below.
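
As a rough sketch of what I mean, here is a stand-in device (placeholder names, not the actual test code) that keeps the standard float R_DTYPE and only wraps the result of the special observable, together with the "normal observable" check the test is currently missing:

import numpy as np
import pennylane as qml
from pennylane.devices import DefaultQubit

class SpecialObject:
    # Placeholder for a non-standard return type (a dict or custom class)
    def __init__(self, val):
        self.val = val

class MixedReturnQubit(DefaultQubit):
    # Stand-in for the device in the linked test: R_DTYPE stays float;
    # only results of an observable named "SpecialObservable" get wrapped.
    def expval(self, observable, **kwargs):
        val = super().expval(observable, **kwargs)
        if getattr(observable, "name", None) == "SpecialObservable":
            return SpecialObject(val)
        return val

dev = MixedReturnQubit(wires=1)

# The missing part of the test: the same device must also be able to
# evaluate a QNode that returns a perfectly normal observable.
@qml.qnode(dev, diff_method="parameter-shift")
def plain_qnode(x):
    qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))

assert np.isclose(plain_qnode(0.2), np.cos(0.2))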

antalszava commented 2 years ago

Hi @cvjjm, thanks for the details! :slightly_smiling_face: We're going forward with the fix for the device jacobian (see https://github.com/PennyLaneAI/pennylane/pull/2595). After chatting internally, we'd target v0.24.0 to have it in. Will also consider the other issues here.

cvjjm commented 2 years ago

Hi, as far as I can tell the test is still in the "vandalized" state in the current master. Is there a chance we can get this functionality restored in the next release?

antalszava commented 2 years ago

Hi @cvjjm, apologies that progress on the original issue has slowed down; we can definitely have this in for the next release (v0.25.0). Thanks for the reminder! :+1:

puzzleshark commented 2 years ago

Hi @cvjjm, I created a PR implementing the custom-object fix, plus re-adding a test case that seems to have disappeared from PL. I don't think this kind of solution would work for JAX, Torch, or TF, though, since in general we register a custom primitive with each of these frameworks (representing the circuit) that must return a tensor. The only solution I can think of is some packing and unpacking mechanism, similar to JAX pytrees.

cvjjm commented 2 years ago

Thanks! That fix looks perfect and, yes, I agree that this will only work with autograd/NumPy, but that is fine. There is an additional change in master that also breaks special return types, and it will show up when you merge master into the fix branch. See my comment over there.