PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

Allow devices to declare which gradient mode they prefer, `forward` or `backward`. #2188

Open · josh146 opened 2 years ago

josh146 commented 2 years ago

Feature details

Currently, PennyLane supports two modes for accumulating gradients of QNodes:

- "forward": the gradient is computed on the forward pass, alongside the circuit execution itself;
- "backward": the gradient is computed on the backward pass, only once it is actually requested.

This setting can be toggled within the QNode:

```python
@qml.qnode(dev, mode="backward")
```

From the `mode` argument's documentation:

> mode (str) – Whether the gradients should be computed on the forward pass ("forward") or the backward pass ("backward"). Only applies if the device is queried for the gradient; gradient transform functions available in qml.gradients are only supported on the backward pass.
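For context, a minimal runnable sketch of toggling this mode (the circuit and the choice of `diff_method="adjoint"` are illustrative, not from the issue):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

# With a device-provided gradient such as adjoint differentiation,
# mode="forward" accumulates the jacobian during the forward execution:
@qml.qnode(dev, diff_method="adjoint", mode="forward")
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

x = np.array(0.5, requires_grad=True)
print(qml.grad(circuit)(x))  # d<Z>/dx = -sin(0.5)
```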

By default, devices that provide their own gradient compute it on the forward pass ("forward"), while gradient transforms in qml.gradients are applied on the backward pass ("backward").

However, there may be devices where the gradient computation is not more efficient on the forward pass.

Therefore, it would be preferable if the device could 'declare' which mode it prefers, which is then used by the QNode if not specified by the user.
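As a hypothetical sketch of what such a declaration could look like via the legacy device API's capabilities() dictionary (the preferred_gradient_mode key does not exist in PennyLane and is purely illustrative):

```python
import pennylane as qml


class MyDevice(qml.devices.DefaultQubit):
    """Sketch: a simulator that declares its preferred gradient mode."""

    short_name = "my.device"

    @classmethod
    def capabilities(cls):
        capabilities = super().capabilities().copy()
        capabilities.update(
            provides_jacobian=True,
            # Hypothetical key: the QNode would fall back to this value
            # whenever the user does not pass mode= explicitly.
            preferred_gradient_mode="backward",
        )
        return capabilities
```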

Implementation

No response

How important would you say this feature is?

2: Somewhat important. Needed this quarter.

Additional information

No response

josh146 commented 2 years ago

As a corollary to this issue: can we also tweak mode="forward" such that the gradient computation is triggered on the forward pass only if backpropagation is occurring?

This could mean ignoring the call to device.jacobian if there are no ArrayBox objects present on the tape when using Autograd, for example.
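For instance, a rough sketch of that check (requires_backprop is a hypothetical helper; ArrayBox is Autograd's tracing type):

```python
from autograd.numpy.numpy_boxes import ArrayBox


def requires_backprop(tape):
    """Hypothetical helper: True only if backpropagation is active, i.e.
    some trainable parameter on the tape is an Autograd ArrayBox."""
    return any(
        isinstance(p, ArrayBox)
        for p in tape.get_parameters(trainable_only=True)
    )
```

The execution layer could then skip the call to device.jacobian on the forward pass whenever this returns False.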

cvjjm commented 2 years ago

The solution described here sounds very good to me! Thanks!

cvjjm commented 1 year ago

Any chance this can be finished and merged soon? The current behavior has again led to a problem that was very hard to debug...

josh146 commented 1 year ago

Hey @cvjjm! Let me check in with the internal team to get an ETA for you

josh146 commented 1 year ago

Apologies for the delay @cvjjm, I'm still following this one up. This is something we are working to address in the new device API, but I'm not sure of the status for the existing one. In the meantime, do you have any more details on the new bug you came across? That would help us ensure that any solution we implement also covers this edge case :)

albi3ro commented 1 year ago

Just a little context on what is involved in solving this problem:

This problem has two pieces:

1) What the device defines. Does the device even define a method that computes the gradient on execution? Does it define a method that computes the gradient independently? What about VJPs and JVPs? Higher orders?

We have been working on designing, prototyping, and implementing a new device interface for quite some time. We focused on allowing the device to compute a variety of different quantities, and on letting the device specify which of those it can compute. We are investigating a change to the interface that would allow devices to fill in what is "best" for a given execution (see the sketch after this list).

2) How PennyLane uses the device. This is a more complicated problem that we are now exploring. As this part of the code has many more interactions and influences, it's harder to design and change, especially without incurring large amounts of technical debt.

The source of the confusion about taking the gradient on execution is actually in this part of the problem. The "workflow" code assumes that all device derivatives should be computed with the execution.
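To make the first piece concrete, here is a hypothetical sketch (plain Python, not PennyLane's actual interface) of a device that both declares what it can differentiate and states a per-execution preference:

```python
class SketchDevice:
    """Hypothetical device interface -- illustrative only."""

    def supports_derivatives(self, order=1):
        # This sketch only provides first-order derivatives.
        return order == 1

    def supports_vjp(self):
        # Vector-Jacobian products are available...
        return True

    def supports_jvp(self):
        # ...but Jacobian-vector products are not.
        return False

    def prefers_grad_on_execution(self, circuit):
        # Cheap circuits: compute derivatives alongside the execution;
        # expensive ones: defer until a gradient is actually requested.
        return len(circuit.operations) < 10
```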

Hopefully this clarifies the number of moving parts that need to be redesigned to solve this problem.

In the meantime, I recommend specifying mode="backward" in the QNode to manually control when gradients occur.
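For example (a minimal sketch; the circuit and diff_method are illustrative):

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

# mode="backward" defers the device gradient to the backward pass, so a
# plain evaluation no longer triggers an expensive jacobian computation:
@qml.qnode(dev, diff_method="adjoint", mode="backward")
def circuit(x):
    qml.RX(x, wires=0)
    return qml.expval(qml.PauliZ(0))

circuit(0.5)  # forward pass only -- no gradient is computed here
```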

cvjjm commented 1 year ago

Thanks for the summary. I appreciate the complexity of the problem :-) Just wanted to raise awareness again that having to remember to specify mode="backward" to prevent PL from doing something completely counterintuitive (like computing a quite expensive gradient when all the user asked for was a simple energy) is not a very scalable approach - sooner or later someone will forget and then spend a week hunting a hard-to-debug problem...

josh146 commented 1 year ago

Yep for sure! This is something that is a bit harder to change without breaking other things, due to historic decisions/assumptions within the PL codebase. To keep you in the loop, @albi3ro is prototyping potential solutions in #3980