Open josh146 opened 2 years ago
As a corollary to this issue: can we also tweak mode="forward" such that the gradient computation is triggered on the forward pass only if backpropagation is occurring?
This could mean ignoring the call to device.jacobian if there are no ArrayBox objects present on the tape when using Autograd, for example.
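A minimal sketch of the check suggested above, in plain Python so it runs without Autograd or PennyLane. Everything here is illustrative: `Box` is a hypothetical stand-in for Autograd's ArrayBox, and `FakeDevice`, `is_traced`, and `execute_forward` are assumed names, not PennyLane's actual API.

```python
class Box:
    """Hypothetical stand-in for Autograd's ArrayBox (a value being traced)."""

    def __init__(self, value):
        self.value = value


class FakeDevice:
    """Toy device that counts how often its Jacobian method is called."""

    def __init__(self):
        self.jacobian_calls = 0

    def execute(self, params):
        # Placeholder "expectation value": the sum of the parameters.
        return sum(params)

    def jacobian(self, params):
        self.jacobian_calls += 1
        # d(sum)/d(param_i) = 1 for every parameter.
        return [1.0] * len(params)


def is_traced(params):
    """Return True if any parameter is boxed, i.e. backpropagation is occurring."""
    return any(isinstance(p, Box) for p in params)


def unwrap(p):
    """Strip the box (if any) to get the raw numeric value."""
    return p.value if isinstance(p, Box) else p


def execute_forward(device, params):
    """Run the device; request the Jacobian on the forward pass only when traced."""
    raw = [unwrap(p) for p in params]
    result = device.execute(raw)
    jac = device.jacobian(raw) if is_traced(params) else None
    return result, jac
```

With plain floats, `execute_forward` skips `device.jacobian` entirely. Once at least one parameter is boxed, the Jacobian is computed alongside the result.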
The solution described here sounds very good to me! Thanks!
Any chance this can be finished and merged soon? The current behavior has again led to a problem that was very hard to debug...
Hey @cvjjm! Let me check in with the internal team to get an ETA for you
Apologies for the delay @cvjjm, I'm still following this one up. This is something we are working to address in the new device API, but I'm not sure of the status of the existing one. In the meantime, do you have any more details of the new bug you came across? That would help us ensure that any solution we implement also covers this edge case :)
Just a little context on what is involved in solving this problem:
This problem has two pieces:
1) What the device defines. Does the device even define a method that computes the gradient on execution? Does it define a method that computes the gradient independently? How about vjps and jvps? Higher orders?
We have been working on designing, prototyping, and implementing a new device interface for quite some time. We focused on allowing the device to compute a bunch of different things and allowing the device to specify which things it can compute. We are investigating a change to the interface to allow devices to fill in what is "best" for an execution.
2) How PennyLane uses the device. This is a more complicated problem we are now exploring. As this part of the code has a lot more interactions and influences, it's harder to design and change, especially without incurring large amounts of technical debt.
The source of the confusion about taking the gradient on execution is actually in this part of the problem. The "workflow" code assumes that all device derivatives should be computed with the execution.
Hopefully this clarifies the number of moving parts that need to be redesigned to solve this problem.
In the meantime, I recommend specifying mode="backward" in the QNode to manually control when gradients occur.
Thanks for the summary. I appreciate the complexity of the problem :-) Just wanted to raise awareness again that having to remember to specify mode="backward" to prevent PL from doing something completely counterintuitive (like computing a quite expensive gradient when all the user asked for was a simple energy) is not a very scalable approach - sooner or later some people will forget and then spend a week hunting a hard-to-debug problem...
Yep for sure! This is something that is a bit harder to change without breaking other things, due to historic decisions/assumptions within the PL codebase. To keep you in the loop, @albi3ro is prototyping potential solutions in #3980
Feature details
Currently, PennyLane supports two modes for accumulating gradients of QNodes:
- mode="backward" (request the Jacobian of the QNode on the backward pass). This is the best mode for gradient methods such as the parameter-shift rule, where many more circuits are required for gradient computations compared to the forward pass.
- mode="forward" (request the Jacobian of the QNode on the forward pass). This is the best mode for gradient methods where computing the gradient is more efficient if the information from the forward pass is re-used. This includes the adjoint method (where the statevector is required).
Currently, this setting can be toggled within the QNode through the mode argument.
By default:
- diff_method="parameter-shift" results in mode="backward".
- diff_method="device" or diff_method="adjoint" results in mode="forward".
However, there may be devices where the gradient computation is not more efficient on the forward pass.
Therefore, it would be preferable if the device could 'declare' which mode it prefers, which is then used by the QNode if not specified by the user.
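One way the requested fallback order could look, as a rough sketch only. The `preferred_mode` attribute, `ToyDevice`, and `resolve_mode` are hypothetical names invented for illustration and are not part of PennyLane. The default table simply mirrors the defaults listed above.

```python
# Defaults described above: the mode used when neither user nor device chooses one.
DEFAULT_MODE = {
    "parameter-shift": "backward",
    "device": "forward",
    "adjoint": "forward",
}


class ToyDevice:
    """Hypothetical device; `preferred_mode` is an assumed attribute."""

    def __init__(self, preferred_mode=None):
        self.preferred_mode = preferred_mode


def resolve_mode(diff_method, device, user_mode=None):
    """Pick the mode: user choice first, then device preference, then the default."""
    if user_mode is not None:
        return user_mode
    preferred = getattr(device, "preferred_mode", None)
    if preferred is not None:
        return preferred
    return DEFAULT_MODE[diff_method]
```

For example, `resolve_mode("adjoint", ToyDevice("backward"))` would pick the device's declared preference, while an explicit `user_mode` always wins.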
Implementation
No response
How important would you say this feature is?
2: Somewhat important. Needed this quarter.
Additional information
No response