DanielNino27 opened 1 week ago
Thanks for the report @DanielNino27 .
The differences you are seeing between the two devices come down to the choice of diff method. default.qubit defaults to backpropagation, which is infinitely differentiable out of the box. Any plugin will probably default to parameter-shift. You would probably see the same kind of issues if you manually specified diff_method="parameter-shift" in the qnode.
I also noticed you are trying to calculate a hessian. To calculate a hessian with parameter shift, you will need to specify max_diff when constructing the qnode.
@qml.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(*args, **kwargs):
It looks like you specified max_diff=2 as a keyword argument to your quantum function. To clarify, max_diff must be provided when creating the qnode itself, not to the quantum function.
Hope that helps :)
Thanks for the suggestion, @albi3ro.
I tried the changes you suggested, and the warning indeed goes away, but the computation seems to get stuck with diff_method='parameter-shift' with default.qubit. I haven't waited long enough to complete an iteration of optimization (I left it running for an hour and it still hadn't completed the first iteration), so it seems to take several orders of magnitude longer than backpropagation at least, if not getting stuck somewhere along the way.
My understanding is that the difference shouldn't be so large between parameter-shift and backpropagation for this example - is that the case?
So I'm running the example with the null qubit device, qml.device('null.qubit'), and even that seems to take a long time. I'll confirm more once I get numbers back, but yes, for this type of case there should be a huge difference between backprop and parameter shift.
First-order parameter shift produces two (or sometimes four+) executions per trainable parameter. If we have 10 parameters, that means 20 first-order gradient tapes.
When taking a second-order derivative, we have to calculate the derivative for each parameter for each gradient tape. That means 20 hessian tapes per gradient tape. We are now at 1 initial execution + 20 first order tapes + 400 hessian tapes. 421 total executions. Now caching does occur by default in pennylane with higher order derivatives, so some of those are indeed going to be duplicates.
So with caching, I think we bring that down to 1 + 2N + (N + 1) + 4N(N - 1) = 392 for N = 10.
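The tape counting above can be checked with quick arithmetic, assuming the simple two-term shift rule (2 executions per trainable parameter) and N = 10 parameters:

```python
# Back-of-envelope execution counts for second-order parameter shift,
# assuming 2 shifted tapes per trainable parameter and N = 10.
N = 10
forward = 1
first_order = 2 * N            # gradient tapes
second_order = (2 * N) ** 2    # hessian tapes, no caching
total_uncached = forward + first_order + second_order
print(total_uncached)          # 421

# With caching, duplicate shifted circuits collapse to unique executions:
total_cached = 1 + 2 * N + (N + 1) + 4 * N * (N - 1)
print(total_cached)            # 392
```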
So be wary.
Also, it looks like your loss function is the hessian, so you should actually be calculating third-order derivatives if you want to use gradient-based optimization. That would then be roughly 8,000 tapes... but caching will probably play a much larger role at that point.
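The 8,000 figure follows from the same per-parameter doubling applied once more (a rough estimate, ignoring caching): each of the 400 hessian tapes needs 2N shifted tapes of its own.

```python
# Rough third-order tape count, same assumptions as above (N = 10,
# 2 shifted tapes per parameter, no caching).
N = 10
third_order = (2 * N) ** 3
print(third_order)  # 8000
```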
I'll provide more information when I get it, but I believe this is the source of your problem.
And confirmed.
For:
dev = qml.device('null.qubit')
n_qubits = 5
#dev = qml.device('qiskit.aer',wires=n_qubits)
n_layers = 5
n_datapoints = 5
steps = 1
stepsize = 0.01
opt = qml.AdamOptimizer(stepsize=stepsize)
n_total_parameters = 10
We had 466 executions occur.
Expected behavior
With default.qubit, cost function is successfully optimized with no warning. Output is as attached:
I would expect with the qiskit.aer device, a similar output.
Actual behavior
When the optimization loop runs with the qiskit.aer device, it gives the following warning:
*/anaconda3/envs/qml_env/Lib/site-packages/autograd/tracer.py:14: UserWarning: Output seems independent of input. warnings.warn("Output seems independent of input.")
The optimization still runs, and the cost function is actually lower than with default.qubit, but that is likely illusory, as something along the way becomes non-differentiable with the qiskit device.
Additional information
No response
Source code
Tracebacks
No response
System information
Existing GitHub issues