DanielNino27 opened 1 week ago
Thanks for the report @DanielNino27 .
The differences you are seeing between the two devices come down to the choice of diff method. default.qubit defaults to backpropagation, which is infinitely differentiable out of the box. Any plugin will probably default to parameter-shift. You would probably see the same kind of issues if you manually specified diff_method="parameter-shift" in the qnode.
I also noticed you are trying to calculate a hessian. To calculate a hessian with parameter shift, you will need to specify max_diff when constructing the qnode.
@qml.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(*args, **kwargs):
It looks like you specified max_diff=2 as a keyword argument to your quantum function. To clarify, max_diff must be provided when creating the qnode itself, not to the quantum function.
Hope that helps :)
Thanks for the suggestion, @albi3ro.
I tried the changes you suggested, and the warning indeed goes away, but the computation seems to get stuck with diff_method='parameter-shift' with default.qubit. I haven't waited long enough to complete an iteration of optimization (I left it running for an hour and it still hadn't completed the first iteration), so it seems to take several orders of magnitude longer than backpropagation at least, if not getting stuck somewhere along the way.
My understanding is that the difference shouldn't be so large between parameter-shift and backpropagation for this example - is that the case?
So I'm running the example with the null qubit device, qml.device('null.qubit'), and even that seems to take a long time. I'll confirm more once I get numbers back, but yes, for this type of case there should be a huge difference between backprop and parameter shift.
First-order parameter shift produces two (or sometimes four+) executions per trainable parameter. If we have 10 parameters, that means 20 first-order gradient tapes.
When taking a second-order derivative, we have to calculate the derivative for each parameter for each gradient tape. That means 20 hessian tapes per gradient tape. We are now at 1 initial execution + 20 first order tapes + 400 hessian tapes. 421 total executions. Now caching does occur by default in pennylane with higher order derivatives, so some of those are indeed going to be duplicates.
So with caching, I think we bring that down to 1 + 2N + (N + 1) + 4N(N - 1) = 392 for N = 10.
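The tape counting above can be checked with quick arithmetic, assuming the simple two-term shift rule (2 executions per trainable parameter) and N = 10 parameters:

```python
# Back-of-envelope execution counts for second-order parameter shift,
# assuming 2 shifted tapes per trainable parameter and N = 10.
N = 10
forward = 1
first_order = 2 * N            # gradient tapes
second_order = (2 * N) ** 2    # hessian tapes, no caching
total_uncached = forward + first_order + second_order
print(total_uncached)          # 421

# With caching, duplicate shifted circuits collapse to unique executions:
total_cached = 1 + 2 * N + (N + 1) + 4 * N * (N - 1)
print(total_cached)            # 392
```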
So be wary.
Also, it looks like your loss function is the hessian, so you should actually be calculating third-order derivatives if you want to use gradient-based optimization. That would then be roughly 8,000 tapes... but caching will probably play a much larger role at that point.
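The 8,000 figure follows from the same per-parameter doubling applied once more (a rough estimate, ignoring caching): each of the 400 hessian tapes needs 2N shifted tapes of its own.

```python
# Rough third-order tape count, same assumptions as above (N = 10,
# 2 shifted tapes per parameter, no caching).
N = 10
third_order = (2 * N) ** 3
print(third_order)  # 8000
```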
I'll provide more information when I get it, but I believe this is the source of your problem.
And confirmed.
For:
dev = qml.device('null.qubit')
n_qubits = 5
#dev = qml.device('qiskit.aer',wires=n_qubits)
n_layers = 5
n_datapoints = 5
steps = 1
stepsize = 0.01
opt = qml.AdamOptimizer(stepsize=stepsize)
n_total_parameters = 10
We had 466 executions occur.
Expected behavior
With default.qubit, cost function is successfully optimized with no warning. Output is as attached:
I would expect with the qiskit.aer device, a similar output.
Actual behavior
When the optimization loop runs with the qiskit.aer device, it gives the following warning:
*/anaconda3/envs/qml_env/Lib/site-packages/autograd/tracer.py:14: UserWarning: Output seems independent of input. warnings.warn("Output seems independent of input.")
The optimization still runs, and the cost function is actually lower than with default.qubit, but that is likely illusory, as something along the way becomes non-differentiable with the qiskit device.
Additional information
No response
Source code
Tracebacks
No response
System information
Existing GitHub issues