PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

[BUG] Shape mismatch occurs in gradient calculation when using an indexing operation with batched inputs and autograd #5979

Closed majafranz closed 2 months ago

majafranz commented 3 months ago

Expected behavior

When a QNode receives batched inputs (in the example script below, an input of shape (5, 1)), I want to calculate the outcome probabilities of the circuit and then use the probability of outcome "0" for each input. That is, the circuit returns an output of shape (5, 2), where the first dimension (5) is the batch dimension and the second dimension (2) holds the probabilities of "0" and "1", respectively.

The probabilities of outcome "0" can be obtained by indexing: if the output of the QNode is prediction (shape (5, 2)), the probabilities of "0" are prediction[:, 0] (shape (5,)).
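As a minimal sketch of the indexing described above, plain NumPy is enough to show the shapes involved. The array values are copied from the circuit prediction printed in the Tracebacks section of this report; no QNode is evaluated here.

```python
import numpy as np

# Batched probabilities: one row per input, columns are P("0") and P("1").
prediction = np.array([
    [0.78417969, 0.21582031],
    [0.61816406, 0.38183594],
    [0.74902344, 0.25097656],
    [0.79589844, 0.20410156],
    [0.6328125, 0.3671875],
])

# Select the probability of outcome "0" for every element of the batch.
p_zero = prediction[:, 0]

print(prediction.shape)  # (5, 2)
print(p_zero.shape)      # (5,)
```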

The "forward" pass works as expected. One would also expect to obtain gradients with respect to the parameters, e.g. when calling qml.AdamOptimizer().step_and_cost.

Actual behavior

Computing the gradients fails for batch sizes > 1 with the traceback below.

Additional information

In the non-working example script, either setting BATCH_SIZE = 1 or using the torch interface with a torch optimiser, as in the following code sample, provides the expected behaviour. I therefore assume the bug is related either to the gradient calculation with autograd or to the qml.AdamOptimizer used in the example.

## Working example with torch
import pennylane as qml
import torch

BATCH_SIZE = 5

def _circuit(w, x):
    qml.RX(w[0], wires=0)
    qml.RZ(w[1], wires=0)
    qml.RX(x[:, 0], wires=0)
    return qml.probs(0)

circuit = qml.QNode(
    _circuit,
    qml.device("default.qubit", shots=1024, wires=1),
)

x = torch.rand(BATCH_SIZE, 1)
y = torch.rand(BATCH_SIZE)
w = torch.rand(2, requires_grad=True)

opt = torch.optim.Adam([w], lr=0.1)

def mse(prediction, target):
    return torch.mean((prediction - target) ** 2)

def cost(params, target, **kwargs):
    prediction = circuit(w=params, **kwargs)
    print(f"Circuit prediction: {prediction}")
    if len(prediction.shape) == 1:
        prediction = prediction[0]
    else:
        prediction = prediction[:,0]
    print(f"Processed circuit prediction: {prediction}")
    return mse(prediction, target)

print(f"Weights before optimising: {w}")
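cost_val = cost(w, target=y, x=x)

# Note: cost_val.backward() is never called here, so w.grad is None and
# opt.step() leaves w unchanged (compare the weights printed in the output).
opt.step()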

print(f"Weights after optimising: {w}")
print(f"Cost: {cost_val}")

Output

Weights before optimising: tensor([0.2912, 0.7003], requires_grad=True)
Circuit prediction: tensor([[0.8311, 0.1689],
        [0.9014, 0.0986],
        [0.9570, 0.0430],
        [0.9004, 0.0996],
        [0.8955, 0.1045]], dtype=torch.float64, grad_fn=<ExecuteTapesBackward>)
Processed circuit prediction: tensor([0.8311, 0.9014, 0.9570, 0.9004, 0.8955], dtype=torch.float64,
       grad_fn=<SelectBackward0>)
Weights after optimising: tensor([0.2912, 0.7003], requires_grad=True)
Cost: 0.23868222326400854

Source code

## Non-working example with autograd
import pennylane as qml
import pennylane.numpy as pnp
import numpy as np

BATCH_SIZE = 5

def _circuit(w, x):
    qml.RX(w[0], wires=0)
    qml.RZ(w[1], wires=0)
    qml.RX(x[:, 0], wires=0)
    return qml.probs(0)

circuit = qml.QNode(
    _circuit,
    qml.device("default.qubit", shots=1024, wires=1),
)

x = np.random.rand(BATCH_SIZE, 1)
y = np.random.rand(BATCH_SIZE)
w = pnp.random.rand(2, requires_grad=True)

opt = qml.AdamOptimizer(0.1)

def mse(prediction, target):
    return pnp.mean((prediction - target) ** 2)

def cost(params, target, **kwargs):
    prediction = circuit(w=params, **kwargs)
    print(f"Circuit prediction: {prediction}")
    if len(prediction.shape) == 1:
        prediction = prediction[0]
    else:
        prediction = prediction[:,0]
    print(f"Processed circuit prediction: {prediction}")
    return mse(prediction, target)

print(f"Weights before optimising: {w}")
w, cost_val = opt.step_and_cost(
    cost,
    w,
    target=y,
    x=x,
)

print(f"Weights after optimising: {w}")
print(f"Cost: {cost_val}")

Tracebacks

Weights before optimising: [0.52479668 0.21144672]
Circuit prediction: Autograd ArrayBox with value [[0.78417969 0.21582031]
 [0.61816406 0.38183594]
 [0.74902344 0.25097656]
 [0.79589844 0.20410156]
 [0.6328125  0.3671875 ]]
Processed circuit prediction: Autograd ArrayBox with value [0.78417969 0.61816406 0.74902344 0.79589844 0.6328125 ]
Traceback (most recent call last):
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/gradients/vjp.py", line 156, in compute_vjp_single
    res = jac @ dy_row
          ~~~~^~~~~~~~
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 10 is different from 2)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<path>/test.py", line 39, in <module>
    w, cost_val = opt.step_and_cost(
                  ^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/optimize/gradient_descent.py", line 64, in step_and_cost
    g, forward = self.compute_grad(objective_fn, args, kwargs, grad_fn=grad_fn)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/optimize/gradient_descent.py", line 122, in compute_grad
    grad = g(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/_grad.py", line 166, in __call__
    grad_value, ans = grad_fn(*args, **kwargs)  # pylint: disable=not-callable
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/autograd/wrap_util.py", line 20, in nary_f
    return unary_operator(unary_f, x, *nary_op_args, **nary_op_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/_grad.py", line 192, in _grad_with_forward
    grad_value = vjp(vspace(ans).ones())
                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/autograd/core.py", line 14, in vjp
    def vjp(g): return backward_pass(g, end_node)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/autograd/core.py", line 21, in backward_pass
    ingrads = node.vjp(outgrad[0])
              ^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/autograd/core.py", line 67, in <lambda>
    return lambda g: (vjp(g),)
                      ^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/workflow/interfaces/autograd.py", line 199, in grad_fn
    vjps = jpc.compute_vjp(tapes, dy)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/workflow/jacobian_products.py", line 297, in compute_vjp
    return _compute_vjps(jacs, dy, tapes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/workflow/jacobian_products.py", line 46, in _compute_vjps
    vjps.append(f[multi](dy, jac))
                ^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/gradients/vjp.py", line 158, in compute_vjp_single
    res = qml.math.tensordot(jac, dy_row, [[1], [0]])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/math/multi_dispatch.py", line 152, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/pennylane/math/multi_dispatch.py", line 403, in tensordot
    return np.tensordot(tensor1, tensor2, axes=axes, like=like)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/autoray/autoray.py", line 81, in do
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "<path>/.venv/lib/python3.11/site-packages/numpy/core/numeric.py", line 1099, in tensordot
    raise ValueError("shape-mismatch for sum")
ValueError: shape-mismatch for sum
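
The two errors in the traceback can be reproduced with plain NumPy, without PennyLane. The sketch below is a hedged illustration only: the shapes are assumptions inferred from the error messages (a stacked Jacobian of shape (2, 5, 2) for 2 trainable parameters, a batch of 5 and 2 probabilities, and a flattened upstream gradient of size 10), and the final contraction shows one way the shapes could be made to agree, not necessarily how PennyLane handles it.

```python
import numpy as np

n_params, batch, n_probs = 2, 5, 2

# Assumed layout: one (batch, n_probs) Jacobian per trainable parameter, stacked.
jac = np.ones((n_params, batch, n_probs))

# Upstream gradient for the (batch, n_probs) output, and its flattened form.
dy = np.ones((batch, n_probs))
dy_row = dy.reshape(-1)  # shape (10,)

errors = []

# First attempt in the traceback: matmul of (2, 5, 2) with (10,) fails.
try:
    jac @ dy_row
except ValueError as exc:
    errors.append(str(exc))

# Fallback in the traceback: contracting axis 1 (size 5) with axis 0 (size 10) fails.
try:
    np.tensordot(jac, dy_row, [[1], [0]])
except ValueError as exc:
    errors.append(str(exc))

# Contracting over both output axes yields one value per trainable parameter.
vjp = np.tensordot(jac, dy, axes=[[1, 2], [0, 1]])
print(vjp.shape)  # (2,)
```

With a batch size of 1 the flattened gradient has size 2, which matches the last axis of the Jacobian, consistent with the report that BATCH_SIZE = 1 does not trigger the error.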

System information

Name: PennyLane
Version: 0.37.0
Summary: PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
Home-page: https://github.com/PennyLaneAI/pennylane
Author: 
Author-email: 
License: Apache License 2.0
Location: <path>/.venv/lib/python3.11/site-packages
Requires: appdirs, autograd, autoray, cachetools, networkx, numpy, packaging, pennylane-lightning, requests, rustworkx, scipy, semantic-version, toml, typing-extensions
Required-by: PennyLane_Lightning

Platform info:           Linux-6.1.90-1-MANJARO-x86_64-with-glibc2.39
Python version:          3.11.5
Numpy version:           1.26.4
Scipy version:           1.14.0
Installed devices:
- default.clifford (PennyLane-0.37.0)
- default.gaussian (PennyLane-0.37.0)
- default.mixed (PennyLane-0.37.0)
- default.qubit (PennyLane-0.37.0)
- default.qubit.autograd (PennyLane-0.37.0)
- default.qubit.jax (PennyLane-0.37.0)
- default.qubit.legacy (PennyLane-0.37.0)
- default.qubit.tf (PennyLane-0.37.0)
- default.qubit.torch (PennyLane-0.37.0)
- default.qutrit (PennyLane-0.37.0)
- default.qutrit.mixed (PennyLane-0.37.0)
- default.tensor (PennyLane-0.37.0)
- null.qubit (PennyLane-0.37.0)
- lightning.qubit (PennyLane_Lightning-0.37.0)

albi3ro commented 3 months ago

Thanks for opening this issue, @majafranz.

Adding some extra details after an initial investigation.

More localized example of the issue:

import pennylane as qml

@qml.qnode(qml.device('default.qubit'), diff_method="parameter-shift")
def circuit(x, data):
    qml.RX(x[0], 0)
    qml.RX(x[1], 0)
    qml.RY(data, 0)
    return qml.probs(wires=0)

x = qml.numpy.array([0.5, 0.8], requires_grad=True)
data = qml.numpy.array([1.2, 2.3, 3.4], requires_grad=False)
circuit(x, data)
qml.jacobian(circuit)(x, data)

Combination of things that give rise to this issue:

1) Batching in non-trainable data
2) A measurement with a shape (i.e. probs)
3) More than one trainable parameter in the circuit

Removing any of the above characteristics allows it to work.
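
For reference, here is a rough NumPy sketch of the Jacobian one would expect from the localized example, with all three characteristics present. It does not call PennyLane. It assumes a hand-derived closed form for this specific circuit, P("0") = (1 + cos(data) * cos(x[0] + x[1])) / 2, and applies the two-term parameter-shift rule with shifts of ±π/2. The helper names probs and parameter_shift_jacobian are hypothetical, and the (batch, probs, params) layout of the result is an assumption about what qml.jacobian would return.

```python
import numpy as np

def probs(x, data):
    """Closed-form probabilities for RX(x[0]) RX(x[1]) RY(data) applied to |0>."""
    theta = x[0] + x[1]
    p0 = 0.5 * (1.0 + np.cos(data) * np.cos(theta))
    return np.stack([p0, 1.0 - p0], axis=-1)  # shape (batch, 2)

def parameter_shift_jacobian(x, data):
    """Two-term parameter-shift rule, one trainable parameter at a time."""
    shift = np.pi / 2
    columns = []
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = shift
        columns.append(0.5 * (probs(x + e, data) - probs(x - e, data)))
    return np.stack(columns, axis=-1)  # shape (batch, 2, n_params)

x = np.array([0.5, 0.8])          # trainable parameters
data = np.array([1.2, 2.3, 3.4])  # batched, non-trainable data

jac = parameter_shift_jacobian(x, data)
print(jac.shape)  # (3, 2, 2)
```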