PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

Broadcast_to error in tensorflow backprop #937

Open bkiani opened 3 years ago

bkiani commented 3 years ago

Issue description

There seems to be a weird issue with broadcasting tensors in TensorFlow. Apologies if I am making a simple mistake, but please see the brief description below.

In my code, I have a simple cost function that is a weighted sum of Pauli operators, evaluated using a QNode collection (see the function below).

    def measure_hamiltonian(self, ops, coeffs, is_loss=True):
        output = 0

        for i, circuit in enumerate(self.circuits):
            # map the ansatz over the observables to get a QNode collection
            qnodes = qml.map(circuit, ops, self.dev_gen[i],
                             measure="expval", interface=config.interface,
                             diff_method=config.diff_method)
            measurements = qnodes(self.params[i])
            measurements = tf.reshape(measurements, [-1])
            # weighted sum of the expectation values
            output += self.probs_norm[i] * tf.reduce_sum(measurements * coeffs)

        if is_loss:
            self.loss = output

        return output

The function evaluates fine (no errors), but when I evaluate gradients inside a gradient tape, I get the following error:

Traceback (most recent call last):
  File "/home/bobak/pennylane/run_butterfly_random.py", line 68, in <module>
    added_fields = added_field_dict, save_every = 10, verbose = True)
  File "/home/bobak/pennylane/QGAN.py", line 344, in optimize
    gradients = tape.gradient(ham_loss, self.gen.params+[self.gen.probs])
  File "/home/bobak/.conda/envs/tf_gpu3/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 1073, in gradient
    unconnected_gradients=unconnected_gradients)
  File "/home/bobak/.conda/envs/tf_gpu3/lib/python3.7/site-packages/tensorflow/python/eager/imperative_grad.py", line 77, in imperative_grad
    compat.as_str(unconnected_gradients.value))
  File "/home/bobak/.conda/envs/tf_gpu3/lib/python3.7/site-packages/tensorflow/python/eager/backprop.py", line 162, in _gradient_function
    return grad_fn(mock_op, *out_grads)
  File "/home/bobak/.conda/envs/tf_gpu3/lib/python3.7/site-packages/tensorflow/python/ops/math_grad.py", line 214, in _SumGrad
    return [array_ops.broadcast_to(grad, input_shape), None]
  File "/home/bobak/.conda/envs/tf_gpu3/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 836, in broadcast_to
    _ops.raise_from_not_ok_status(e, name)
  File "/home/bobak/.conda/envs/tf_gpu3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnimplementedError: Broadcast between [1,2,1,1,1,2,1,2] and [2,2,2,2,2,2,2,2] is not supported yet. [Op:BroadcastTo]
co9olguy commented 3 years ago

Thanks for reporting this @bkiani!

To help us track it down, it seems the error is surfaced by your file QGAN.py, line 344, in `optimize`. Are you able to share with us the code containing the erroring line as well, so we can attempt to reproduce?

bkiani commented 3 years ago

Copied the code snippet below. Line 344 is simply the `ham_loss = ...` line, which calls the `measure_hamiltonian` function shown earlier.

with tf.GradientTape() as tape:
    ham_loss = self.gen.measure_hamiltonian(self.dis.active_operators, self.dis.active_coeff)

    gradients = tape.gradient(ham_loss, self.gen.params+[self.gen.probs])
    self.optimizer.apply_gradients(zip(gradients, self.gen.params+[self.gen.probs]))
co9olguy commented 3 years ago

Thanks @bkiani, this is helpful information. We'll take a look and see if we can track it down.

bkiani commented 3 years ago

@co9olguy thank you. And for the record, when I use the PyTorch interface with default.qubit, this error does not occur; it seems to be specific to TensorFlow.
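A minimal sketch of what I mean (the circuit and observable here are simplified stand-ins for illustration, not my actual QGAN code):

import pennylane as qml
import torch

n = 8
dev = qml.device("default.qubit", wires=n)

@qml.qnode(dev, interface="torch")
def circuit(params):
    qml.RX(params[0], wires=0)
    return qml.expval(qml.PauliX(0) @ qml.PauliY(2) @ qml.PauliZ(6))

params = torch.randn(n + 2, dtype=torch.float64, requires_grad=True)
loss = circuit(params)
loss.backward()  # completes without the BroadcastTo error
print(params.grad)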

josh146 commented 3 years ago

Hi @bkiani! I'm just attempting to recreate your error now, but it seems there are still some details missing.

If you could have a go at reducing your code down to a very minimal non-working example, that would be much appreciated 🙂

bkiani commented 3 years ago

Here you go. This gives me the error I described:

import pennylane as qml
import tensorflow as tf
import numpy as np

def circuit(params, **kwargs):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RZ(params[2], wires=0)
    for i in range(n - 1):
        qml.CRX(params[3 + i], wires=[i, i + 1])

def measure_hamiltonian(params, ops, coeffs):
    qnodes = qml.map(circuit, ops, dev,
                     measure="expval", interface='tf', diff_method='backprop')
    measurements = qnodes(params)
    measurements = tf.reshape(measurements, [-1])
    return tf.reduce_sum(measurements * coeffs)

n = 8
params = tf.Variable(tf.convert_to_tensor(np.random.normal(size=n + 2)))

dev = qml.device('default.qubit.tf', wires=n, analytic=True)

coeffs = tf.Variable([1., 0.5], dtype=tf.double)
ops = [qml.operation.Tensor(qml.PauliX(0), qml.PauliY(2), qml.PauliZ(3), qml.PauliZ(6)),
       qml.operation.Tensor(qml.PauliY(3), qml.PauliZ(1), qml.PauliX(5), qml.PauliX(4))]

with tf.GradientTape() as tape:
    ham_loss = measure_hamiltonian(params, ops, coeffs)

gradients = tape.gradient(ham_loss, params)
josh146 commented 3 years ago

Thanks @bkiani, that's perfect! I'll have a look now and get back to you

josh146 commented 3 years ago

Hi @bkiani, I've reduced the minimal code example even further:

import pennylane as qml
import tensorflow as tf
import numpy as np

n = 8

dev = qml.device("default.qubit.tf", wires=n, analytic=True)

@qml.qnode(dev, interface="tf")
def circuit(params, **kwargs):
    qml.RX(params[0], wires=0)
    return qml.expval(qml.PauliX(0) @ qml.PauliY(2) @ qml.PauliZ(6))

params = tf.Variable(tf.convert_to_tensor(np.random.normal(size=n + 2)))

with tf.GradientTape() as tape:
    loss = circuit(params)

gradients = tape.gradient(loss, params)

This gives the error:

tensorflow.python.framework.errors_impl.UnimplementedError: Broadcast between [2,1,2,1,1,1,2,1] and [2,2,2,2,2,2,2,2] is not supported yet. [Op:BroadcastTo]

I'm still not sure exactly what the cause is, but I noticed that either of two changes eliminates the error.

I believe that, rather than being a bug that can be patched in default.qubit.tf, this is a result of TF's broadcasting rules deviating from NumPy's: broadcasting is simply not implemented for larger numbers of dimensions. See https://github.com/tensorflow/tensorflow/issues/1519.
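A standalone sketch of the limitation, independent of PennyLane (shapes taken from the error message above; behaviour may differ on newer TF versions):

import numpy as np
import tensorflow as tf

# NumPy broadcasts the rank-8 shape without complaint
np.broadcast_to(np.zeros([1, 2, 1, 1, 1, 2, 1, 2]), [2] * 8)

# the equivalent TF op raises UnimplementedError:
# "Broadcast between [1,2,1,1,1,2,1,2] and [2,2,2,2,2,2,2,2] is not supported yet"
tf.broadcast_to(tf.zeros([1, 2, 1, 1, 1, 2, 1, 2]), [2] * 8)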

More work will be required to work out exactly which operation in default.qubit or qubit.device fails to broadcast properly for a large number of dimensions in TensorFlow, and to provide an alternate implementation in default.qubit.tf.
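As a rough illustration of the kind of alternate implementation that might help (hypothetical, not PennyLane's actual internals), a reduction written against a flattened view keeps the gradient's BroadcastTo at rank 1, regardless of the number of qubits:

import tensorflow as tf

def sum_flat(x):
    # reducing over a rank-1 view means the gradient of reduce_sum only
    # broadcasts back to a rank-1 shape, never to rank n
    return tf.reduce_sum(tf.reshape(x, [-1]))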

bkiani commented 3 years ago

@josh146 Thank you for taking a look. I appreciate the help here.

nguyenquantum commented 3 years ago

Hello @josh146. I believe I'm getting this same error, where my loss function is a sum of Pauli observables evaluated with a QNode and TF backprop. My code worked for <7 qubits, but I get the error for >=7 qubits. Have you been able to fix it?

josh146 commented 3 years ago

Hi @nguyentq7! As far as I can tell, a PR was made to TensorFlow to enable broadcasting for more than 6 dimensions (see https://github.com/tensorflow/tensorflow/pull/14997); however, it was closed without being merged. Let me explore this further; perhaps there is an approach we can take that avoids broadcasting for a large number of qubits.

dime10 commented 2 years ago

I've had a look at this issue and it seems TensorFlow still does not support broadcasting more than 5 dimensions, and I'm not sure they have plans to change this.

Interestingly, using tape.jacobian instead of tape.gradient seems to circumvent this problem (apparently due to the vectorization in the jacobian function, enabled by default via experimental_use_pfor=True), so the Jacobian can be used as a workaround when encountering this error.
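Concretely, reusing the reduced example from above (a sketch; for a scalar loss the Jacobian coincides with the gradient):

with tf.GradientTape() as tape:
    loss = circuit(params)

# tape.jacobian vectorizes via pfor (experimental_use_pfor=True is the
# default), which sidesteps the failing high-rank BroadcastTo
gradients = tape.jacobian(loss, params)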

Having said that, I did stumble upon a bug when computing the Jacobian with TensorFlow in the minimal working example given here. For n=8, the results returned by TF differ from those returned by other frameworks, and moreover differ between executions:

import pennylane as qml
import tensorflow as tf
import numpy as np

n = 8

dev = qml.device("default.qubit", wires=n, shots=None)

def circuit(params, **kwargs):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    qml.RZ(params[2], wires=0)
    for i in range(n - 1):
        qml.CRX(params[3 + i], wires=[i, i + 1])

obs = [qml.operation.Tensor(qml.PauliX(0), qml.PauliY(2), qml.PauliZ(3), qml.PauliZ(6)),
       qml.operation.Tensor(qml.PauliY(3), qml.PauliZ(1), qml.PauliX(5), qml.PauliX(4))]

def costfn(params, coeffs, interface='autograd'):
    qnodes = qml.map(circuit, obs, dev, interface=interface)
    measurements = qnodes(params)
    return measurements[0] * coeffs[0] + measurements[1] * coeffs[1]

np.random.seed(521)
params = np.array(np.random.normal(size=n + 2))
coeffs = np.array([1., 0.5])
params_tf = tf.Variable(tf.convert_to_tensor(params))
coeffs_tf = tf.Variable(tf.convert_to_tensor(coeffs))

# reference Jacobian via the autograd interface
print(qml.jacobian(costfn)(params, coeffs))

# Jacobian via the TF interface
with tf.GradientTape() as tape:
    loss = costfn(params_tf, coeffs_tf, interface='tf')
print(tape.jacobian(loss, [params_tf, coeffs_tf]))

For n=8:

(array([-4.01988240e-05,  3.85935425e-04, -2.71050543e-20, -7.65050131e-04,
       -1.89636182e-03,  1.53095624e-03,  2.29288505e-04, -7.60400460e-04,
        4.58926570e-05,  0.00000000e+00]), array([-1.38777878e-17,  1.05913901e-03]))
[<tf.Tensor: shape=(10,), dtype=float64, numpy=
array([-3.35266778e-02, -1.89079810e-01,  1.65884715e-02, -2.22927873e-02,
        2.60230502e-02,  8.50662385e-04,  2.72236379e-03,  4.38077653e-04,
        3.28701145e-07,  5.94101000e-09])>, <tf.Tensor: shape=(2,), dtype=float64, numpy=array([4.89105941e-18, 1.05913901e-03])>]

For n=9:

(array([-4.01988240e-05,  3.85935425e-04, -2.71050543e-20, -7.65050131e-04,
       -1.89636182e-03,  1.53095624e-03,  2.29288505e-04, -7.60400460e-04,
        4.58926570e-05,  0.00000000e+00,  0.00000000e+00]), array([-1.38777878e-17,  1.05913901e-03]))
[<tf.Tensor: shape=(11,), dtype=float64, numpy=
array([-4.01988240e-05,  3.85935425e-04, -2.71050543e-20, -7.65050131e-04,
       -1.89636182e-03,  1.53095624e-03,  2.29288505e-04, -7.60400460e-04,
        4.58926570e-05,  0.00000000e+00,  0.00000000e+00])>, <tf.Tensor: shape=(2,), dtype=float64, numpy=array([4.89105941e-18, 1.05913901e-03])>]