
`tf.vectorized_map` doesn't work with the parameter-shift rule #2068

Open jackaraz opened 2 years ago

jackaraz commented 2 years ago

Hi, I'm trying to parallelize my quantum circuit execution on GPU using `tf.vectorized_map`, following the thread on this link. This function allows the execution of each input to be parallelized across GPU (or CPU) cores, and it seems to work as expected if I just evaluate the circuit. But I realized that taking the gradient of the circuit causes some issues. Below I have prepared some sample code.

import tensorflow as tf
import pennylane as qml

dev1 = qml.device("qiskit.aer", wires = 2, shots=10, backend='qasm_simulator')
dev2 = qml.device("default.qubit.tf", wires = 2, shots=None)

@qml.qnode(dev2, diff_method="parameter-shift", interface="tf")
def circuit2(inputs, weights):
    qml.AngleEmbedding(inputs, wires = range(2), rotation="Y")

    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires = [0, 1])

    return qml.probs(op=qml.PauliZ(1))

@qml.qnode(dev1, diff_method="parameter-shift", interface="tf")
def circuit1(inputs, weights):
    qml.AngleEmbedding(inputs, wires = range(2), rotation="Y")

    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires = [0, 1])

    return qml.probs(op=qml.PauliZ(1))

weights = tf.Variable(tf.random.uniform((2,), dtype=tf.float64), trainable=True)
inputs = tf.random.uniform((10,2), dtype=tf.float64)
y_truth = tf.random.stateless_binomial((10,2), [10,11], 1, 0.5)

Above I prepared two simple circuits, one using purely TensorFlow and the other using the Qiskit Aer QASM simulator. Using the batched execution proposed in this link, I can produce the expected results for both circuits:

batched_circuit1 = batch_input_tf(circuit1)
batched_circuit2 = batch_input_tf(circuit2)

with tf.GradientTape() as tape:
    yhat = batched_circuit1(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([-0.15988095,  0.15464286])>

with tf.GradientTape() as tape:
    yhat = batched_circuit2(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([-0.07629922,  0.01617766])>

I tested this function in a more realistic example and it works perfectly. The problem is that it is not parallelized, so the execution is extremely slow, and this only gets worse with a large number of shots, as expected. Hence I wanted to parallelize the execution of the circuit using `tf.vectorized_map`:

circ = tf.function(circuit2)  # this can be circuit1 or circuit2; both give the same result for parameter-shift
contract = lambda ins, ws : tf.vectorized_map(lambda vec: circ(vec, ws), ins)
with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))
tape.gradient(loss, weights)
# Output: <tf.Tensor: shape=(2,), dtype=float64, numpy=array([0., 0.])>

This function executes each input on a different CPU/GPU core and is hence much faster than the execution above. However, I realised that my gradients are always zero for parameter-shift, and I'm getting the following warning:

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.

If I instead use backprop for dev2 it seems to work, so I believe the problem is with parameter-shift. Hence I was wondering if there is a better way to parallelize the circuit execution, or whether I am making a mistake in my workflow. Any suggestion is highly appreciated.

System Settings:

Thanks Jack

josh146 commented 2 years ago

@jackaraz thanks for your patience (and happy new year!) -- the PennyLane dev team has been on break over the new year.

Regarding your issue here, it seems that parameter-shift + autograph might not be working correctly. I was wondering if you could let me know the output for the following script?

import tensorflow as tf
import pennylane as qml

tf.random.set_seed(137)
weights = tf.Variable(tf.random.uniform((2,), dtype=tf.float64), trainable=True)
inputs = tf.random.uniform((10, 2), dtype=tf.float64)
y_truth = tf.random.stateless_binomial((10, 2), [10, 11], 1, 0.5)

@qml.batch_transform
def batch_input_tf(tape):
    parameters = tape.get_parameters(trainable_only=False)

    unstacked_inpt = tf.unstack(parameters[0])
    output = [[x] + parameters[1:] for x in unstacked_inpt]

    # Construct new output tape with unstacked inputs
    output_tapes = []
    for params in output:
        new_tape = tape.copy(copy_operations=True)
        new_tape.set_parameters(params, trainable_only=False)
        output_tapes.append(new_tape)

    return output_tapes, lambda x: qml.math.squeeze(qml.math.stack(x))

dev = qml.device("default.qubit", wires=2, shots=None)

@tf.function
@batch_input_tf
@qml.qnode(dev, interface="tf", diff_method="parameter-shift")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

with tf.GradientTape() as tape:
    yhat = circuit(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Parameter-shift gradient (autograph):", tape.gradient(loss, weights))

When running this locally, I get

Parameter-shift gradient (autograph): tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64)

which seems to match the result in backprop mode.
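
For reference, a minimal sketch of that backprop check, run eagerly for simplicity (this reuses dev, inputs, weights, y_truth, and batch_input_tf from the script above; circuit_backprop is just an illustrative name, not code from this thread):

@batch_input_tf
@qml.qnode(dev, interface="tf", diff_method="backprop")
def circuit_backprop(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

with tf.GradientTape() as tape:
    yhat = circuit_backprop(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Backprop gradient:", tape.gradient(loss, weights))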

josh146 commented 2 years ago

Regarding tf.vectorized_map, this is something we would love to get working with PennyLane. Unfortunately, I can't seem to get it running as per your example above, even in backprop mode:

dev = qml.device("default.qubit.tf", wires=2, shots=None)

@tf.function
@qml.qnode(dev, interface="tf", diff_method="backprop")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs()

contract = lambda ins, ws : tf.vectorized_map(lambda vec: circuit(vec, ws), ins)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Vectorized gradient:", tape.gradient(loss, weights))

gives me

    File "/home/josh/xanadu/pennylane/pennylane/_qubit_device.py", line 537, in generate_basis_states  *
        -1, num_wires

    ValueError: cannot reshape array of size 0 into shape (0)

jackaraz commented 2 years ago

Hi @josh146, happy new year!!! No worries at all, I figured :)

Response to the first message: I'm getting exactly the same results as you do. With parameter-shift I get:

Parameter-shift gradient (autograph): tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64)

and similarly with backprop I get tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64). However, with parameter-shift I also get the following warnings:

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
WARNING:tensorflow:AutoGraph could not transform <function _gcd_import at 0x103b3f430> and will run it as-is.
Cause: Unable to locate the source code of <function _gcd_import at 0x103b3f430>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

Response to the second message: that is strange. Using the code below

dev = qml.device("default.qubit.tf", wires=2, shots=None)

@tf.function
@qml.qnode(dev, diff_method="backprop", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires = range(2), rotation="Y")

    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires = [0, 1])

    return qml.probs(op=qml.PauliZ(1))

contract = lambda ins, ws : tf.vectorized_map(lambda vec: circuit(vec, ws), ins)
with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Vectorized gradient:", tape.gradient(loss, weights))

I'm getting the result Vectorized gradient: tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64), along with the following warning:

WARNING:tensorflow:AutoGraph could not transform <function _gcd_import at 0x109eeb430> and will run it as-is.
Cause: Unable to locate the source code of <function _gcd_import at 0x109eeb430>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function _gcd_import at 0x109eeb430> and will run it as-is.
Cause: Unable to locate the source code of <function _gcd_import at 0x109eeb430>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

But strangely enough, when I copy your code I get the exact same error as you do. Could you try with mine? I do not see any difference between the two, though, so I have no idea why this is happening.

josh146 commented 2 years ago

Thanks @jackaraz --- I can now replicate the issue with the following minimal example:

dev = qml.device("default.qubit", wires=2, shots=None)

@tf.function
@qml.qnode(dev, diff_method="parameter-shift", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

cost = lambda inputs, w: tf.vectorized_map(lambda x: circuit(x, w), inputs)

weights = tf.Variable(tf.ones((2,), dtype=tf.float64))
inputs = tf.ones((10, 2), dtype=tf.float64)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(cost(inputs, weights))

print("Vectorized loss:", loss)
print("Vectorized gradient:", tape.gradient(loss, weights))

This gives output

Vectorized loss: tf.Tensor(-4.161468365471425, shape=(), dtype=float64)
Vectorized gradient: tf.Tensor([0. 0.], shape=(2,), dtype=float64)

rather than the expected

Vectorized loss: tf.Tensor(-4.161468365471425, shape=(), dtype=float64)
Vectorized gradient: tf.Tensor([-9.09297427e+00 -8.88178420e-16], shape=(2,), dtype=float64)

Strangely enough, the reason seems to be that the custom gradient function

https://github.com/PennyLaneAI/pennylane/blob/master/pennylane/interfaces/batch/tensorflow_autograph.py#L126-L128

is not being called by TensorFlow in vectorized mode --- indicating that, somewhere, the computational graph 'linking' the QNode output with the grad_fn defined in that file is being broken 🤔

josh146 commented 2 years ago

I'm not yet 100% sure why this is not working, but it could be multiple things:

antalszava commented 2 years ago

What if we "switched up the order" of GradientTape and vectorized_map? Could it work as a workaround?

I.e.,

  1. We create a cost function that returns the loss and the gradient of a circuit;
  2. We vectorize the cost function and provide a batch of inputs.

Rather than

  1. We vectorize the circuit;
  2. We get the gradient and the loss of the vectorized circuit by providing a batch of inputs.

import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=2, shots=None)

@tf.function
@tf.autograph.experimental.do_not_convert
@qml.qnode(dev, diff_method="parameter-shift", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

def cost(inputs, w):
    with tf.GradientTape() as tape:
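        # note: circuit and tape.gradient below use the global `weights` Variable, not the `w` argument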
        loss = circuit(inputs, weights)
    return loss, tape.gradient(loss, weights)

weights = tf.Variable(tf.ones((2,), dtype=tf.float64))
inputs = tf.ones((10, 2), dtype=tf.float64)

losses, grads = tf.vectorized_map(lambda x: cost(x, weights), inputs)
loss = tf.reduce_sum(losses)
grad = tf.reduce_sum(grads)

print("Vectorized loss:", loss)
print("Vectorized gradient:", grad)
Vectorized loss: tf.Tensor(-4.161468365471425, shape=(), dtype=float64)
Vectorized gradient: tf.Tensor(-9.092974268256818, shape=(), dtype=float64)

This would be the structure following the second example on the tf.vectorized_map doc page (# Computing per-example gradients).

josh146 commented 2 years ago

Thanks @antalszava! Playing around with your example, the following also seems to work, and might be more performant?

@qml.qnode(dev, diff_method="parameter-shift", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

@tf.function
def cost(inputs, w):
    with tf.GradientTape() as tape:
        loss = circuit(inputs, weights)
    return loss, tape.gradient(loss, weights)

weights = tf.Variable(tf.ones((2,), dtype=tf.float64))
inputs = tf.ones((10, 2), dtype=tf.float64)

losses, grads = tf.vectorized_map(lambda x: cost(x, weights), inputs)

antalszava commented 2 years ago

Playing around with your example, the following also seems to work, and might be more performant?

Would we still require post-processing by using tf.reduce_sum?


As for the original issue, I'd suspect that the root cause comes down to the capabilities (or rather the lack thereof) of tf.vectorized_map in TensorFlow. From its doc page:

However this is an experimental feature and currently has a lot of limitations:

  1. When changing to tf.map_fn, which is an alternative mentioned in the tf.vectorized_map docs, the original example posted by @jackaraz works without errors and yields the correct results locally.
  2. When digging a bit deeper into the example that uses tf.vectorized_map, the issue seems to arise with the tf.py_function call that we have in place here. Specifically, it seems as though VJPs are returned without this call completing successfully (the _backward function is not being called).

Based on 2., I'd think that tf.vectorized_map is incompatible with tf.py_function. If we agree here, I might remove the Bug label on this issue.
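
For reference, a stripped-down, PennyLane-free sketch of the suspected interaction: a @tf.custom_gradient whose forward pass goes through tf.py_function, mapped with tf.vectorized_map (the function square and the numbers here are purely illustrative, not PennyLane code):

import numpy as np
import tensorflow as tf

@tf.custom_gradient
def square(x):
    # forward pass leaves the TF graph via tf.py_function, loosely mimicking how a
    # parameter-shift QNode executes its tapes outside of TensorFlow
    y = tf.py_function(lambda a: np.asarray(a) ** 2, [x], tf.float64)

    def grad_fn(dy):
        # analytic gradient of x**2
        return dy * 2.0 * x

    return y, grad_fn

x = tf.Variable([1.0, 2.0, 3.0], dtype=tf.float64)

with tf.GradientTape() as tape:
    out = tf.vectorized_map(square, x)

# expected [2., 4., 6.]; if vectorized_map skips the custom grad_fn (as suspected
# above), the gradient instead comes back as zeros or None
print(tape.gradient(out, x))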

josh146 commented 2 years ago

Would we still require post-processing by using tf.reduce_sum?

Most likely, I just wanted to simplify the minimal working example to isolate just the QNode + tf.vectorized_map, since the post-processing could differ depending on the circumstances 🙂

Based on 2., I'd think that tf.vectorized_map is incompatible with tf.py_function. If we agree here, I might remove the Bug label on this issue.

Nice 🕵️ work! Since the answer is not so clear in the TF documentation, perhaps it might be worth opening an issue on the TF GitHub page?

jackaraz commented 2 years ago

Hi @antalszava & @josh146, thanks a lot for all the answers.

Would we still require post-processing by using tf.reduce_sum?

Yes, definitely. vectorized_map is just a mapping over the first axis of the sample: say the input shape is (Nt, nqubit); then the output shape will be (Nt, outdim), where Nt is the number of examples you provide. Note that with the gradient included in the vectorized_map, it might be essential to specify the axis over which to apply reduce_sum, as sketched below.
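
For example, with the workaround above, losses has shape (Nt,) and grads has shape (Nt, 2) (one gradient vector per example), so one would sum over the batch axis only (a small sketch reusing the names from that snippet):

loss = tf.reduce_sum(losses)           # scalar total loss over the batch
grad = tf.reduce_sum(grads, axis=0)    # shape (2,): per-parameter gradients summed over examples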

When changing to tf.map_fn, which is an alternative mentioned in the tf.vectorized_map docs, the original example posted by @jackaraz works without errors and yields the correct results locally.

I can confirm that map_fn works nicely across all PennyLane backends that I have tried so far, in a much more complex setting, but it is not as efficient as vectorized_map. I believe this is because vectorized_map traces and vectorizes the function rather than running it eagerly, so tensors are just symbolic placeholders, i.e. you cannot access the value of a tensor during execution, whereas map_fn does not have this restriction. I observed an order-of-magnitude difference in speed, both on CPU and GPU, between map_fn and vectorized_map. So I guess there is no easy way to use vectorized_map given the current status of TensorFlow and PennyLane, but yes, map_fn is definitely a good alternative. However, I wouldn't use it with the IBMQ backend, since it will submit jobs one by one to the quantum computer.
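
For later readers, a minimal sketch of the tf.map_fn variant (assuming the circuit returning qml.probs(op=qml.PauliZ(1)) and the inputs, weights, and y_truth defined earlier in the thread; fn_output_signature just tells TF the per-example output spec):

contract = lambda ins, ws: tf.map_fn(
    lambda vec: circuit(vec, ws),
    ins,
    # each per-example call returns the two probabilities of qml.probs(op=qml.PauliZ(1))
    fn_output_signature=tf.TensorSpec(shape=(2,), dtype=tf.float64),
)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("map_fn gradient:", tape.gradient(loss, weights))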

antalszava commented 2 years ago

Hi @jackaraz, thanks :slightly_smiling_face:

Josh managed to recreate our use case fully in TensorFlow and we've opened an issue: https://github.com/tensorflow/tensorflow/issues/53726

As this is not an issue directly related to PennyLane and there are workarounds to this, I'll lift the bug label from here.