jackaraz opened this issue 2 years ago
@jackaraz thanks for your patience (and happy new year!) -- the PennyLane dev team has been on break over the new year.
Regarding your issue here, it seems that parameter-shift + autograph might not be working correctly. I was wondering if you could let me know the output for the following script?
import tensorflow as tf
import pennylane as qml
tf.random.set_seed(137)
weights = tf.Variable(tf.random.uniform((2,), dtype=tf.float64), trainable=True)
inputs = tf.random.uniform((10, 2), dtype=tf.float64)
y_truth = tf.random.stateless_binomial((10, 2), [10, 11], 1, 0.5)
@qml.batch_transform
def batch_input_tf(tape):
    parameters = tape.get_parameters(trainable_only=False)
    unstacked_inpt = tf.unstack(parameters[0])
    output = [[x] + parameters[1:] for x in unstacked_inpt]

    # Construct a new tape for each unstacked input
    output_tapes = []
    for params in output:
        new_tape = tape.copy(copy_operations=True)
        new_tape.set_parameters(params, trainable_only=False)
        output_tapes.append(new_tape)

    return output_tapes, lambda x: qml.math.squeeze(qml.math.stack(x))
dev = qml.device("default.qubit", wires=2, shots=None)
@tf.function
@batch_input_tf
@qml.qnode(dev, interface="tf", diff_method="parameter-shift")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

with tf.GradientTape() as tape:
    yhat = circuit(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Parameter-shift gradient (autograph):", tape.gradient(loss, weights))
When running this locally, I get
Parameter-shift gradient (autograph): tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64)
which seems to match with backprop mode.
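For reference, here is a sketch of the kind of backprop comparison meant above (hypothetical names circuit_bp and dev_bp; it reuses inputs, weights, y_truth, and batch_input_tf from the script above and runs eagerly):

# Hedged sketch (not from the original post): the same batched circuit,
# but differentiated with backprop instead of parameter-shift.
dev_bp = qml.device("default.qubit", wires=2, shots=None)

@batch_input_tf
@qml.qnode(dev_bp, interface="tf", diff_method="backprop")
def circuit_bp(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

with tf.GradientTape() as tape:
    yhat = circuit_bp(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Backprop gradient:", tape.gradient(loss, weights))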
Regarding tf.vectorized_map, this is something we would love to get working with PennyLane. Unfortunately, I can't seem to get it running as per your example above, even in backprop mode:
dev = qml.device("default.qubit.tf", wires=2, shots=None)

@tf.function
@qml.qnode(dev, interface="tf", diff_method="backprop")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs()

contract = lambda ins, ws: tf.vectorized_map(lambda vec: circuit(vec, ws), ins)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Vectorized gradient:", tape.gradient(loss, weights))
gives me
File "/home/josh/xanadu/pennylane/pennylane/_qubit_device.py", line 537, in generate_basis_states *
-1, num_wires
ValueError: cannot reshape array of size 0 into shape (0)
Hi @josh146, happy new year!!! No worries at all, I figured :)
Response to the first message: I'm getting exactly the same results as you do. With parameter shift I get
Parameter-shift gradient (autograph): tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64)
and similarly with backprop I get tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64). However, with parameter shift I also get the following warnings:
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
WARNING:tensorflow:AutoGraph could not transform <function _gcd_import at 0x103b3f430> and will run it as-is.
Cause: Unable to locate the source code of <function _gcd_import at 0x103b3f430>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Response to the second message: That is strange; using the code below
dev = qml.device("default.qubit.tf", wires=2, shots=None)

@tf.function
@qml.qnode(dev, diff_method="backprop", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.probs(op=qml.PauliZ(1))

contract = lambda ins, ws: tf.vectorized_map(lambda vec: circuit(vec, ws), ins)

with tf.GradientTape() as tape:
    tape.watch(weights)
    yhat = contract(inputs, weights)
    loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_truth, yhat))

print("Vectorized gradient:", tape.gradient(loss, weights))
I'm getting the following result:
Vectorized gradient: tf.Tensor([ 0.06276735 -0.05202713], shape=(2,), dtype=float64)
along with the following warning:
WARNING:tensorflow:AutoGraph could not transform <function _gcd_import at 0x109eeb430> and will run it as-is.
Cause: Unable to locate the source code of <function _gcd_import at 0x109eeb430>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function _gcd_import at 0x109eeb430> and will run it as-is.
Cause: Unable to locate the source code of <function _gcd_import at 0x109eeb430>. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.experimental.do_not_convert. Original error: could not get source code
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
But strangely enough, when I copy your code I get the exact same error as you do. Could you try with mine? I do not see any difference between the two, though, and I have no idea why this is happening.
Thanks @jackaraz --- I can now replicate the issue with the following minimal example:
dev = qml.device("default.qubit", wires=2, shots=None)

@tf.function
@qml.qnode(dev, diff_method="parameter-shift", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

cost = lambda inputs, w: tf.vectorized_map(lambda x: circuit(x, w), inputs)

weights = tf.Variable(tf.ones((2,), dtype=tf.float64))
inputs = tf.ones((10, 2), dtype=tf.float64)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(cost(inputs, weights))

print("Vectorized loss:", loss)
print("Vectorized gradient:", tape.gradient(loss, weights))
This gives output
Vectorized loss: tf.Tensor(-4.161468365471425, shape=(), dtype=float64)
Vectorized gradient: tf.Tensor([0. 0.], shape=(2,), dtype=float64)
rather than the expected
Vectorized loss: tf.Tensor(-4.161468365471425, shape=(), dtype=float64)
Vectorized gradient: tf.Tensor([-9.09297427e+00 -8.88178420e-16], shape=(2,), dtype=float64)
Strangely enough, the reason seems to be that the custom gradient function is not being called by TensorFlow in vectorized mode --- indicating that, somewhere, the computational graph 'linking' the QNode output with the grad_fn defined in that file is being broken 🤔
I'm not yet 100% sure of the reason why this is not working, but it could be multiple things:
1. There could be a bug in the tensorflow-autograph.py interface.
2. There could be a point in the logical flow where non-TensorFlow control flow/functions are being used, which is breaking the computational graph (although this does not seem likely, since @tf.function by itself is working).
3. Finally, it could be that tf.py_function, which we use internally to make the parameter-shift rule and quantum hardware compatible with TensorFlow autograph mode, does not support tf.vectorized_map (see the sketch after this list).
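To probe the third possibility, here is a minimal, pure-TensorFlow sketch (not PennyLane code; square_via_pyfunc is just a stand-in) mimicking the structure of a tf.custom_gradient whose forward pass goes through tf.py_function, and checking whether the custom backward pass ever runs under tf.vectorized_map:

import tensorflow as tf

@tf.custom_gradient
def square_via_pyfunc(x):
    # Forward pass goes through tf.py_function, mirroring how the
    # parameter-shift/hardware path is wired into the TF interface.
    y = tf.py_function(lambda v: v ** 2, [x], tf.float64)
    y.set_shape(x.shape)

    def grad_fn(dy):
        # If TensorFlow never reaches this point, the gradient link is broken.
        tf.print("custom grad_fn called")
        return dy * 2.0 * x

    return y, grad_fn

x = tf.constant([1.0, 2.0, 3.0], dtype=tf.float64)

with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(tf.vectorized_map(square_via_pyfunc, x))

# Expected [2. 4. 6.] if grad_fn is called; None/zeros otherwise.
print(tape.gradient(loss, x))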
What if we "switched up the order" of GradientTape and vectorized_map? Could it work as a workaround? I.e., rather than wrapping the vectorized_map call in a GradientTape, compute the per-example gradient inside the mapped function:
import pennylane as qml
import tensorflow as tf

dev = qml.device("default.qubit", wires=2, shots=None)

@tf.function
@tf.autograph.experimental.do_not_convert
@qml.qnode(dev, diff_method="parameter-shift", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

def cost(inputs, w):
    with tf.GradientTape() as tape:
        loss = circuit(inputs, weights)
    return loss, tape.gradient(loss, weights)

weights = tf.Variable(tf.ones((2,), dtype=tf.float64))
inputs = tf.ones((10, 2), dtype=tf.float64)

losses, grads = tf.vectorized_map(lambda x: cost(x, weights), inputs)
loss = tf.reduce_sum(losses)
grad = tf.reduce_sum(grads)

print("Vectorized loss:", loss)
print("Vectorized gradient:", grad)
Vectorized loss: tf.Tensor(-4.161468365471425, shape=(), dtype=float64)
Vectorized gradient: tf.Tensor(-9.092974268256818, shape=(), dtype=float64)
This would be the structure following the second example on the tf.vectorized_map doc page (# Computing per-example gradients).
Thanks @antalszava! Playing around with your example, the following also seems to work, and might be more performant?
@qml.qnode(dev, diff_method="parameter-shift", interface="tf")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(2), rotation="Y")
    qml.RY(weights[0], wires=0)
    qml.RY(weights[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

@tf.function
def cost(inputs, w):
    with tf.GradientTape() as tape:
        loss = circuit(inputs, weights)
    return loss, tape.gradient(loss, weights)

weights = tf.Variable(tf.ones((2,), dtype=tf.float64))
inputs = tf.ones((10, 2), dtype=tf.float64)

losses, grads = tf.vectorized_map(lambda x: cost(x, weights), inputs)
Playing around with your example, the following also seems to work, and might be more performant?
Would we still require post-processing by using tf.reduce_sum?
As for the original issue, I'd suspect that the root cause comes down to the capabilities (or rather the lack thereof) of tf.vectorized_map in TensorFlow: "However this is an experimental feature and currently has a lot of limitations", from its doc page.
1. When changing to tf.map_fn, which is an alternative mentioned for tf.vectorized_map, the original example posted by @jackaraz works without errors and yields the correct results locally.
2. When using tf.vectorized_map, the issue seems to arise with the tf.py_function call that we have in place here. Specifically, it seems as though the VJPs are returned without this call completing successfully (the _backward function is not being called).
Based on 2., I'd think that tf.vectorized_map is incompatible with tf.py_function. If we agree here, I might remove the Bug label on this issue.
Would we still require post-processing by using tf.reduce_sum?
Most likely; I just wanted to simplify the minimal working example to isolate just the QNode + tf.vectorized_map, since the post-processing could differ depending on the circumstances 🙂
Based on 2., I'd think that tf.vectorized_map is incompatible with tf.py_function. If we agree here, I might remove the Bug label on this issue.
Nice 🕵️ work! Since the answer is not so clear in the TF documentation, perhaps it might be worth opening an issue on the TF GitHub page?
Hi @antalszava & @josh146, thanks a lot for all the answers.
Would we still require post-processing by using tf.reduce_sum?
Yes, definitely: vectorized_map is just a mapping over the first axis of the sample. Say the input shape is (Nt, nqubit); then the shape of the output will be (Nt, outdim), where Nt is the number of examples that you provide. Note that with the gradient included in the vectorized_map, it might be essential to declare the axis over which to apply the reduce_sum.
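For instance, with losses of shape (Nt,) and grads of shape (Nt, n_params) coming out of the tf.vectorized_map call above (a small sketch; the exact shapes depend on the circuit outputs), only the example axis is summed over for the gradients:

loss = tf.reduce_sum(losses)         # scalar: sum over the Nt examples
grad = tf.reduce_sum(grads, axis=0)  # shape (n_params,): keep the parameter axis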
When changing to tf.map_fn, which is an alternative mentioned for tf.vectorized_map, the original example posted by @jackaraz works without errors and yields the correct results locally.
I can confirm that map_fn works nicely across all PennyLane platforms that I have tried so far, in a much more complex setting, but it is not as efficient as vectorized_map. I believe this is because vectorized_map traces the mapped function and runs it as a vectorized graph rather than eagerly, so tensors are symbolic, i.e. you cannot access the value of a tensor during execution, whereas map_fn does not behave this way. I observed an order-of-magnitude difference in speed, both on CPU and GPU, between map_fn and vectorized_map. So I guess there is no easy solution for using vectorized_map given the current status of TensorFlow and PennyLane, but yes, map_fn is definitely a good alternative. However, I wouldn't use it to execute with the ibmq backend, since it will submit jobs one by one to the quantum computer.
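For reference, a minimal sketch of swapping tf.vectorized_map for tf.map_fn in the earlier minimal example (assuming the scalar-expval circuit, weights, and inputs defined above; the fn_output_signature shown here is an assumption matching a scalar float64 output):

# tf.map_fn as a drop-in alternative to tf.vectorized_map for the circuit
# from the minimal example above; each example is executed sequentially,
# so this is slower but more broadly compatible.
cost = lambda inputs, w: tf.map_fn(
    lambda x: circuit(x, w),
    inputs,
    fn_output_signature=tf.TensorSpec(shape=(), dtype=tf.float64),
)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(cost(inputs, weights))

print("map_fn gradient:", tape.gradient(loss, weights))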
Hi @jackaraz, thanks :slightly_smiling_face:
Josh managed to recreate our use case fully in TensorFlow and we've opened an issue: https://github.com/tensorflow/tensorflow/issues/53726
As this is not an issue directly related to PennyLane and there are workarounds to this, I'll lift the bug label from here.
Hi, I'm trying to parallelize my quantum circuit execution on GPU using tf.vectorized_map, following the thread on this link. This function allows the execution of each input to be parallelized across GPU (or CPU) cores, and it seems to be working as expected if I just calculate the result of the circuit. But I realized that taking the gradient of the circuit causes some issues. In the following I prepared some sample code.

Above I prepared two simple circuits, one using purely TensorFlow and the other using the Qasm simulator. Using the batched execution proposed in this link, I can produce the expected results for both circuits. I tested this function in a more realistic example and it works perfectly; the problem is that it is not parallelized, hence the execution is extremely slow, which only gets worse with a large number of shots, as expected. Hence I wanted to parallelize the execution of the circuit using tf.vectorized_map. This function executes each input on a different CPU/GPU core and is hence much faster than the execution above. However, I realized that my gradients are always zero for parameter-shift and I'm getting the following warning; and if I instead use backprop for dev2 it seems to work, so I believe the problem is with parameter-shift. Hence I was wondering if there is a better way to parallelize the circuit execution, or am I making a mistake in my workflow? Any suggestions are highly appreciated.

System Settings:

Thanks, Jack