Thanks for reporting @ycchen1989 .
Would you mind also posting how you "run" VQC and ClassicalScaling exactly, i.e. in what way you invoke the loss.backward() statement?
At the moment your code only shows the two classes and the initial parameters. It would be really helpful to have a minimum code example that reproduces the error when I run it.
Thanks!
minimum code example:
import os
import argparse
import pennylane as qml
from pennylane import numpy as np
# import numpy as np
# from pennylane.optimize import NesterovMomentumOptimizer
import matplotlib.pyplot as plt
from datetime import datetime
import torch
import torch.nn as nn
from torch.autograd import Variable
import pickle
dtype = torch.cuda.DoubleTensor if torch.cuda.is_available() else torch.DoubleTensor
device = 'cuda' if torch.cuda.is_available() else 'cpu'
###
class ClassicalScaling:
def __init__(self, var_Q_circuit):
self.var_Q_circuit = var_Q_circuit
def forward(self, angles):
return self.var_Q_circuit * angles
class VQC:
def __init__(
self,
num_of_input= 10,
num_of_output= 2,
num_of_wires = 10,
var_Q_circuit = None,
var_Q_bias = None):
self.var_Q_circuit = var_Q_circuit
self.var_Q_bias = var_Q_bias
self.num_of_input = num_of_input
self.num_of_output = num_of_output
self.num_of_wires = num_of_wires
self.dev = qml.device('default.qubit', wires = num_of_wires)
def _layer(self, W):
""" Single layer of the variational classifier.
Args:
W (array[float]): 2-d array of variables for one layer
"""
# W = W.numpy()
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
qml.CNOT(wires=[2, 3])
qml.CNOT(wires=[3, 4])
qml.CNOT(wires=[4, 5])
qml.CNOT(wires=[5, 6])
qml.CNOT(wires=[6, 7])
qml.CNOT(wires=[7, 8])
qml.CNOT(wires=[8, 9])
# Another CNOT which is not in MNIST experiments!
qml.CNOT(wires=[9, 0])
qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)
qml.Rot(W[4, 0], W[4, 1], W[4, 2], wires=4)
qml.Rot(W[5, 0], W[5, 1], W[5, 2], wires=5)
qml.Rot(W[6, 0], W[6, 1], W[6, 2], wires=6)
qml.Rot(W[7, 0], W[7, 1], W[7, 2], wires=7)
qml.Rot(W[8, 0], W[8, 1], W[8, 2], wires=8)
qml.Rot(W[9, 0], W[9, 1], W[9, 2], wires=9)
def circuit(self, angles):
@qml.qnode(self.dev, interface='torch')
def _circuit(var_Q_circuit, angles):
qml.QubitStateVector(angles, wires=list(range(self.num_of_wires)))
weights = var_Q_circuit
for W in weights:
self._layer(W)
return [qml.expval.PauliZ(k) for k in range(self.num_of_output)]
return _circuit(self.var_Q_circuit, angles)
def _forward(self, angles):
angles = angles / torch.clamp(torch.sqrt(torch.sum(angles ** 2)), min = 1e-9)
raw_output = self.circuit(angles)
m = nn.Softmax(dim=0)
clamp = 1e-9 * torch.ones(self.num_of_output).type(dtype).to(device)
normalized_output = torch.max(raw_output, clamp)
output = m(normalized_output)
return output
def forward(self, angles):
fw = self._forward(angles)
return fw
###
def lost_function_cross_entropy(labels, predictions):
## numpy array
loss = nn.CrossEntropyLoss()
output = loss(predictions, labels)
print("LOSS AVG: ",output)
return output
def cost(VQC, X, Y):
"""Cost (error) function to be minimized."""
# predictions = torch.stack([variational_classifier(var_Q_circuit = var_Q_circuit, var_Q_bias = var_Q_bias, angles=item) for item in X])
## This method still not fully use the CPU resource...
loss = nn.CrossEntropyLoss()
output = loss(torch.stack([VQC.forward(item) for item in X]), Y)
print("LOSS AVG: ",output)
return output
def train_epoch(opt, VQC, X, Y, batch_size):
losses = []
for i in range(5):
batch_index = np.random.randint(0, len(X), (batch_size, ))
X_train_batch = X[batch_index]
Y_train_batch = Y[batch_index]
# opt.step(closure)
opt.zero_grad()
print("CALCULATING LOSS...")
loss = cost(VQC = VQC, X = X_train_batch, Y = Y_train_batch)
print("BACKWARD..")
loss.backward()
losses.append(loss.data.cpu().numpy())
opt.step()
# print("LOSS IN CLOSURE: ", loss)
print("FINISHED OPT.")
# print("CALCULATING PREDICTION.")
losses = np.array(losses)
return losses.mean()
class stackedCircuit:
def __init__(self, scaling_part, vqc):
self.scaling_part = scaling_part
self.vqc = vqc
def forward(self, single_item):
res_temp = self.scaling_part.forward(single_item)
res_temp = self.vqc.forward(res_temp)
return res_temp
def main(batch_size, epoch_num):
var_C_scaling = Variable(torch.ones(1024, dtype = torch.double), requires_grad=True)
scaling_part = ClassicalScaling(var_Q_circuit = var_C_scaling)
num_of_input = 10
num_of_output = 10
num_layers = 2
num_qubits = 10
var_Q_circuit = Variable(torch.tensor(0.01 * np.random.randn(num_layers, num_qubits, 3), device=device).type(dtype), requires_grad=True)
vqc = VQC(num_of_input = num_of_input, num_of_output = num_of_output, num_of_wires = num_qubits, var_Q_circuit = var_Q_circuit, var_Q_bias = None)
stacked = stackedCircuit(scaling_part, vqc)
params = [var_C_scaling, var_Q_circuit]
opt = torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
x_for_train = torch.tensor(np.random.randn(10000, 1024), device=device).type(dtype)
y_for_train = torch.tensor(np.zeros(10000), device=device).type(torch.LongTensor)
x_for_train = x_for_train.type(dtype)
for it in range(epoch_num):
# Need to save data
avg_loss_in_epoch = train_epoch(opt, stacked, x_for_train, y_for_train, batch_size)
if __name__ == '__main__':
main(batch_size = 10, epoch_num = 100)
Thanks @ycchen1989 !
I see that in your code you are using a rather old version of PennyLane. Would you mind installing the latest one first and seeing if that resolves the issue? That might save me the time it takes to turn your code into a minimum example and start debugging.
Also check the latest template functions in the API, which explain which parts are differentiable and which ones are not.
PS: For future reference, a minimum working example usually consists of only a few lines of code needed to reproduce the issue. That allows us to respond a lot quicker and better!
Note: if you run your code with the latest PennyLane version, you will have to change qml.expval.PauliZ(k) to qml.expval(qml.PauliZ(k)), which is the new signature.
Thanks for replying. I just installed the latest version of PennyLane (v0.7) and modified the expectation value to
[qml.expval(qml.PauliZ(k)) for k in range(self.num_of_output)]
and also changed the
qml.QubitStateVector(angles, wires=list(range(self.num_of_wires)))
to
qml.templates.embeddings.AmplitudeEmbedding(angles, wires=list(range(self.num_of_wires)))
but the error messages persist.
Thanks for trying; I will have a proper look ASAP then.
If you can shorten the example to a few lines I will be much faster in helping you, and I would be extremely grateful. It is difficult to dig through 200 lines of code that I have not written...
Thanks for replying. I tried to shorten the code to about 100 lines. Hope it helps.
import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import torch
import torch.nn as nn
from torch.autograd import Variable
dtype = torch.cuda.DoubleTensor if torch.cuda.is_available() else torch.DoubleTensor
device = 'cuda' if torch.cuda.is_available() else 'cpu'
###
class VQC:
def __init__(
self,
num_of_input= 10,
num_of_output= 2,
num_of_wires = 10,
var_Q_circuit = None,
var_C_scaling = None
):
self.var_Q_circuit = var_Q_circuit
self.var_C_scaling = var_C_scaling
self.num_of_input = num_of_input
self.num_of_output = num_of_output
self.num_of_wires = num_of_wires
self.dev = qml.device('default.qubit', wires = num_of_wires)
def _layer(self, W):
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
qml.CNOT(wires=[2, 3])
qml.CNOT(wires=[3, 4])
qml.CNOT(wires=[4, 5])
qml.CNOT(wires=[5, 6])
qml.CNOT(wires=[6, 7])
qml.CNOT(wires=[7, 8])
qml.CNOT(wires=[8, 9])
qml.CNOT(wires=[9, 0])
qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)
qml.Rot(W[4, 0], W[4, 1], W[4, 2], wires=4)
qml.Rot(W[5, 0], W[5, 1], W[5, 2], wires=5)
qml.Rot(W[6, 0], W[6, 1], W[6, 2], wires=6)
qml.Rot(W[7, 0], W[7, 1], W[7, 2], wires=7)
qml.Rot(W[8, 0], W[8, 1], W[8, 2], wires=8)
qml.Rot(W[9, 0], W[9, 1], W[9, 2], wires=9)
def circuit(self, angles):
@qml.qnode(self.dev, interface='torch')
def _circuit(var_Q_circuit, angles):
qml.templates.embeddings.AmplitudeEmbedding(angles, wires=list(range(self.num_of_wires)))
weights = var_Q_circuit
for W in weights:
self._layer(W)
return [qml.expval(qml.PauliZ(k)) for k in range(self.num_of_output)]
return _circuit(self.var_Q_circuit, angles)
def forward(self, angles):
print(angles)
angles = self.var_C_scaling * angles
print(angles)
angles = angles / torch.clamp(torch.sqrt(torch.sum(angles ** 2)), min = 1e-9)
raw_output = self.circuit(angles)
m = nn.Softmax(dim=0)
clamp = 1e-9 * torch.ones(self.num_of_output).type(dtype).to(device)
normalized_output = torch.max(raw_output, clamp)
output = m(normalized_output)
return output
def cost(VQC, X, Y):
loss = nn.CrossEntropyLoss()
output = loss(torch.stack([VQC.forward(item) for item in X]), Y)
print("LOSS AVG: ",output)
return output
def main(batch_size, epoch_num):
num_of_input = 10
num_of_output = 2
num_layers = 4
num_qubits = 10
var_C_scaling = Variable(torch.ones(1024, dtype = torch.double), requires_grad=True)
var_Q_circuit = Variable(torch.tensor(0.01 * np.random.randn(num_layers, num_qubits, 3), device=device).type(dtype), requires_grad=True)
vqc = VQC(num_of_input = num_of_input, num_of_output = num_of_output, num_of_wires = num_qubits, var_Q_circuit = var_Q_circuit, var_C_scaling = var_C_scaling)
params = [var_C_scaling, var_Q_circuit]
opt = torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
x_for_train = torch.tensor(np.random.randn(10, 1024), device=device).type(dtype)
y_for_train = torch.tensor(np.zeros(10), device=device).type(torch.LongTensor)
x_for_train = x_for_train.type(dtype)
opt.zero_grad()
loss = cost(VQC = vqc, X = x_for_train, Y = y_for_train)
loss.backward()
opt.step()
if __name__ == '__main__':
main(batch_size = 10, epoch_num = 100)
@ycchen1989 I think I've bumped into this problem before as well. What I think is happening is that PennyLane is trying to optimize the angles argument to _circuit because it's not passed as a keyword argument. Changing angles to a keyword parameter should fix it:
def circuit(self, angles):
@qml.qnode(self.dev, interface='torch')
def _circuit(var_Q_circuit, angles=None):
qml.templates.embeddings.AmplitudeEmbedding(angles, wires=list(range(self.num_of_wires)))
weights = var_Q_circuit
for W in weights:
self._layer(W)
return [qml.expval(qml.PauliZ(k)) for k in range(self.num_of_output)]
return _circuit(self.var_Q_circuit, angles=angles)
It's documented here.
But if I set the angles to a keyword argument, the classical scaling part won't be updated by gradient descent.
I'm not sure then. I managed to get mixed classical and quantum parameter optimization working by extending nn.Module and defining self.quantum_parameters = torch.nn.Parameter(self._generate_quantum_parameters()) in the constructor. I then just initialize the optimizer as optim.SGD(model.parameters(), ...).
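For reference, a minimal sketch of that pattern, under stated assumptions: the class name HybridModel, the parameter shapes, the choice of AngleEmbedding/StronglyEntanglingLayers and the import paths are all illustrative for a recent PennyLane version, not taken from the original code.
import pennylane as qml
import torch
import torch.nn as nn
import torch.optim as optim
from pennylane.templates import AngleEmbedding, StronglyEntanglingLayers
n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)
@qml.qnode(dev, interface="torch")
def qnode(weights, features):
    # Features are passed positionally so gradients can flow back into the
    # classical scaling parameters; AngleEmbedding is differentiable in its features.
    AngleEmbedding(features, wires=range(n_qubits))
    StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]
class HybridModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Registering both tensors as nn.Parameter exposes them via model.parameters().
        self.scaling = nn.Parameter(torch.ones(n_qubits, dtype=torch.double))
        self.quantum_parameters = nn.Parameter(
            0.01 * torch.randn(3, n_qubits, 3, dtype=torch.double))
    def forward(self, x):
        x = self.scaling * x  # classical, differentiable preprocessing
        return qnode(self.quantum_parameters, x)
model = HybridModel()
opt = optim.SGD(model.parameters(), lr=0.1)
x = torch.tensor([0.1, 0.5], dtype=torch.double)
y = torch.tensor([1.0, -1.0], dtype=torch.double)
opt.zero_grad()
loss = torch.sum((model(x) - y) ** 2)
loss.backward()  # gradients reach both the scaling and the circuit weights
opt.step()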
@ycchen1989 first of all, thanks for your patience, and thanks @soudy for pointing out that a keyword argument would solve the problem.
@ycchen1989, I think I can reproduce your problem with the following minimum working example (please reduce your code in the future to something this small :) ):
import pennylane as qml
import numpy as np
from pennylane.templates import AmplitudeEmbedding
dev = qml.device('default.qubit', wires=2)
features = np.array([1/2, 1 / 2, 1/2, 1/2])
@qml.qnode(dev)
def circuit(f):
AmplitudeEmbedding(features=f, wires=range(2))
return qml.expval(qml.PauliZ(0))
g = qml.grad(circuit, argnum=[0])
g(features)
The issue is that computing gradients with respect to the features of AmplitudeEmbedding is not working.
To give some background: while for most embeddings (AngleEmbedding, for example) the features can be "learnt", for others (e.g. BasisEmbedding) this is theoretically not possible. AmplitudeEmbedding lies somewhere in between the two: it is theoretically possible for some implementations.
In fact, the current implementation should support gradients. The template calls the operation QubitStateVector, which at the moment always calls the state preparation template MottonenStatePreparation, a rather convoluted quantum algorithm. I will try to track down where the gradient calculation fails, and maybe I can make it work.
Also, we only recently overhauled the templates, and are still working on updating the documentation. It will be made clear in the description of each template how and if differentiation works with each argument. Sorry for not having this up sooner!
I hope this helps to understand the issue. Thanks for making me aware of it.
Another update: looking deeper into this, MottonenStatePreparation uses a lot of postprocessing to compute the angles and gates of the state preparation circuit. It is very unlikely that we can make this differentiable (and even if we did, it would involve a huge computational graph).
To summarize, AmplitudeEmbedding circuits are under the hood so complex that computing gradients with respect to the features is not feasible for now.
If you code it up yourself, I recommend actually doing all preprocessing in a classical node (i.e. a normal python function), and then have the quantum node just use the calculated gate parameters.
I will make it a lot clearer in the documentation, and throw an error if a user tries to feed features as positional arguments to a qnode!
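To illustrate the recommendation above, here is a rough sketch of splitting the work between a classical node and a quantum node; the arctan preprocessing is just a hypothetical stand-in, not the actual state-preparation maths:
import pennylane as qml
import numpy as np
dev = qml.device('default.qubit', wires=2)
def preprocess(features):
    # Classical node: all heavy (and possibly non-differentiable) processing
    # happens here, turning raw features into plain gate angles.
    return np.arctan(features)
@qml.qnode(dev)
def circuit(angles):
    # Quantum node: only consumes the precomputed gate parameters.
    qml.RY(angles[0], wires=0)
    qml.RY(angles[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))
features = np.array([0.3, 1.2])
print(circuit(preprocess(features)))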
Hello, I reviewed some of my previous code and found that I can compute the gradient, as mentioned in this post, with PennyLane version 0.4.0!
Thanks for the update @ycchen1989! At some point since v0.4.0, the gradient computation of AmplitudeEmbedding must have broken. Does the error message return for v0.5.0?
It only runs successfully on 0.4.0. I tried 0.5.0 but it failed.
Closing this, since the warning has been added to the documentation, and differentiating AmplitudeEmbedding's feature input is no longer intended behaviour.
@mariaschuld @josh146 Thanks for the answers so far. If one wants to stack a quantum layer after a classical layer, the output of the classical layer (which is a function of its weights, and whose gradient we therefore need to track) becomes the input of the quantum layer. What is the procedure then? Because as far as the discussion goes, these values cannot be encoded into the amplitudes of a quantum layer. To make my question clearer, imagine we are classifying the MNIST dataset, PyTorch is being used, the device is "default.qubit" and the interface is "torch". We have the output of a convolution layer of shape (1, 1, 16, 16), meaning one batch of one channel of 16x16 pixels. This output can be flattened into (1, 256). What is the best way to encode these 256 tunable variables into a quantum layer?
Hey @mamadpierre!
Almost all embeddings other than AmplitudeEmbedding are differentiable; you seem to have picked the one that is a bit more tricky :) And there are theoretical reasons for this:
- AmplitudeEmbedding is in some sense very special, because it first needs to parse a classical input vector into a large number of angles and gate sequences to implement the arbitrary state preparation. In other words, while the maths is easy, the implementation is non-trivial (on simulators the features can be handed straight to QubitStateVector), but this is what makes things non-differentiable on some devices.
- AmplitudeEmbedding maps the output of a classical layer to a quantum state that is identical (up to normalisation) to this output. The quantum circuit after the embedding then applies a unitary transformation to this output. In many cases you could therefore just train a linear layer of a neural net and get something a lot more powerful (since you are not limited to unitary matrices).
In short: unless you have a good theoretical reason to use this embedding (and there are some, I agree), maybe you want to choose another one?
Having said all that, if you have a good theoretical reason to use amplitude encoding, you could try to use the template MottonenStatePreparation instead, which is what AmplitudeEmbedding calls on hardware devices, where we cannot apply our non-differentiable hack anyway. It is just an arbitrary state preparation routine. As far as I know, the template is differentiable.
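For reference, a minimal sketch of calling the state preparation template directly (the import path assumes a recent PennyLane version and may differ in older releases):
import pennylane as qml
import numpy as np
from pennylane.templates.state_preparations import MottonenStatePreparation
dev = qml.device('default.qubit', wires=2)
@qml.qnode(dev)
def circuit(state):
    # Decomposes the (normalised) state vector into elementary rotations and CNOTs.
    MottonenStatePreparation(state, wires=range(2))
    return qml.expval(qml.PauliZ(0))
state = np.array([0.5, 0.5, 0.5, 0.5])  # must have unit norm
print(circuit(state))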
I hope this helps?
Thanks for your time, your thoughts are useful. As you said, we are far from a hardware-efficient implementation of the method. The reason one may want to use AmplitudeEmbedding is its exponential reduction in the number of qubits being used.
Yes, that's true. But you'd also get an exponential reduction in the qubits used with other embeddings. As an extreme example, if you encode all inputs into a single qubit, you reduce the number of qubits to a constant! The crux for ML is whether this is actually inducing an interesting representation of your features, which in the case of amplitude encoding is questionable...
Of course I don't know your application, so don't let that detain you. Just good to think about what actually happens with the data, quantum computer or not :)
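Purely to illustrate that extreme example (a toy sketch, not one of PennyLane's templates): stacking every feature as a rotation on a single qubit keeps the qubit count constant, but consecutive RY rotations simply add their angles, so the resulting representation of the features is rather uninteresting.
import pennylane as qml
import numpy as np
dev = qml.device('default.qubit', wires=1)
@qml.qnode(dev)
def one_qubit_encoder(features=None):
    # All features end up on the same wire; the RY angles just accumulate.
    for x in features:
        qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))
print(one_qubit_encoder(features=np.array([0.1, 0.4, -0.2, 0.7])))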
Maybe I am missing something, but I don't think we can. As mentioned in the qml.templates documentation (here), other embeddings need the same order of qubits as there are features in the data. If you encode all the features into one qubit, via for instance AngleEmbedding, then you have many angle rotations stacked upon each other, making the features dependent on each other. But if you want one angle per feature of the data to learn a representation of it, then n ~ N. Maybe I am totally wrong.
BTW, my application is hybrid Classic-Quantum ML, I want to see whether any quantum circuit can help in the middle of the way to the process of learning.
Hey @mamadpierre, no, you are totally right: if you want one feature per qubit (which most of PennyLane's templates assume) then you need n ~ N. My comment was more for argument's sake: no one stops you from writing your own template which uses a logarithmic number of qubits in the size of the data - AmplitudeEmbedding has no monopoly claims on this, and it is doing something very trivial to the original features.
I strongly advise treating the embedding not as something to pick blindly, but to be extremely aware of how the quantum state depends on the original features - this may explain a lot of the results you see in the experiments! It may also be a lot more valuable to collect good data on how different embeddings behave, rather than claiming some improvement over randomly chosen classical architectures, where the fairness of the comparison is necessarily very hard to judge.
An interesting angle is also in this paper, where everything is encoded into one qubit, but with learnable rotations in between. Another trainable embedding is PennyLane's QAOAEmbedding.
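As a rough illustration of a trainable embedding, here is a hypothetical QAOAEmbedding sketch; the two-wire weight shape (n_layers, 3) follows my reading of the template's documentation, so treat both the shape and the import path as assumptions for your PennyLane version:
import pennylane as qml
import numpy as np
from pennylane.templates import QAOAEmbedding
dev = qml.device('default.qubit', wires=2)
@qml.qnode(dev)
def circuit(weights, features=None):
    # The embedding weights are trainable; the features are fed in as data.
    QAOAEmbedding(features=features, weights=weights, wires=range(2))
    return qml.expval(qml.PauliZ(0))
features = np.array([0.3, -0.8])
weights = 0.01 * np.random.randn(2, 3)  # assumed shape (n_layers, 3) for two wires
# Differentiate with respect to the trainable weights only.
g = qml.grad(circuit, argnum=[0])
print(g(weights, features=features))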
If you find interesting circuits to embed data, feel free to make a PR and add them to PennyLane. :)
Issue description
Expected behavior: I placed a PyTorch classical node before the quantum node. The classical node just applies a scaling to the input, and the scaling parameters are subject to the optimization process. The scaled classical data are then sent into the quantum node.
Actual behavior: In the loss.backward() call, however, the error message says that the parameters cannot be differentiated. It seems that the qnode cannot handle input vectors which already carry torch gradient information? If I change the angles parameter of the qnode into a keyword argument, the error message disappears, but then, of course, the scaling parameters do not update!
Source code and tracebacks
The classical node:
The quantum node:
The initial scaling parameters:
The error messages:
Additional information
The input dimension is 1024, which is to be encoded into the amplitudes of 10 qubits.