Thanks for reporting @ycchen1989 .
Would you mind also posting how you "run" VQC and ClassicalScaling exactly, i.e. in what way you invoke the loss.backward() statement?
At the moment your code only shows the two classes and the initial parameters. It would be really helpful to have a minimum code example that reproduces the error when I run it.
Thanks!
minimum code example:
import os
import argparse
import pennylane as qml
from pennylane import numpy as np
# import numpy as np
# from pennylane.optimize import NesterovMomentumOptimizer
import matplotlib.pyplot as plt
from datetime import datetime
import torch
import torch.nn as nn
from torch.autograd import Variable
import pickle
dtype = torch.cuda.DoubleTensor if torch.cuda.is_available() else torch.DoubleTensor
device = 'cuda' if torch.cuda.is_available() else 'cpu'
###
class ClassicalScaling:
def __init__(self, var_Q_circuit):
self.var_Q_circuit = var_Q_circuit
def forward(self, angles):
return self.var_Q_circuit * angles
class VQC:
def __init__(
self,
num_of_input= 10,
num_of_output= 2,
num_of_wires = 10,
var_Q_circuit = None,
var_Q_bias = None):
self.var_Q_circuit = var_Q_circuit
self.var_Q_bias = var_Q_bias
self.num_of_input = num_of_input
self.num_of_output = num_of_output
self.num_of_wires = num_of_wires
self.dev = qml.device('default.qubit', wires = num_of_wires)
def _layer(self, W):
""" Single layer of the variational classifier.
Args:
W (array[float]): 2-d array of variables for one layer
"""
# W = W.numpy()
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
qml.CNOT(wires=[2, 3])
qml.CNOT(wires=[3, 4])
qml.CNOT(wires=[4, 5])
qml.CNOT(wires=[5, 6])
qml.CNOT(wires=[6, 7])
qml.CNOT(wires=[7, 8])
qml.CNOT(wires=[8, 9])
# Another CNOT which is not in MNIST experiments!
qml.CNOT(wires=[9, 0])
qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)
qml.Rot(W[4, 0], W[4, 1], W[4, 2], wires=4)
qml.Rot(W[5, 0], W[5, 1], W[5, 2], wires=5)
qml.Rot(W[6, 0], W[6, 1], W[6, 2], wires=6)
qml.Rot(W[7, 0], W[7, 1], W[7, 2], wires=7)
qml.Rot(W[8, 0], W[8, 1], W[8, 2], wires=8)
qml.Rot(W[9, 0], W[9, 1], W[9, 2], wires=9)
def circuit(self, angles):
@qml.qnode(self.dev, interface='torch')
def _circuit(var_Q_circuit, angles):
qml.QubitStateVector(angles, wires=list(range(self.num_of_wires)))
weights = var_Q_circuit
for W in weights:
self._layer(W)
return [qml.expval.PauliZ(k) for k in range(self.num_of_output)]
return _circuit(self.var_Q_circuit, angles)
def _forward(self, angles):
angles = angles / torch.clamp(torch.sqrt(torch.sum(angles ** 2)), min = 1e-9)
raw_output = self.circuit(angles)
m = nn.Softmax(dim=0)
clamp = 1e-9 * torch.ones(self.num_of_output).type(dtype).to(device)
normalized_output = torch.max(raw_output, clamp)
output = m(normalized_output)
return output
def forward(self, angles):
fw = self._forward(angles)
return fw
###
def lost_function_cross_entropy(labels, predictions):
## numpy array
loss = nn.CrossEntropyLoss()
output = loss(predictions, labels)
print("LOSS AVG: ",output)
return output
def cost(VQC, X, Y):
"""Cost (error) function to be minimized."""
# predictions = torch.stack([variational_classifier(var_Q_circuit = var_Q_circuit, var_Q_bias = var_Q_bias, angles=item) for item in X])
## This method still not fully use the CPU resource...
loss = nn.CrossEntropyLoss()
output = loss(torch.stack([VQC.forward(item) for item in X]), Y)
print("LOSS AVG: ",output)
return output
def train_epoch(opt, VQC, X, Y, batch_size):
losses = []
for i in range(5):
batch_index = np.random.randint(0, len(X), (batch_size, ))
X_train_batch = X[batch_index]
Y_train_batch = Y[batch_index]
# opt.step(closure)
opt.zero_grad()
print("CALCULATING LOSS...")
loss = cost(VQC = VQC, X = X_train_batch, Y = Y_train_batch)
print("BACKWARD..")
loss.backward()
losses.append(loss.data.cpu().numpy())
opt.step()
# print("LOSS IN CLOSURE: ", loss)
print("FINISHED OPT.")
# print("CALCULATING PREDICTION.")
losses = np.array(losses)
return losses.mean()
class stackedCircuit:
def __init__(self, scaling_part, vqc):
self.scaling_part = scaling_part
self.vqc = vqc
def forward(self, single_item):
res_temp = self.scaling_part.forward(single_item)
res_temp = self.vqc.forward(res_temp)
return res_temp
def main(batch_size, epoch_num):
var_C_scaling = Variable(torch.ones(1024, dtype = torch.double), requires_grad=True)
scaling_part = ClassicalScaling(var_Q_circuit = var_C_scaling)
num_of_input = 10
num_of_output = 10
num_layers = 2
num_qubits = 10
var_Q_circuit = Variable(torch.tensor(0.01 * np.random.randn(num_layers, num_qubits, 3), device=device).type(dtype), requires_grad=True)
vqc = VQC(num_of_input = num_of_input, num_of_output = num_of_output, num_of_wires = num_qubits, var_Q_circuit = var_Q_circuit, var_Q_bias = None)
stacked = stackedCircuit(scaling_part, vqc)
params = [var_C_scaling, var_Q_circuit]
opt = torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
x_for_train = torch.tensor(np.random.randn(10000, 1024), device=device).type(dtype)
y_for_train = torch.tensor(np.zeros(10000), device=device).type(torch.LongTensor)
x_for_train = x_for_train.type(dtype)
for it in range(epoch_num):
# Need to save data
avg_loss_in_epoch = train_epoch(opt, stacked, x_for_train, y_for_train, batch_size)
if __name__ == '__main__':
main(batch_size = 10, epoch_num = 100)
Thanks @ycchen1989 !
I see that in your code you are using a rather old version of PennyLane. Would you mind installing the latest one first and seeing if that resolves the issue? That might save me the time it takes to turn your code into a minimum example and start debugging.
Also check the latest template functions in the API, which explain which parts are differentiable and which ones are not.
PS: For future reference, a minimum working example usually consists of only a few lines of code needed to reproduce the issue. That allows us to respond a lot quicker and better!
Note: if you run your code with the latest PennyLane version, you will have to change qml.expval.PauliZ(k) to qml.expval(qml.PauliZ(k)), which is the new signature.
Thanks for replying. I just installed the latest version of PennyLane (v0.7) and modified the expectation value to
[qml.expval(qml.PauliZ(k)) for k in range(self.num_of_output)]
and also changed the
qml.QubitStateVector(angles, wires=list(range(self.num_of_wires)))
to
qml.templates.embeddings.AmplitudeEmbedding(angles, wires=list(range(self.num_of_wires)))
but the error messages persist.
Thanks for trying; I will have a proper look ASAP then.
If you can shorten the example to a few lines I will be much faster in helping you, and I would be extremely grateful. It is difficult to dig through 200 lines of code that I have not written...
Thanks for replying. I tried to shorten the code to about 100 lines. Hope it helps.
import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import torch
import torch.nn as nn
from torch.autograd import Variable
dtype = torch.cuda.DoubleTensor if torch.cuda.is_available() else torch.DoubleTensor
device = 'cuda' if torch.cuda.is_available() else 'cpu'
###
class VQC:
def __init__(
self,
num_of_input= 10,
num_of_output= 2,
num_of_wires = 10,
var_Q_circuit = None,
var_C_scaling = None
):
self.var_Q_circuit = var_Q_circuit
self.var_C_scaling = var_C_scaling
self.num_of_input = num_of_input
self.num_of_output = num_of_output
self.num_of_wires = num_of_wires
self.dev = qml.device('default.qubit', wires = num_of_wires)
def _layer(self, W):
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
qml.CNOT(wires=[2, 3])
qml.CNOT(wires=[3, 4])
qml.CNOT(wires=[4, 5])
qml.CNOT(wires=[5, 6])
qml.CNOT(wires=[6, 7])
qml.CNOT(wires=[7, 8])
qml.CNOT(wires=[8, 9])
qml.CNOT(wires=[9, 0])
qml.Rot(W[0, 0], W[0, 1], W[0, 2], wires=0)
qml.Rot(W[1, 0], W[1, 1], W[1, 2], wires=1)
qml.Rot(W[2, 0], W[2, 1], W[2, 2], wires=2)
qml.Rot(W[3, 0], W[3, 1], W[3, 2], wires=3)
qml.Rot(W[4, 0], W[4, 1], W[4, 2], wires=4)
qml.Rot(W[5, 0], W[5, 1], W[5, 2], wires=5)
qml.Rot(W[6, 0], W[6, 1], W[6, 2], wires=6)
qml.Rot(W[7, 0], W[7, 1], W[7, 2], wires=7)
qml.Rot(W[8, 0], W[8, 1], W[8, 2], wires=8)
qml.Rot(W[9, 0], W[9, 1], W[9, 2], wires=9)
def circuit(self, angles):
@qml.qnode(self.dev, interface='torch')
def _circuit(var_Q_circuit, angles):
qml.templates.embeddings.AmplitudeEmbedding(angles, wires=list(range(self.num_of_wires)))
weights = var_Q_circuit
for W in weights:
self._layer(W)
return [qml.expval(qml.PauliZ(k)) for k in range(self.num_of_output)]
return _circuit(self.var_Q_circuit, angles)
def forward(self, angles):
print(angles)
angles = self.var_C_scaling * angles
print(angles)
angles = angles / torch.clamp(torch.sqrt(torch.sum(angles ** 2)), min = 1e-9)
raw_output = self.circuit(angles)
m = nn.Softmax(dim=0)
clamp = 1e-9 * torch.ones(self.num_of_output).type(dtype).to(device)
normalized_output = torch.max(raw_output, clamp)
output = m(normalized_output)
return output
def cost(VQC, X, Y):
loss = nn.CrossEntropyLoss()
output = loss(torch.stack([VQC.forward(item) for item in X]), Y)
print("LOSS AVG: ",output)
return output
def main(batch_size, epoch_num):
num_of_input = 10
num_of_output = 2
num_layers = 4
num_qubits = 10
var_C_scaling = Variable(torch.ones(1024, dtype = torch.double), requires_grad=True)
var_Q_circuit = Variable(torch.tensor(0.01 * np.random.randn(num_layers, num_qubits, 3), device=device).type(dtype), requires_grad=True)
vqc = VQC(num_of_input = num_of_input, num_of_output = num_of_output, num_of_wires = num_qubits, var_Q_circuit = var_Q_circuit, var_C_scaling = var_C_scaling)
params = [var_C_scaling, var_Q_circuit]
opt = torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)
x_for_train = torch.tensor(np.random.randn(10, 1024), device=device).type(dtype)
y_for_train = torch.tensor(np.zeros(10), device=device).type(torch.LongTensor)
x_for_train = x_for_train.type(dtype)
opt.zero_grad()
loss = cost(VQC = vqc, X = x_for_train, Y = y_for_train)
loss.backward()
opt.step()
if __name__ == '__main__':
main(batch_size = 10, epoch_num = 100)
@ycchen1989 I think I've bumped into this problem before as well. What I think is happening is that PennyLane is trying to optimize the angles argument to _circuit because it's not passed as a keyword argument. Changing angles to a keyword parameter should fix it:
def circuit(self, angles):
@qml.qnode(self.dev, interface='torch')
def _circuit(var_Q_circuit, angles=None):
qml.templates.embeddings.AmplitudeEmbedding(angles, wires=list(range(self.num_of_wires)))
weights = var_Q_circuit
for W in weights:
self._layer(W)
return [qml.expval(qml.PauliZ(k)) for k in range(self.num_of_output)]
return _circuit(self.var_Q_circuit, angles=angles)
It's documented here.
But if I set the angles to a keyword argument, the classical scaling part won't be updated by gradient descent.
I'm not sure then. I managed to get mixed classical and quantum parameter optimization working by extending nn.Module and defining self.quantum_parameters = torch.nn.Parameter(self._generate_quantum_parameters()) in the constructor. I then just initialize the optimizer as optim.SGD(model.parameters(), ...).
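For reference, a minimal sketch of that pattern, under stated assumptions: the class name HybridModel, the parameter shapes, the choice of AngleEmbedding/StronglyEntanglingLayers and the import paths are all illustrative for a recent PennyLane version, not taken from the original code.
import pennylane as qml
import torch
import torch.nn as nn
import torch.optim as optim
from pennylane.templates import AngleEmbedding, StronglyEntanglingLayers
n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)
@qml.qnode(dev, interface="torch")
def qnode(weights, features):
    # Features are passed positionally so gradients can flow back into the
    # classical scaling parameters; AngleEmbedding is differentiable in its features.
    AngleEmbedding(features, wires=range(n_qubits))
    StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]
class HybridModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Registering both tensors as nn.Parameter exposes them via model.parameters().
        self.scaling = nn.Parameter(torch.ones(n_qubits, dtype=torch.double))
        self.quantum_parameters = nn.Parameter(
            0.01 * torch.randn(3, n_qubits, 3, dtype=torch.double))
    def forward(self, x):
        x = self.scaling * x  # classical, differentiable preprocessing
        return qnode(self.quantum_parameters, x)
model = HybridModel()
opt = optim.SGD(model.parameters(), lr=0.1)
x = torch.tensor([0.1, 0.5], dtype=torch.double)
y = torch.tensor([1.0, -1.0], dtype=torch.double)
opt.zero_grad()
loss = torch.sum((model(x) - y) ** 2)
loss.backward()  # gradients reach both the scaling and the circuit weights
opt.step()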
@ycchen1989 first of all, thanks for your patience, and thanks @soudy for pointing out that a keyword argument would solve the problem.
@ycchen1989, I think I can reproduce your problem with the following minimum working example (please reduce your code in the future to something this small :) ):
import pennylane as qml
import numpy as np
from pennylane.templates import AmplitudeEmbedding
dev = qml.device('default.qubit', wires=2)
features = np.array([1/2, 1 / 2, 1/2, 1/2])
@qml.qnode(dev)
def circuit(f):
AmplitudeEmbedding(features=f, wires=range(2))
return qml.expval(qml.PauliZ(0))
g = qml.grad(circuit, argnum=[0])
g(features)
The issue is that computing gradients with respect to the features of AmplitudeEmbedding is not working.
To give some background: while for most embeddings (AngleEmbedding, for example) the features can be "learnt", for others (e.g. BasisEmbedding) this is theoretically not possible. AmplitudeEmbedding lies somewhere in between the two: it is theoretically possible for some implementations.
In fact, the current implementation should support gradients. The template calls the operation QubitStateVector, which at the moment always calls the state preparation template MottonenStatePreparation, a rather convoluted quantum algorithm. I will try to track down where the gradient calculation fails, and maybe I can make it work.
Also, we only recently overhauled the templates, and are still working on updating the documentation. It will be made clear in the description of each template how and if differentiation works with each argument. Sorry for not having this up sooner!
I hope this helps to understand the issue. Thanks for making me aware of it.
Another update: looking deeper into this, MottonenStatePreparation uses a lot of postprocessing to compute the angles and gates of the state preparation circuit. It is very unlikely that we can make this differentiable (and even if we did, it would involve a huge computational graph).
To summarize, AmplitudeEmbedding circuits are under the hood so complex that computing gradients with respect to the features is not feasible for now.
If you code it up yourself, I recommend actually doing all preprocessing in a classical node (i.e. a normal python function), and then have the quantum node just use the calculated gate parameters.
I will make it a lot clearer in the documentation, and throw an error if a user tries to feed features as positional arguments to a qnode!
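To illustrate the recommendation above, here is a rough sketch of splitting the work between a classical node and a quantum node; the arctan preprocessing is just a hypothetical stand-in, not the actual state-preparation maths:
import pennylane as qml
import numpy as np
dev = qml.device('default.qubit', wires=2)
def preprocess(features):
    # Classical node: all heavy (and possibly non-differentiable) processing
    # happens here, turning raw features into plain gate angles.
    return np.arctan(features)
@qml.qnode(dev)
def circuit(angles):
    # Quantum node: only consumes the precomputed gate parameters.
    qml.RY(angles[0], wires=0)
    qml.RY(angles[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))
features = np.array([0.3, 1.2])
print(circuit(preprocess(features)))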
Hello, I reviewed some of my previous code and found that I can compute the gradient, as mentioned in this post, with PennyLane version 0.4.0!
Thanks for the update @ycchen1989! At some point since v0.4.0, the gradient computation of AmplitudeEmbedding must have broken. Does the error message return for v0.5.0?
It only runs successfully on 0.4.0. I tried 0.5.0 but it failed.
Closing this, since the warning has been added to the documentation, and differentiating AmplitudeEmbedding's feature input is no longer intended behaviour.
@mariaschuld @josh146 Thanks for the answers so far. If one wants to stack a quantum layer after a classical layer, the output of the classical layer (which is a function of its weights, and whose gradient we therefore need to track) becomes the input of the quantum layer. What is the procedure then? Because as far as the discussion goes, these values cannot be encoded into the amplitudes of a quantum layer. To make my question clearer, imagine we are classifying the MNIST dataset, PyTorch is being used, the device is "default.qubit" and the interface is "torch". We have the output of a convolution layer of shape (1, 1, 16, 16), meaning one batch of one channel of 16x16 pixels. This output can be flattened into (1, 256). What is the best way to encode these 256 tunable variables into a quantum layer?
Hey @mamadpierre!
Almost all embeddings other than AmplitudeEmbedding are differentiable; you seem to have picked the one that is a bit more tricky :) And there are theoretical reasons for this:
- AmplitudeEmbedding is in some sense very special, because it first needs to parse a classical input vector into a large number of angles and gate sequences to implement the arbitrary state preparation. In other words, while the maths is easy, the implementation is non-trivial (on simulators the features can be handed straight to QubitStateVector), but this is what makes things non-differentiable on some devices.
- AmplitudeEmbedding maps the output of a classical layer to a quantum state that is identical (up to normalisation) to this output. The quantum circuit after the embedding then applies a unitary transformation to this output. In many cases you could therefore just train a linear layer of a neural net and get something a lot more powerful (since you are not limited to unitary matrices).
In short: unless you have a good theoretical reason to use this embedding (and there are some, I agree), maybe you want to choose another one?
Having said all that, if you have a good theoretical reason to use amplitude encoding, you could try to use the template MottonenStatePreparation instead, which is what AmplitudeEmbedding calls on hardware devices, where we cannot apply our non-differentiable hack anyway. It is just an arbitrary state preparation routine. As far as I know, the template is differentiable.
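For reference, a minimal sketch of calling the state preparation template directly (the import path assumes a recent PennyLane version and may differ in older releases):
import pennylane as qml
import numpy as np
from pennylane.templates.state_preparations import MottonenStatePreparation
dev = qml.device('default.qubit', wires=2)
@qml.qnode(dev)
def circuit(state):
    # Decomposes the (normalised) state vector into elementary rotations and CNOTs.
    MottonenStatePreparation(state, wires=range(2))
    return qml.expval(qml.PauliZ(0))
state = np.array([0.5, 0.5, 0.5, 0.5])  # must have unit norm
print(circuit(state))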
I hope this helps?
Thanks for your time, your thoughts are useful. As you said, we are far from a hardware-efficient implementation of the method. The reason one may want to use AmplitudeEmbedding is its exponential reduction in the number of qubits being used.
Yes, that's true. But you'd also get an exponential reduction in the qubits used with other embeddings. As an extreme example, if you encode all inputs into a single qubit, you reduce the number of qubits to a constant! The crux for ML is whether this is actually inducing an interesting representation of your features, which in the case of amplitude encoding is questionable...
Of course I don't know your application, so don't let that detain you. Just good to think about what actually happens with the data, quantum computer or not :)
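Purely to illustrate that extreme example (a toy sketch, not one of PennyLane's templates): stacking every feature as a rotation on a single qubit keeps the qubit count constant, but consecutive RY rotations simply add their angles, so the resulting representation of the features is rather uninteresting.
import pennylane as qml
import numpy as np
dev = qml.device('default.qubit', wires=1)
@qml.qnode(dev)
def one_qubit_encoder(features=None):
    # All features end up on the same wire; the RY angles just accumulate.
    for x in features:
        qml.RY(x, wires=0)
    return qml.expval(qml.PauliZ(0))
print(one_qubit_encoder(features=np.array([0.1, 0.4, -0.2, 0.7])))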
Maybe I am missing something, but I don't think we can. As mentioned in the qml.templates documentation (here), other embeddings need the same order of qubits as there are features in the data. If you encode all the features into one qubit, via for instance AngleEmbedding, then you have many angle rotations stacked upon each other, making the features dependent on each other. But if you want one angle per feature of the data to learn a representation of it, then n ~ N. Maybe I am totally wrong.
BTW, my application is hybrid Classic-Quantum ML, I want to see whether any quantum circuit can help in the middle of the way to the process of learning.
Hey @mamadpierre, no, you are totally right: if you want one feature per qubit (which most of PennyLane's templates assume) then you need n ~ N. My comment was more for argument's sake: no one stops you from writing your own template which uses a logarithmic number of qubits in the size of the data - AmplitudeEmbedding has no monopoly claims on this, and it is doing something very trivial to the original features.
I strongly advise treating the embedding not as something to pick blindly, but to be extremely aware of how the quantum state depends on the original features - this may explain a lot of the results you see in the experiments! It may also be a lot more valuable to collect good data on how different embeddings behave, rather than claiming some improvement over randomly chosen classical architectures, where the fairness of the comparison is necessarily very hard to judge.
An interesting angle is also in this paper, where everything is encoded into one qubit, but with learnable rotations in between. Another trainable embedding is PennyLane's QAOAEmbedding.
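As a rough illustration of a trainable embedding, here is a hypothetical QAOAEmbedding sketch; the two-wire weight shape (n_layers, 3) follows my reading of the template's documentation, so treat both the shape and the import path as assumptions for your PennyLane version:
import pennylane as qml
import numpy as np
from pennylane.templates import QAOAEmbedding
dev = qml.device('default.qubit', wires=2)
@qml.qnode(dev)
def circuit(weights, features=None):
    # The embedding weights are trainable; the features are fed in as data.
    QAOAEmbedding(features=features, weights=weights, wires=range(2))
    return qml.expval(qml.PauliZ(0))
features = np.array([0.3, -0.8])
weights = 0.01 * np.random.randn(2, 3)  # assumed shape (n_layers, 3) for two wires
# Differentiate with respect to the trainable weights only.
g = qml.grad(circuit, argnum=[0])
print(g(weights, features=features))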
If you find interesting circuits to embed data, feel free to make a PR and add them to PennyLane. :)
Issue description
Expected behavior: I placed a PyTorch classical node before the quantum node. The classical node just applies a scaling to the input, and the scaling parameters are subject to the optimization process. The scaled classical data are then sent into the quantum node.
Actual behavior: In the loss.backward() call, however, the error message says that the parameters cannot be differentiated. It seems that the qnode cannot handle input vectors which already carry torch gradient information? If I change the angles parameter of the qnode into a keyword argument, the error message disappears, but then, of course, the scaling parameters do not update!
Source code and tracebacks
The classical node:
The quantum node:
The initial scaling parameters:
The error messages:
Additional information
The input dimension is 1024, which is to be encoded into the amplitudes of 10 qubits.