Improve UI for circuits, expvals, variances and samples

After a discussion with @josh146, I would want to start a (admittedly lengthy) discussion on the implementation of circuits and expvals. Please express your thoughts on this.

Status Quo

Currently, a circuit with an application will look something like this:

@qml.qnode(dev)
def circuit(var):
    qml.RX(var[0], wires=0)
    qml.RY(var[1], wires=1)

    return [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(1))]

where using the circuit happens like this

def objective(var, target):
    return ((circuit(var) - target)**2).sum()

Problems (or features)

With the introduction of var (#240) the user can now do things like combining variances and expvals

@qml.qnode(dev)
def circuit(var):
    qml.RX(var[0], wires=0)
    qml.RY(var[1], wires=1)

    return [qml.expval(qml.PauliZ(0)), qml.var(qml.PauliZ(1))]

which is kind of a feature. But right now we limit ourselfs to one observable per wire although we can usually compute variance and expectation values at the same time.

With the coming introduction of sample (#256), there is another problem. Obviously, the user can't combine a sample and expvals/vars in the same output because then we can't convert the output to a numpy array, because the column lengths don't match.

Possible changes

(a)

The first thing we could do to make the structure more coherent is to check in code if the users returns a sample from a QNode, and in this case to ensure that (a) all samples have the same dimension and (b) that there are no expvals or vars that are returned. Concerning the question of combining vars and expvals, one could allow for more return values from the QNode and only allow one expval and one var per wire.

This would be the solution that will cause the least changes to the user interface and will "feel" like the old interface mostly. Also this would not break anything. But I don't like this solution because it is (a) clumsy, (b) tricky to implement and (c) will surely cause some bugs or weird behaviour.

(b)

I would propose to make the following change to the UI: we will separate the circuit description and the circuit evaluation logically. To this end, the new circuit definition will look like the following

@qml.qnode(dev)
def circuit(var):
    qml.RX(var[0], wires=0)
    qml.RY(var[1], wires=1)

    return [qml.measure(qml.PauliZ(0)), qml.measure(qml.PauliZ(1))]

which is just telling that the operators given are intended to be measured and not regular transformations. But there is no talk of expvals and variances yet as they are technically not part of the circuit definition.

Note: If we are on it, I would also propose to implement it either as

    return qml.measure([qml.PauliZ(0), qml.PauliZ(1)])

    return [qml.PauliZ(0), qml.PauliZ(1)]

where the transformations that are returned from the circuit function are implicitly taken as the measured ones.

The usage of it would then be something like

def objective(var, target):
    return (circuit(var).expval() - target)**2

def objective(var, target):
    return (circuit(var).expval - target)**2

which would also allow things like

def objective(var, target):
    expectation = circuit(var).expval()
    variance = circuit(var).var()
    return ((expectation  - target)**2 + .1 * variance**2).sum()

One could also take circuit(var) to behave like circuit(var).expval() in calculations, but I'd rather keep the expectation value explicit.

Possible ways to implement this could be to have circuit(var) to be a QNode which gets now member functions (or members) that return a ExpvalQNode, VarianceQNode or a SampleQNode. In this way we could also separate calculation of gradients for variances from the one intended for expectation values. I think this separation would also make the code cleaner.

Advantages

Cleaner separation of circuit description and circuit evaluation
Outputs always have well defined dimensions
Creating different QNode subclasses could make the code cleaner

Disadvantages

Hugh code change
No (direct) combination of expvals and vars

Open questions

Should expval and co. be functions or members? A function tells: "evaluate the expectation value" where a member tells that the expectation value is a property of the circuit, of which both are somewhat true.
Would this approach break the interface to autograd? How could this be solved if this was the case?
I guess there is some need for lazy evaluation. If the circuit already evaluated once (for example if expval() was called), then we don't want to invoke another evaluation for the calculation of var.

Obviously, the user can't combine a sample and expvals/vars in the same output because then we can't convert the output to a numpy array, because the column lengths don't match.

Note that NumPy does support ragged arrays; with the caveat that they will have np.object datatype:

>>> np.array([[0, 0, 1], [1, 2]])
array([list([0, 0, 1]), list([1, 2])], dtype=object)

(i.e., they will be similar to a Python nested list). Since returning samples will break autodiff anyway, this shouldn't cause any side effects.

From my point of view, I don't actually mind the following:

import pennylane as qml
from pennylane import expval, var, sample

@qml.qnode(dev)
def circuit(var):
    qml.RX(var[0], wires=0)
    qml.RY(var[1], wires=1)
    return expval(qml.PauliZ(0)), var(qml.PauliZ(1)), sample(qml.PauliY(2))

I like:

it is explicit
naturally allows for tensor products of observables, i.e., expval(qml.PauliZ(0), qml.PauliX(1)) (see #212)
for hardware devices, the expectation value and variance of a particular observable on a particular wire are both available for 'free' for a single run, and this can be easily validated.

Things that I don't like:

it is verbose (but this goes hand in hand with being explicit)
in order to change the return type, you need to redefine the qfunc

From the suggested improvements, I think I like this one the most:

@qml.qnode(dev)
def circuit(*args, **kwargs):
    qml.RX(args[0], wires=0)
    qml.RY(args[1], wires=1)
    qml.CNOT(wires=[0, 3])
    qml.CNOT(wires=[0, 2])
    return qml.PauliZ(0)), qml.PauliZ(1), qml.tensor(qml.PauliX(2), qml.Hadamard(3))

(the qml.measure feels superfluous). In terms of calling it, it will likely be quite difficult to implement and retain autodiff support. For autodiff support, the arguments normally need to be passed last, so something like the following will be more likely to work:

ev_res = circuit.expval(*args, **kwargs)
v_res = circuit.var(*args, **kwargs)
res = circuit.samples(*args, **kwargs)

This is the same approach that allow the existing qnode.to_torch() and qnode.to_tfe() methods to work.

I do miss the ability to define a QNode with mixed expectation values and variances. This approach might also make it difficult to ensure that the separate calls to .var and .expval apply to a single cached device run, rather than running the device twice.

Note that NumPy does support ragged arrays

Fair enough! I should have thought of this.

I think this is also what I'm going to implement for now in the case of sampling.

I also think this is the most elegant way of doing it, because it is light in terms of syntax and still quite understandable.

Do I understand correctly that it would have to be

circuit.expval(x)

instead of

circuit(x).expval

to interface properly with autograd?

I'm not very deep into the structure, but wouldn't it work if QNode.evaluate(x) would no longer be a primitive and return some intermediary object, let's call it EvaluatedQNode. EvaluatedQNode contains the result of the evaluation of the circuit ran with parameter x, which should basically be either the result of an analytic run or the samples obtained from the device. EvaluatedQNode then has methods expval, var and sample that lazily compute the desired thing. To be compatible with autograd you would implement that as

@ae.primitive
def _expval(self, *args, **kwargs):
    # Basically we ignore the parameters here because they are only there for autograd

    # We then need to compute the parameters from cached data, here I'm not too sure how
    # exactly do that
    return self.sample.mean()

def expval(self):
     self._expval(self.param) # Here is the param 'x' of the model

I think this would also require a change in the plugin to facilitate the storing of the data, i.e. some sort of lazy evaluation on device level.

I also like Josh's idea. @josh146 do I understand correctly, the user defines what is returned, and can return as much as he/she wants?

@johannesjmeyer Thanks for taking this daunting task on!

@co9olguy told me that there is also talk about allowing the user to define multiple measurements for his circuit, which could be good in case of simulator backends where the whole circuit need not be evaluated twice to obtain multiple measurements per wire. This should definitely be taken into consideration here.

I have some preliminary ideas on how one could implement this:

a)

The user just returns what he's interested in and then we figure out how many runs of the device we need. This would allow something like

def circuit():
    ...
    return qml.expval(qml.PauliX(0)), qml.expval(qml.PauliY(0)), qml.expval(qml.PauliX(1)), qml.expval(PauliZ(0))

Note that we cannot really enforce a special ordering here. In this case we would need three runs on the device. The downside is that the user has no direct control on which measurements are ran together (maybe the user's device has side effects on certain measurements)

b)

The user returns a 2D-array of measurements, like

def circuit():
    ...
    return [[qml.expval(qml.PauliX(0)), qml.expval(qml.PauliX(1))],
            [qml.expval(qml.PauliY(0)), qml.expval(qml.PauliY(1))],
            [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(1))]]

and use it like

res = circuit(x)
res[0,0] + res[1,0] + res[2,0]

where every row is a measurement.

I tend to prefer b), but actually, we could support both.

This could also be implemented analogously in the approach I proposed. One could also think about giving the user the possibility to return a dictionary, that could be good if the circuit gets large and complicated

def circuit(x):
    ...
    return { "X0" : qml.PauliX(0), "Y0": qml.PauliY(0), "Z0" : qml.PauliZ(0)}

and use it like

circuit(x).expval["X0"] + circuit(x).var["Y0"]

This might be a very vague comment, but from programming another QML library I know that one regrets very quickly if one sets a restricted framework (for example the shape or type of return statements). Especially in a scientifically evolving platform.

In that spirit, is there any possibility the qnode could just return anything? And when objects are generated in a qnode that need gradients, they just become attributes of the object?

Even if it is not, I recommend strongly to opt for the most flexible model possible, even if it is more work to implement...

This might be a very vague comment, but from programming another QML library I know that one regrets very quickly if one sets a restricted framework (for example the shape or type of return statements). Especially in a scientifically evolving platform.

I agree :100:, perhaps ragged NumPy arrays with dtype=object is the best way to go.

I was talking with @mariaschuld yesterday and we're now both convinced that format

@qml.qnode(dev)
def circuit(params, input):
    ...
    return qml.PauliZ(0), qml.PauliZ(1), qml.PauliZ(2)

with usage

    circuit(params, input).expval[0]
    circuit(params, input).var[0]
    circuit(params, input).sample(n)[:,:10]

would be the best [pending that it can be integrated with autograd -- Maria mentioned that maybe @smite could know about this?]. We are not loosing any flexibility in this approach, but we gain readability. Furthermore, it separates circuit definition from post-processing and makes the nature of the result you treat explicit.

We should even be able to make this non-breaking by having circuit(x)[1] default to circuit(x).expval[1].

The only downside I see sofar is that if the user wants expvals of operators 1-10, but is only interested in the var of operator 3 we will probably be calculating too much in the background. But even this could be remedied by using some sort of lazy evaluation.

An example of usage would be

data = ...
def cost(params):
    cumsum = 0

    for (x, t) in data:
        model = circuit(params, x)
        cumsum += np.sum((model.expval - t)**2)

    cumsum += 0.1 * np.sum(model.var**2)

    return cumsum

Of course there would be some open questions on how to exactly implement this.

I think the autograd integration will work, because only certain functions (like Function instances in Torch or primitives for autograd) will make it to the computational graph. In this way we can just construct a wrapping that enables the above notation, similar in spirit to what I proposed earlier.

I cooked up a MWE of such an architecture. In this case I implemented a function expval that has the form proposed above. I also integrated it with Torch via two wrappers, one for I/O and one for the expval itself, which is necessary for the gradients.

import torch 
import torch.autograd as ag
import numpy as np

# Wraps the expval for use in torch autograd
class TorchExpval():
    def __init__(self, expval_instance):
        self.instance = expval_instance
        self.wrapper = TorchWrapper()

        class _Expval(ag.Function):
            @staticmethod
            def forward(ctx, input):
                return self.wrapper.from_numpy(self.instance.eval())

            @staticmethod
            def backward(ctx, grad_output):
                return torch.dot(self.wrapper.from_numpy(self.instance.jacobian()), grad_output)

        self._expval = _Expval()

    def __getitem__(self, key):
        # Here would probably be the canonical place for other wrappers
        return self._expval.apply(self.instance.parameter)[key]

# The core expval
class Expval():
    def __init__(self, base):
        self.base = base
        self.parameter = self.base.parameter_original_type

    def eval(self):
        return self.base.sample.mean(axis=1)

    def jacobian(self):
        return self.base.sample.mean(axis=1) / self.base._parameter

    def __getitem__(self, key):
        return self.eval()[key]

# A wrapper that does nothing
# probably not necessary in a real implementation, here
# autograd would be default
class IdentityWrapper():
    @staticmethod
    def to_numpy(val):
        return val

    @staticmethod
    def from_numpy(val):
        return val

    @staticmethod
    def wrap_expval(expval_instance):
        return expval_instance

# Wrapper functions for Torch
class TorchWrapper():
    @staticmethod
    def to_numpy(val):
        return val.detach().numpy()

    @staticmethod
    def from_numpy(val):
        return torch.from_numpy(val).float()

    @staticmethod
    def wrap_expval(expval_instance):
        return TorchExpval(expval_instance)

# Circuit + specific parameter instances
class EvaluatedCircuit():
    def __init__(self, parameter, wrapper=IdentityWrapper):
        # Make some dummy data
        self.parameter_original_type = parameter
        self._parameter = wrapper.to_numpy(parameter)

        # 'Evaluate' the circuit
        self.sample = self._parameter * np.array([[0.5, 1.0, 1.5],[1.5, 2.0, 2.5]])

        # Expval is a class that overloads the [] operator
        self.expval = wrapper.wrap_expval(Expval(self))

# Circuit definition
class Circuit():
    def __init__(self, interface='vanilla'):
        if interface=='vanilla':
            self.wrapper = IdentityWrapper
        elif interface=='torch':
            self.wrapper = TorchWrapper

    def __call__(self, parameter):
        return EvaluatedCircuit(parameter, wrapper=self.wrapper)

# Vanilla (no gradients)
input = 2.0
circ = Circuit()
output = 2 * circ(input).expval[0]**2 + 3 * circ(input).expval[1]**2

print(input)
print(output)

"""
2.0
56.0
"""

# Torch
input = torch.tensor(2.0, requires_grad=True)
circ = Circuit(interface='torch')

output = 2 * circ(input).expval[0]**2 + 3 * circ(input).expval[1]**2
output.backward()

print(input)
print(output)
print(input.grad)

"""
tensor(2., requires_grad=True)
tensor(56., grad_fn=<AddBackward0>)
tensor(56.)
"""

def rec_explore(root, i = 0):    
    if hasattr(root, '__iter__'):
        for element in root:
            rec_explore(element, i)
    elif not root is None and not type(root) is int:
        print(i, i*' ', root)
        if hasattr(root, 'next_functions'):
            for element in root.next_functions:
                rec_explore(element, i+1)

rec_explore(output.grad_fn)

"""
0  <AddBackward0 object at 0x000001DC5445C160>
1   <MulBackward0 object at 0x000001DC5445C128>
2    <PowBackward0 object at 0x000001DC5445C208>
3     <SelectBackward object at 0x000001DC5445C278>
4      <torch.autograd.function._ExpvalBackward object at 0x000001DC54457248>
5       <AccumulateGrad object at 0x000001DC5445C358>
1   <MulBackward0 object at 0x000001DC5445C048>
2    <PowBackward0 object at 0x000001DC5445C208>
3     <SelectBackward object at 0x000001DC5445C278>
4      <torch.autograd.function._ExpvalBackward object at 0x000001DC54457648>
5       <AccumulateGrad object at 0x000001DC5445C390>
"""

Comparing to the old model, Circuit corresponds to the plain definition of the circuit of the QNode, EvaluatedCircuit is one parametrized instance of the circuit. The actual computational node is actually Expval, which is also what gets differentiated in the end.

Finally got around to reading all the way through. I don't really have any conclusive opinions at the moment, but I do want to make a few general comments:

I seem to remember that autograd gave a big barrier to implementing some of our earlier ideas. The reason some things ended up the way they did in PennyLane is specifically because of what autograd let us do easily (and we could figure out from incomplete docs), and what we struggled to implement. We might want to promote torch as the default interface at some point, but any design decision we make should be validated to be implemented with autograd
I agree with @mariaschuld that we want to remove as many assumptions as possible from our core functions. It is largely for this reason that I would resist significant further changes to the core interface. However, building higher-level objects and abstractions on top of this is fair game.
Following 2, it seems that the best return type of a quantum node is the simplest multidimensional python object: a tuple. The only real "assumption" in that case is that a user may want one or more objects returned. Python users are already familiar with this functionality. There doesn't seem to be any good reason to unnaturally combine things, build ragged arrays, or broadcast things across axes redundantly. Note: it may be that current return types in PL were again dictated by the constraints of autograd when we were initially coding things up.

Closing this issue, as the consensus seems to be the status quo - returning a tuple as defined by the user. We can re-open in the future if anything changes.

PennyLaneAI / pennylane