PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

Support all operations with `param_shift_hessian()` #2195

Closed cvjjm closed 1 year ago

cvjjm commented 2 years ago

Feature details

It should be possible to support operations with arbitrary `grad_recipe`s in `param_shift_hessian()` by computing the applicable gradient recipe for each entry of the Hessian from the `grad_recipe`s of the operations, instead of using the fixed `diag_recipe` and `off_diag_recipes`.

A first, easier improvement would be to at least replace the hard-coded list of "supported operations" with code that inspects the `grad_recipe`s of the occurring operations, so that newly added operations that are also differentiable according to the standard rule are supported automatically; a sketch of such an inspection follows below.
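A minimal sketch of what such an inspection could look like, assuming the standard PennyLane convention that a `None` entry in `op.grad_recipe` denotes the default two-term shift rule (the commented tape filter at the end is hypothetical):

import pennylane as qml

def follows_standard_rule(op):
    """Return True if every parameter of ``op`` uses the default
    two-term parameter-shift rule (signalled by a ``None`` recipe)."""
    return all(recipe is None for recipe in op.grad_recipe)

# Instead of a hard-coded allow-list, one could then filter a tape:
# supported = all(
#     follows_standard_rule(op) for op in tape.operations if op.num_params > 0
# )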

Implementation

No response

How important would you say this feature is?

2: Somewhat important. Needed this quarter.

Additional information

No response

josh146 commented 2 years ago

Hey @cvjjm, you are too fast for us 😆 We have been planning this upgrade using the functionality available in the functions generate_multi_shift_rule and generate_shift_rule:

>>> import pennylane as qml
>>> import numpy as np
>>> frequencies = qml.gradients.eigvals_to_frequencies((-0.5, 0, 0, 0.5))
>>> qml.gradients.generate_shift_rule(frequencies, order=2, shifts=(np.pi/2, 3 * np.pi/2))
array([[-0.375     ,  0.25      ,  0.25      , -0.125     ],
       [ 0.        , -3.14159265,  3.14159265, -6.28318531]])
>>> f1 = qml.gradients.eigvals_to_frequencies((0.5, -0.5))
>>> f2 = qml.gradients.eigvals_to_frequencies((0.5, 1))
>>> qml.gradients.generate_multi_shift_rule([f1, f2])
array([[ 0.125     , -0.125     , -0.125     ,  0.125     ],
       [ 1.57079633,  1.57079633, -1.57079633, -1.57079633],
       [ 3.14159265, -3.14159265,  3.14159265, -3.14159265]])

Unfortunately, this work coincided with a release, so it is not yet complete.


Note that, to extract parameter-shift Hessians of arbitrary gates, you can use autograd directly:

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(weights):
    qml.RX(0.5, wires=0)
    qml.RX(0.9, wires=1)
    # Following gates have 4-term shifts
    qml.CRX(weights[0], wires=[0, 1])
    qml.CRY(weights[1], wires=[0, 1])
    return qml.expval(qml.PauliZ(1))

weights = np.array([0.6, 0.2], requires_grad=True)
hess = qml.jacobian(qml.grad(circuit))(weights)
print(hess)
[[-0.00424343  0.01212983]
 [ 0.01212983 -0.00424343]]

This has the downside, though, that more evaluations occur, since autograd will always compute the first derivatives before the second derivatives.
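For example, continuing the snippet above, the extra cost can be observed with the device's num_executions counter (a standard attribute of PennyLane devices); a rough check:

before = dev.num_executions
hess = qml.jacobian(qml.grad(circuit))(weights)
print(dev.num_executions - before)  # includes the executions spent on first derivatives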


Just so I understand your use case better: are you looking to use qml.gradients.param_shift_hessian() over qml.jacobian(qml.grad(circuit)) because the first-order derivatives are not needed and result in wasted evaluations?

cvjjm commented 2 years ago

Yes, exactly. Cool that this is on the roadmap.

cvjjm commented 2 years ago

Another thing that could be useful is an option to compute only part of the Hessian, say only the diagonal.

josh146 commented 2 years ago

Another thing that could be useful is an option to compute only part of the Hessian, say only the diagonal.

Cool idea --- would you say this generalizes, and would it be useful to be able to compute specific elements of the Hessian (similar to a generalization of argnum= for the param_shift transform)?

Or is it mostly just the diagonal you are interested in?

cvjjm commented 2 years ago

I don't know yet what will be useful, so I would say it generalizes :-)

Maybe the user could pass in a "mask" of the same shape as the Hessian with Boolean entries, and then get back a Hessian with only those entries?
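For the diagonal, such a hypothetical mask might look like this (the argnum=mask call sketches the proposed API, not something supported at the time of writing):

import numpy as np

n = 4  # number of trainable parameters
mask = np.eye(n, dtype=bool)  # select only the diagonal Hessian entries
# hypothetical call under the proposed API:
# hess = qml.gradients.param_shift_hessian(circuit, argnum=mask)(params)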

josh146 commented 2 years ago

Regarding your latter point, @dwierichs has pointed out that we could do something analogous to the Jacobian transform,

>>> qml.gradients.param_shift(circuit, argnum=[0, 3])

That is, provide a list of argnums that represents the Hessian matrix coordinates in terms of trainable tape parameters:

>>> # compute only the (0, 0), (3, 2), and (1, 3) elements of the Hessian
>>> qml.gradients.param_shift_hessian(circuit, argnum=[[0, 3, 1], [0, 2, 3]])

The nice thing about this approach is that it is arguably more natural, since the indices of the trainable tape parameters are unambiguous and directly correspond to circuit shifts.

dwierichs commented 2 years ago

Another option here would be to allow argnum to be either a list or a Boolean array:

- argnum=[1, 3, 4] => compute the Hessian entries (1, 1), (1, 3), (1, 4), (3, 3), (3, 4), (4, 4) (plus their symmetric counterparts)
- argnum=2darray[bool] => only compute the entries indicated by the mask in argnum.
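To make the list semantics concrete, the implied entries are just the unordered pairs of the given indices; a quick illustration:

from itertools import combinations_with_replacement

argnum = [1, 3, 4]
entries = list(combinations_with_replacement(argnum, 2))
print(entries)  # [(1, 1), (1, 3), (1, 4), (3, 3), (3, 4), (4, 4)]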

cvjjm commented 2 years ago

I like both the argnum and the Boolean array approach. Any chance that one of the two could still make it into the next release?

dwierichs commented 2 years ago

@cvjjm argnum is going to be implemented in #2319 in the former way. It is close to being merged; if you want, you could also have a look and give feedback on whether this format works for you :)

cvjjm commented 2 years ago

Will the argnum=2darray[bool] case also be implemented? Only that would allow computing, say, only the diagonal, right?

dwierichs commented 2 years ago

After an offline discussion, it seems like it really would be nice to support this, and it is not terribly much effort, so it will be done in a PR separate from #2319.

cvjjm commented 2 years ago

Another thing that is easy to overlook is that the circuit parameters may form a higher-dimensional tensor, in which case the meaning of argnum=[1, 5] may be ambiguous. The Hessian and the Boolean array would then have shape list(params.shape)*2, making it clear which elements are to be calculated.
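A short illustration of that shape logic (the params array here is hypothetical):

import numpy as np

params = np.zeros((2, 3))             # tensor-valued circuit argument
hess_shape = tuple(params.shape) * 2  # (2, 3, 2, 3)
mask = np.zeros(hess_shape, dtype=bool)
mask[0, 0, 0, 0] = True               # e.g. request a single Hessian entry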

Finally, one has to decide what to return for the entries that were not calculated: None or 0.0?

dwierichs commented 2 years ago

This is a bit of a blind spot of the QNode wrappers, as far as I am aware: when applying param_shift_hessian (and gradient transforms, for that matter) to a tape, argnum refers to tape parameters, which are always flat, so argnum allows for very detailed control there. When applying the transforms to a QNode, however, argnum refers to QNode arguments. This is much more high-level control and in particular does not allow computing derivatives with respect to only some of the parameters within an array-valued argument. @josh146 correct me if I'm wrong here :)

The reason for this is not entirely clear to me, but it certainly is more work to allow for a variety of argnum formats.

I think the hurdle here is to create an intuitive, user-friendly API that does not require too much interpretation and "checking" of whether a certain way of indexing was intended by the user, and that does not require complicated, overly specific inputs for simple use cases. Examples showing that this can get complicated are the nums_frequency and spectra keyword arguments of RotosolveOptimizer, where even simple QNodes require comparably complicated input in the format dict[str: dict[tuple: int]].

Finally, one has to decide what to return for the entries that were not calculated: None or 0.0?

Note that gradients.param_shift_hessian, when applied to a tape, will output 0 for parameters not marked as trainable, as does gradients.param_shift. For QNodes, the higher-level functions like qml.grad or qml.jacobian return tuples in which the entries not marked in argnum are missing, whereas param_shift_hessian, being lower-level, still outputs 0 (of the correct shape).

josh146 commented 2 years ago

@dwierichs nice analysis!

The reason for this is not entirely clear to me

I don't believe there is a definitive reason for this; it is a combination of factors.

One approach could be to do what qml.batch_input does; that is, have an argument that applies to the tape-level gate argument indices even when used at the QNode level. I feel like this is relatively easy to explain, even if it might not be 'intuitive' at first sight.

Finally, one has to decide what to return for the entries that were not calculated: None or 0.0?

This one is a bit more difficult to standardize, since qml.gradients.param_shift is used internally by the PL core during the higher-level TF/Torch/Autograd/JAX backprop pipelines. As such, we must return information that TF/Torch/etc. know how to interpret internally during backpropagation. This may or may not coincide with what users might expect to see.

cvjjm commented 2 years ago

Will the argnum=2darray[bool] case also be implemented? Only that would allow computing, say, only the diagonal, right?

@dwierichs is there any chance this could still be implemented? I am really interested in the diagonal of the Hessian :-)

josh146 commented 2 years ago

@cvjjm, @dwierichs and I will look into this and see how much work it requires based on the current implementation 🙂

dwierichs commented 2 years ago

@cvjjm just let me know whether #2538 works for you and resolves this issue :)

cvjjm commented 2 years ago

It works! Great work @dwierichs !

cvjjm commented 2 years ago

One minor thing: It does not seem to work when the QNode returns the expval of a Hamiltonian:


  File "/.../pennylane/pennylane/gradients/hessian_transform.py", line 125, in hessian_wrapper
    qhess = _wrapper(*args, **kwargs)
  File "/.../pennylane/pennylane/transforms/batch_transform.py", line 289, in _wrapper
    tapes, processing_fn = self.construct(qnode.qtape, *targs, **tkwargs)
  File "/.../pennylane/pennylane/transforms/batch_transform.py", line 403, in construct
    tapes, processing_fn = self.transform_fn(tape, *args, **kwargs)
  File "/.../pennylane/pennylane/gradients/parameter_shift_hessian.py", line 504, in param_shift_hessian
    return expval_hessian_param_shift(
  File "/.../pennylane/pennylane/gradients/parameter_shift_hessian.py", line 222, in expval_hessian_param_shift
    diag_recipes, partial_offdiag_recipes = _collect_recipes(
  File "/.../pennylane/pennylane/gradients/parameter_shift_hessian.py", line 103, in _collect_recipes
    diag_recipes.append(_get_operation_recipe(tape, i, diag_shifts, order=2))
  File "/.../pennylane/pennylane/gradients/parameter_shift.py", line 109, in _get_operation_recipe
    recipe = op.grad_recipe[p_idx]
AttributeError: 'Hamiltonian' object has no attribute 'grad_recipe'

I haven't investigated closely, but it looks like it is trying to take the Hamiltonian into account when collecting the gradient recipes. Here, however, we want to differentiate only with respect to the gate parameters.

josh146 commented 2 years ago

@cvjjm this could be due to a recent change: since the differentiable HF solver was integrated into PennyLane proper, Hamiltonian coefficients might be trainable by default, depending on how you generate the Hamiltonians.

Do you have a minimal example of how you are creating the Hamiltonian?

soranjh commented 2 years ago

@cvjjm, does setting H.coeffs.requires_grad = False solve the problem?

cvjjm commented 2 years ago

Here is a minimal example:

import pennylane as qml
import openfermion as of
from pennylane import numpy as np

dev = qml.device(
    'default.qubit',
    wires=2,
)
qubit_operator = of.QubitOperator('X0 Y1', 1.0)
hamiltonian = qml.import_operator(qubit_operator)
hamiltonian.coeffs.requires_grad = False

@qml.qnode(dev, diff_method='parameter-shift')
def circuit(params):
    qml.RY(params[0], wires=[0])
    qml.RY(params[1], wires=[1])
    #return [qml.expval(qml.PauliZ(wires=0)), qml.expval(qml.PauliZ(wires=1))] # works
    return qml.expval(hamiltonian) # demonstrate problem above
    #return qml.expval(qml.PauliZ(wires=0)) # fails with ValueError: all input arrays must have the same shape

hess = qml.gradients.param_shift_hessian(circuit)

params = np.array([0.2, 0.3])
print(circuit(params))
print(hess(params))

Setting requires_grad = False does not solve the issue.

While cooking up the example I found another problem: if you uncomment the third return statement in the example above, I get a ValueError.

dwierichs commented 2 years ago

For the second bug I found the origin: as the second RY operation (on wire 1) does not have any impact on the return value, its grad_method is set to '0' during the gradient_analysis step of param_shift_hessian. This leads to it (correctly) being assigned no parameter-shift recipes. However, the Boolean mask is not updated accordingly, so the function tries to execute these None shift recipes. This clearly is a bug and can be fixed by updating the Boolean mask during _collect_recipes. I'll have a PR up shortly.

dwierichs commented 2 years ago

For the first bug, the problem seems to stem from the behaviour of tape.trainable_params together with the Hamiltonian:

import pennylane as qml
import openfermion as of
from pennylane import numpy as np

dev = qml.device('default.qubit', wires=2)
qubit_operator = of.QubitOperator('X0 Y1', 1.0)
hamiltonian = qml.import_operator(qubit_operator)
hamiltonian.coeffs.requires_grad = False

@qml.qnode(dev, diff_method='parameter-shift')
def circuit(param):
    qml.RY(param, wires=[0])
    return qml.expval(hamiltonian)
>>> param = np.array(0.2, requires_grad=True)
>>> circuit.construct((param,), {})
>>> print(circuit.qtape.trainable_params)
[0, 1]

As we can see, the Hamiltonian coefficient is set to be trainable.

soranjh commented 2 years ago

For the first bug, the problem seems to stem from the behaviour of tape.trainable_params together with the Hamiltonian [...]

Setting requires_grad = False for the Hamiltonian coefficients seems to solve the issue:

dev = qml.device('default.qubit', wires=2)
qubit_operator = of.QubitOperator('X0 Y1', 1.0)
hamiltonian = qml.import_operator(qubit_operator)

c = np.array(hamiltonian.coeffs, requires_grad=False)
o = hamiltonian.ops
hamiltonian = qml.Hamiltonian(c, o)

@qml.qnode(dev, diff_method='parameter-shift')
def circuit(param):
    qml.RY(param, wires=[0])
    return qml.expval(hamiltonian)

params = np.array(0.2, requires_grad=True)
circuit.construct((params,), {})
print(circuit.qtape.trainable_params) # output: [0]

Not sure why hamiltonian.coeffs.requires_grad = False does not do the same thing.

josh146 commented 2 years ago

Not sure why hamiltonian.coeffs.requires_grad = False does not do the same thing.

I wonder if this is a Python subtlety due to the nesting 🤔

cvjjm commented 2 years ago

I think the problem is in PennyLane, namely here: https://github.com/PennyLaneAI/pennylane/blob/bcd837b1ed1187895c327abfe62aea71fbeba02f/pennylane/ops/qubit/hamiltonian.py#L211-L216

Because of this, modifying .coeffs after init does not change .data.

josh146 commented 2 years ago

Ah, very good catch @cvjjm! It might make sense to modify the definition of the H.coeffs property: https://github.com/PennyLaneAI/pennylane/blob/bcd837b1ed1187895c327abfe62aea71fbeba02f/pennylane/ops/qubit/hamiltonian.py#L223

Rather than simply returning H._coeffs, it should build and return the coefficients by accessing H.data.
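A rough sketch of that suggestion, assuming (as in the linked constructor) that the Hamiltonian's coefficients are the entries of self.data; qml.math.stack is only used here to rebuild an array from them:

import pennylane as qml

class ConsistentHamiltonian(qml.Hamiltonian):
    @property
    def coeffs(self):
        # Rebuild the coefficients from ``data`` rather than returning the
        # cached ``self._coeffs``, so that post-init changes (e.g. toggling
        # ``requires_grad``) remain visible.
        return qml.math.stack(self.data)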

dwierichs commented 2 years ago

The second bug in the minimal example comment above is now fixed due to #2584 being merged.

cvjjm commented 2 years ago

I am leaving this open, since Hamiltonian.coeffs still returns self._coeffs instead of accessing self.data.

dwierichs commented 1 year ago

I believe all bugs in this issue have been resolved. @cvjjm , could you double check? Thanks! :)

cvjjm commented 1 year ago

.coeffs still returns a pre-computed ._coeffs value. Maybe it is still worth implementing the suggestion from https://github.com/PennyLaneAI/pennylane/issues/2195#issuecomment-1129469738 ?

dwierichs commented 1 year ago

As far as I can tell, this suggestion will not be implemented. While we could have done so in the meantime, there will be a move towards immutable objects anyway, which will remove the possible inconsistency between coeffs and data. In general, h.data will be the correct object to operate on, whereas h.coeffs, which looks up h._coeffs, may produce unexpectedly "outdated" results.

cvjjm commented 1 year ago

Ok, thanks for the heads up!