PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.
https://pennylane.ai
Apache License 2.0

Optimizers improvement: Move preprocessing to initialization #1564

Open dwierichs opened 3 years ago

dwierichs commented 3 years ago

Feature details

The implementation of optimization methods via classes offers more advantages than PennyLane currently makes use of. The proposed refactor reduces the step and step_and_cost methods to their absolute minimum, as they are the parts called repeatedly during an optimization. Other parts, like preprocessing, validation of hyperparameters, and the function to be optimized, can be moved into the instantiation of the optimizer by moving task-specific hyperparameters and the function input from step to __init__. The main restriction is that an optimizer instance could no longer be reused for several cost functions. This is not too severe, however, as we can provide a method for easy copying of an instance. The overhead in the (rather rare) cases where reuse across multiple cost functions is required seems much smaller than the overhead currently affecting all users from performing validation and similar steps at every iteration.

This refactor would move optimizers closer to stateful entities, which seems like a good fit to me, in particular for optimizers that reuse data from previous steps as bias/surrogate data (think of reusing the metric tensor in QNGOptimizer) or that inherently have a memory (like BFGS-type optimizers).

Implementation

Move validation steps and cost function input from step/step_and_cost to __init__ of optimizers.
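As a minimal sketch of the proposed shape, the class below moves all validation and the cost function into `__init__`, leaving `step` with only the per-iteration work. The class name, the `stepsize` keyword, and the finite-difference gradient are illustrative placeholders, not PennyLane's actual implementation:

```python
# Hypothetical sketch: validation and the cost function live in __init__,
# so the per-iteration path in step() stays minimal.
class StatefulGradientDescent:
    def __init__(self, cost_fn, stepsize=0.01):
        # Validation happens once, at construction time.
        if not callable(cost_fn):
            raise TypeError("cost_fn must be callable.")
        if stepsize <= 0:
            raise ValueError("stepsize must be positive.")
        self.cost_fn = cost_fn
        self.stepsize = stepsize

    def step(self, param):
        # Bare-minimum per-iteration work: gradient estimate plus update.
        # Central finite differences stand in for autodiff here.
        eps = 1e-6
        grad = (self.cost_fn(param + eps) - self.cost_fn(param - eps)) / (2 * eps)
        return param - self.stepsize * grad


opt = StatefulGradientDescent(lambda x: (x - 3.0) ** 2, stepsize=0.1)
x = 0.0
for _ in range(100):
    x = opt.step(x)
# x is now close to the minimum at 3.0
```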

How important would you say this feature is?

1: Not important. Would be nice to have.

Additional information

Happy to work on this.

josh146 commented 3 years ago

@dwierichs, taking qml.GradientDescentOptimizer as an example, we are looking at something like this?

params = np.random.random([2, 3], requires_grad=True)
opt = qml.GradientDescentOptimizer(cost_fn, lr=0.1, **hyperparameters)

for _ in range(num_steps):
    params = opt.step(params, **cost_kwargs)
dwierichs commented 3 years ago

Yes, exactly. Accounting for the fact that successive steps should update the latest parameters, the last line should read

    params = opt.step(params, **cost_kwargs)

Note that if we give a num_steps hyperparameter to GradientDescentOptimizer we also could do

params = np.random.random([2, 3], requires_grad=True)
opt = qml.GradientDescentOptimizer(cost_fn, lr=0.1, num_steps=100, **hyperparameters)
new_params = opt.optimize(params, **cost_kwargs)

making a full optimization readily available without users having to write a loop.
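A toy version of this `optimize` method could simply layer the loop on top of `step`. Everything here (the class name, `stepsize`, the finite-difference gradient) is an illustrative assumption rather than PennyLane's API:

```python
# Hypothetical sketch of a num_steps-driven optimize() layered on step().
class ToyOptimizer:
    def __init__(self, cost_fn, stepsize=0.1, num_steps=100):
        self.cost_fn = cost_fn
        self.stepsize = stepsize
        self.num_steps = num_steps

    def step(self, params):
        # One gradient-descent update via central finite differences.
        eps = 1e-6
        grad = (self.cost_fn(params + eps) - self.cost_fn(params - eps)) / (2 * eps)
        return params - self.stepsize * grad

    def optimize(self, params):
        # Run the full loop so the user does not have to write it.
        for _ in range(self.num_steps):
            params = self.step(params)
        return params


opt = ToyOptimizer(lambda x: (x - 2.0) ** 2, stepsize=0.1, num_steps=200)
new_params = opt.optimize(10.0)
```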

josh146 commented 3 years ago

making a full optimization readily available without users having to write a loop.

Nice! I would hesitate to make this the main UI, though, since it adds complexity:

When the optimizer is potentially filling a remote queue or executing on expensive hardware, I lean towards keeping the optimization loop explicit.

dwierichs commented 3 years ago

That makes sense! Even though all of this could be implemented, for example via callback or convergence_check kwargs, it probably would lead to unexpected cost explosions often enough that we should not do it :)

mariaschuld commented 3 years ago

I always argued that the reason for the current UI is that other frameworks use a similar syntax, but upon closer inspection that is wrong - at least for PyTorch and Jax, which both do it a bit differently!

However, this would still be a serious breaking change that may affect almost any user code out there. So I would say that if a) we can implement it in a backwards-compatible way and b) there is a serious advantage, we should do it.

If the only speed advantage is that the cost function and hyperparameters are not validated at every step, I would almost suggest we just remove the validations :) (Or is this controversial?)

However, a contrary argument would be that one day the optimizer may be able to jit or cache the cost function by default?

dwierichs commented 3 years ago

I agree with all points; in particular, the jitting or caching might actually change the run time significantly. I really think this is conceptually a nice change, and many things appear more convenient or "natural" to me personally, but it should definitely remain backwards compatible for at least a few versions. One way would be to check the input signature at instantiation and decide whether the user is planning to use the current or the "future" syntax, like so:

fun = lambda x: x**2

opt = qml.GradientDescentOptimizer(fun, lr=0.05)
# Leads to opt == qml.GradientDescentOptimizer(<lambda..>, 0.05, takes_fun=False)
opt.step(fun, param)  # Raises an error, based on takes_fun=False

opt = qml.GradientDescentOptimizer(0.05)
# opt == qml.GradientDescentOptimizer(<placeholder_lambda..>, 0.05, takes_fun=True)
# optionally raises a deprecation warning
opt.step(fun, param)
# Caches fun, as no function is known yet:
# opt == qml.GradientDescentOptimizer(<lambda..>, 0.05)
opt.step(fun, param)
# Does not raise an error, based on takes_fun=True.

Safely checking the input signature is certainly challenging if we want to cover all cases.

An alternative would be to simply allow both usages and cache the function, adding validation of the function and skipping certain other steps, like jitting, when the function is already known.
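This alternative could be sketched as a `step` that dispatches on whether its first positional argument is callable, caching the function on first use. The class name, `stepsize`, and the finite-difference gradient are illustrative assumptions, not PennyLane code:

```python
import warnings


# Hypothetical sketch of a backwards-compatible step(): if the first positional
# argument is callable, it is treated as an (old-style) cost function and
# cached; otherwise, the function cached at construction is used.
class DualModeOptimizer:
    def __init__(self, cost_fn=None, stepsize=0.1):
        self.cost_fn = cost_fn
        self.stepsize = stepsize

    def step(self, *args):
        if callable(args[0]):
            # Old-style call: step(fun, params). Cache the function.
            warnings.warn(
                "Passing the cost function to step() is deprecated.",
                DeprecationWarning,
            )
            self.cost_fn, params = args[0], args[1]
        else:
            # New-style call: step(params), using the cached function.
            params = args[0]
        eps = 1e-6
        grad = (self.cost_fn(params + eps) - self.cost_fn(params - eps)) / (2 * eps)
        return params - self.stepsize * grad


fun = lambda x: (x - 1.0) ** 2
old_style = DualModeOptimizer().step(fun, 0.0)       # deprecated usage
new_style = DualModeOptimizer(fun).step(0.0)         # future usage
```

Both call styles take the same update, so existing user code keeps working while new code can drop the function argument.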

Removing input validation altogether might be a harsh step, I believe. In Rotosolve, for example, the provided frequency numbers might be parsed wrongly, running the full algorithm in a potentially nonsensical way and without any notification. It is also not clear to me how the validation of the parameters could be reduced, because they are the input at every step. A scenario I have in mind would be to also provide the initial parameters to __init__, giving the optimizer a "current position" self.param, together with an optimized utility equal_shape(a, b) that essentially extends the check np.shape(a) == np.shape(b) to any reasonable input arguments a and b. This memory of the current parameters would also allow calling step without the user having to pass the parameters around:

opt = qml.GradientDescentOptimizer(fun, param, lr=0.05)
for _ in range(num_steps):
    opt.step()
new_param = opt.param
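A minimal class realizing the usage above might look as follows. The class name, `stepsize`, and the finite-difference gradient are placeholders for illustration only:

```python
# Hypothetical sketch of the "current position" variant: the optimizer stores
# self.param, so step() takes no arguments and the user reads the result back.
class PositionOptimizer:
    def __init__(self, cost_fn, param, stepsize=0.05):
        self.cost_fn = cost_fn
        self.param = param
        self.stepsize = stepsize

    def step(self):
        # Update the stored position in place via central finite differences.
        eps = 1e-6
        grad = (self.cost_fn(self.param + eps) - self.cost_fn(self.param - eps)) / (2 * eps)
        self.param = self.param - self.stepsize * grad
        return self.param


fun = lambda x: (x - 1.0) ** 2
opt = PositionOptimizer(fun, param=4.0, stepsize=0.05)
for _ in range(200):
    opt.step()
new_param = opt.param
# new_param is close to the minimum at 1.0
```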
albi3ro commented 3 years ago

I'm a fan of initializing with the function and not passing it to opt.step each time. For momentum-type optimizers, it does not make sense to switch the function being optimized between steps. That always bugged me.

We could implement this in a backward-compatible way by simply using

if callable(args[0]):
    # passed a new function
    # raise deprecation warning
else:
    # use cached function

It doesn't seem like the optimizers do much validation, certainly not enough to consume much time, though we'd need profiling to say for sure. If we care about the efficiency of the optimizers, I think there's room for improvement in apply_grad, with its list comprehensions and array reshaping.
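To make the kind of improvement hinted at concrete, the toy functions below contrast a per-element list comprehension with a single vectorized numpy update. This is not PennyLane's actual apply_grad; the function names and signatures are illustrative:

```python
import numpy as np


def apply_grad_listcomp(params, grad, stepsize):
    # Per-element update in a Python-level loop.
    return [p - stepsize * g for p, g in zip(params, grad)]


def apply_grad_numpy(params, grad, stepsize):
    # One vectorized array operation; avoids per-element Python overhead.
    return np.asarray(params) - stepsize * np.asarray(grad)


params = [0.5, 1.5, -2.0]
grad = [1.0, -1.0, 0.5]
a = apply_grad_listcomp(params, grad, 0.1)
b = apply_grad_numpy(params, grad, 0.1)
```

Both produce the same update; whether the vectorized form actually wins in practice would need the profiling mentioned above.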

dwierichs commented 3 years ago

Yes, that seems like a nice solution to me! :) Some profiling would indeed be good to have. I will need some more time until I get to this, though.

Yes, apply_grad might be faster if we perform the update in numpy instead, I guess?

Regarding the reshaping: this could be avoided if the optimizer internally stored the parameters as suggested above, but in their flat version. Almost all optimizers work on the flat parameters anyway, so they would only need to be unflattened when opt.param is actually accessed.
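A sketch of this flat-storage idea, assuming a single array-valued parameter: the optimizer keeps a flat vector internally and reshapes only in the `param` property. The class name, `stepsize`, and the finite-difference gradient are illustrative assumptions:

```python
import numpy as np


# Hypothetical sketch: parameters are stored flat internally and only
# unflattened when the user accesses opt.param.
class FlatParamOptimizer:
    def __init__(self, cost_fn, param, stepsize=0.1):
        self.cost_fn = cost_fn
        self._shape = np.shape(param)
        self._flat = np.ravel(param).astype(float)  # internal flat copy
        self.stepsize = stepsize

    @property
    def param(self):
        # Unflatten only on access.
        return self._flat.reshape(self._shape)

    def step(self):
        # Central finite differences, component-wise on the flat vector.
        eps = 1e-6
        grad = np.empty_like(self._flat)
        for i in range(self._flat.size):
            shift = np.zeros_like(self._flat)
            shift[i] = eps
            plus = self.cost_fn((self._flat + shift).reshape(self._shape))
            minus = self.cost_fn((self._flat - shift).reshape(self._shape))
            grad[i] = (plus - minus) / (2 * eps)
        self._flat = self._flat - self.stepsize * grad
        return self.param


cost = lambda x: np.sum((x - 1.0) ** 2)
opt = FlatParamOptimizer(cost, np.zeros((2, 2)), stepsize=0.1)
for _ in range(100):
    opt.step()
```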

mariaschuld commented 3 years ago

@dwierichs are you planning to implement this change? That would be awesome (and no rush!). Let us know (here or in the PR) if you have any more design questions to discuss.

dwierichs commented 3 years ago

Yes, I'd love to. Parts of this are still up for discussion, and I get the impression there will be more questions coming up. Thanks! :)

mariaschuld commented 3 years ago

Great!