Open dwierichs opened 3 years ago
@dwierichs, taking qml.GradientDescentOptimizer
as an example, we are looking at something like this?
params = np.random.random([2, 3], requires_grad=True)
opt = qml.GradientDescentOptimizer(cost_fn, lr=0.1, **hyperparameters)
for _ in range(num_steps):
    new_params = opt.step(params, **cost_kwargs)
Yes, exactly. To account for the fact that each step should update the latest parameters, change the last line to
params = opt.step(params, **cost_kwargs)
Note that if we give a num_steps hyperparameter to GradientDescentOptimizer, we could also do
params = np.random.random([2, 3], requires_grad=True)
opt = qml.GradientDescentOptimizer(cost_fn, lr=0.1, num_steps=100, **hyperparameters)
new_params = opt.optimize(params, **cost_kwargs)
making a full optimization readily available without users having to write a loop.
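A minimal sketch of how such an optimize method could be layered on top of step. This uses a plain NumPy stand-in class, not the PennyLane API; the class name, the explicit grad_fn argument, and the num_steps hyperparameter are all assumptions for illustration:

```python
import numpy as np

class ToyGradientDescent:
    """Hypothetical stand-in illustrating an ``optimize`` built on ``step``."""

    def __init__(self, cost_fn, grad_fn, lr=0.1, num_steps=100):
        self.cost_fn = cost_fn
        self.grad_fn = grad_fn
        self.lr = lr
        self.num_steps = num_steps

    def step(self, params):
        # one plain gradient-descent update
        return params - self.lr * self.grad_fn(params)

    def optimize(self, params):
        # repeatedly apply step, so users do not have to write the loop
        for _ in range(self.num_steps):
            params = self.step(params)
        return params

# usage: minimize f(x) = sum(x**2), whose gradient is 2x
opt = ToyGradientDescent(lambda x: np.sum(x**2), lambda x: 2 * x, lr=0.1, num_steps=100)
result = opt.optimize(np.array([1.0, -2.0]))
```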
Nice! I would hesitate to make this the main UI though, since it adds additional complexity:
When the optimizer is potentially filling a remote queue or executing expensive hardware, I lean towards keeping the optimization loop explicit.
That makes sense! Even though all of this would be possible to implement, for example via callback or convergence_check kwargs, it would probably lead to unexpected cost explosions often enough that we shouldn't do it :)
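For concreteness, such callback / convergence_check kwargs could look roughly like the sketch below. This is a hypothetical driver function, not anything in PennyLane; the argument names and the stopping logic are assumptions:

```python
def optimize(step_fn, params, num_steps, callback=None, convergence_check=None):
    """Hypothetical driver loop with optional callback / convergence_check kwargs."""
    for i in range(num_steps):
        new_params = step_fn(params)
        if callback is not None:
            # e.g. logging, checkpointing, or cost tracking
            callback(i, new_params)
        if convergence_check is not None and convergence_check(params, new_params):
            # stop early instead of burning through the full budget
            return new_params
        params = new_params
    return params

# usage: halve a number until successive iterates are closer than 1e-8
history = []
result = optimize(
    lambda p: 0.5 * p,
    1.0,
    num_steps=1000,
    callback=lambda i, p: history.append(p),
    convergence_check=lambda old, new: abs(old - new) < 1e-8,
)
```

The point of the thread stands, though: even with an early-stopping hook, a default of many steps could still queue up expensive hardware executions before the check ever triggers.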
I always argued that the reason for the current UI is that other frameworks use a similar syntax, but upon closer inspection that is wrong - at least for PyTorch and JAX, which are both a bit different!
However, this would still be a serious breaking change that may affect almost any user code out there. So I would say that if a) we can implement it in a backwards-compatible way and b) there is a serious advantage, we should do it.
If the only speed advantage is that the cost function and hyperparameters are not validated, I would almost suggest we just remove the validations :) (Or is this controversial?)
However, a contrary argument would be that one day the optimizer may be able to jit or cache the cost function by default?
I agree with all points; in particular, the jitting or caching might actually change the run time significantly. I really think this is conceptually a nice change, and many things appear more convenient or "natural" to me personally, but it should definitely remain backwards-compatible for at least a few versions. One way would be to check the input signature at instantiation and decide whether the user is planning on using the current or "future" syntax, like so:
fun = lambda x: x**2
opt = qml.GradientDescentOptimizer(fun, lr=0.05)
# Leads to opt == qml.GradientDescentOptimizer(<lambda..>, 0.05, takes_fun=False)
opt.step(fun, param) # Raises an Error, based on takes_fun=False
opt = qml.GradientDescentOptimizer(0.05)
# opt == qml.GradientDescentOptimizer(<placeholder_lambda..>, 0.05, takes_fun=True)
# as no function is known; optionally raises a deprecation warning
opt.step(fun, param)
# Does not raise an Error, based on takes_fun=True.
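An executable toy version of this signature-based dispatch could look as follows. The class name, the takes_fun attribute handling, and the bare-bones step are assumptions sketched for illustration, not PennyLane code:

```python
class FlexibleOptimizer:
    """Hypothetical: infer old vs. new syntax from the first positional argument."""

    def __init__(self, *args):
        if args and callable(args[0]):
            # "future" syntax: the cost function is passed at instantiation
            self.fun, rest = args[0], args[1:]
            self.takes_fun = False
        else:
            # current syntax: only hyperparameters; step() receives the function
            self.fun, rest = None, args
            self.takes_fun = True
        self.lr = rest[0] if rest else 0.1

    def step(self, *args):
        if args and callable(args[0]) and not self.takes_fun:
            raise TypeError("this optimizer was instantiated with a cost function")
        # ... the actual parameter update is omitted in this sketch

opt_new = FlexibleOptimizer(lambda x: x**2, 0.05)  # future syntax
opt_old = FlexibleOptimizer(0.05)                  # current syntax
```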
Safely checking the input signature is certainly challenging if we want to cover all cases.
An alternative would be to simply allow for both usages and cache the function, actually adding validation of the function and skipping certain other steps, like jitting, if it's the known function.
Removing input validation altogether might be a harsh step, I believe. For example, in Rotosolve, the provided frequency numbers might be parsed wrongly, running the full algorithm in a potentially nonsensical way and without any notification.
It's not clear to me how the validation of the parameters would be reduced, because they are the input at each step. A scenario I have in mind would be to also provide initial parameters to __init__, giving the optimizer a "current position" self.param, and to add an optimized utility equal_shape(a, b) that basically extends np.all_equal(np.shape(a), np.shape(b)) to any reasonable input arguments a and b. This memory of the current parameters would also enable calling step without having to pass the parameters around on the user side:
opt = qml.GradientDescentOptimizer(fun, param, lr=0.05)
for _ in range(num_steps):
    opt.step()
new_param = opt.param
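The equal_shape utility mentioned above might be sketched like this (a hypothetical helper, not an existing PennyLane or NumPy function; the recursion over lists/tuples is an assumption about what "any reasonable input arguments" should cover):

```python
import numpy as np

def equal_shape(a, b):
    """Hypothetical helper: True if a and b have matching (possibly nested) shapes."""
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # recurse so that ragged lists of differently-shaped arrays also work
        return len(a) == len(b) and all(equal_shape(x, y) for x, y in zip(a, b))
    return np.shape(a) == np.shape(b)
```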
I'm a fan of initializing with the function and not passing it to opt.step each time. For momentum-type optimizers, it doesn't make sense to switch the function being optimized between steps; that always bugged me before.
We could implement this in a backward-compatible way by simply using
if callable(args[0]):
# passed a new function
# raise deprecation warning
else:
# use cached function
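Fleshed out into a runnable toy, that dispatch could look like the sketch below. The class, the central-difference gradient, and the deprecation warning are illustrative assumptions, not the actual PennyLane implementation:

```python
import warnings

class CachingOptimizer:
    """Hypothetical sketch: step() accepts an optional function and caches it."""

    def __init__(self, cost_fn=None, lr=0.1):
        self.cost_fn = cost_fn
        self.lr = lr

    def step(self, *args):
        if args and callable(args[0]):
            # old-style call step(fn, params): cache fn and warn
            warnings.warn(
                "passing the function to step is deprecated", DeprecationWarning
            )
            self.cost_fn, args = args[0], args[1:]
        (params,) = args
        # toy update: central-difference gradient of the cached cost function
        eps = 1e-6
        grad = (self.cost_fn(params + eps) - self.cost_fn(params - eps)) / (2 * eps)
        return params - self.lr * grad

opt = CachingOptimizer(lr=0.1)
p1 = opt.step(lambda x: x**2, 1.0)  # old style: caches the function
p2 = opt.step(p1)                   # new style: reuses the cached function
```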
It doesn't seem like the optimizers have much validation, certainly not enough to consume much time, though we'd need profiling to say for sure. I think if we care about the efficiency of the optimizers, there's room for improvement in apply_grad, with its list comprehensions and array reshaping.
Yes, that seems like a nice solution to me! :) Maybe some profiling would indeed be good to have. I would need some more time until I get to this, though.
Yes, apply_grad might be faster if we perform the update in NumPy instead, I guess? Regarding the reshaping, this could be avoided if the optimizer internally stores the parameters as suggested above, but in their flat version. Almost all optimizers work on flat parameters anyway, and this way they would only need to be unflattened when actually accessing opt.param.
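The flat-storage idea could be sketched as follows. This is a hypothetical class, not PennyLane's apply_grad; the property-based unflattening and the shape bookkeeping are assumptions for illustration:

```python
import numpy as np

class FlatParamOptimizer:
    """Hypothetical: store parameters flat; unflatten only when accessed."""

    def __init__(self, param, lr=0.1):
        # remember each array's shape so opt.param can restore the structure
        self._shapes = [np.shape(p) for p in param]
        self._flat = np.concatenate([np.ravel(p) for p in param])
        self.lr = lr

    def apply_grad(self, flat_grad):
        # single vectorized NumPy update, no per-element list comprehension
        self._flat = self._flat - self.lr * flat_grad

    @property
    def param(self):
        # unflatten lazily, only when the user actually reads the parameters
        out, i = [], 0
        for shape in self._shapes:
            size = int(np.prod(shape))
            out.append(self._flat[i:i + size].reshape(shape))
            i += size
        return out

opt = FlatParamOptimizer([np.ones((2, 2)), np.zeros(3)], lr=0.5)
opt.apply_grad(np.ones(7))  # one flat gradient for all parameter arrays
params = opt.param
```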
@dwierichs are you planning to implement this change? That would be awesome (and no rush!). Let us know (here or in the PR) if you have any more design questions to discuss.
Yes, I'd love to. This is partially up for discussion, still. And I get the impression there will be more questions coming up, thanks! :)
Great!
Feature details

The implementation of optimization methods via classes offers more advantages than are currently being used in PennyLane. The proposed refactor is to reduce the step and step_and_cost methods to their absolute minimum, as they are the part that is called repeatedly in optimizations. Other parts, like preprocessing and validation of hyperparameters or of the function to be optimized, can be moved into the instantiation of the optimizer if we move task-specific hyperparameters and the function input from step to __init__. The main restriction would be that an optimizer instance cannot be reused for several cost functions. This is not too severe, however, as we can implement a method to allow for easy copying of an instance. The overhead in the (rather rare) cases where reuse for multiple cost functions is required seems much smaller than the overhead currently affecting all users from performing validation and similar steps at every iteration.

This refactor moves optimizers closer to stateful entities, which seems like a good fit to me, in particular for optimizers that use data from previous steps as bias/surrogate data (think of reusing the metric tensor in QNGOptimizer) or that inherently have a memory (like BFGS-type optimizers).

Implementation

Move validation steps and cost function input from step/step_and_cost to __init__ of optimizers.

How important would you say this feature is?
1: Not important. Would be nice to have.
Additional information
Happy to work on this.
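The proposed split can be summarized in a short sketch: validation once at instantiation, a minimal per-iteration step, and a copy helper for reuse with a different cost function. All names here are hypothetical stand-ins, assuming the new function-at-__init__ syntax:

```python
import copy

class RefactoredOptimizer:
    """Hypothetical: validate once in __init__, keep step() minimal."""

    def __init__(self, cost_fn, grad_fn, lr=0.1):
        # all validation happens once, at instantiation
        if not callable(cost_fn) or not callable(grad_fn):
            raise TypeError("cost_fn and grad_fn must be callable")
        if lr <= 0:
            raise ValueError("lr must be positive")
        self.cost_fn, self.grad_fn, self.lr = cost_fn, grad_fn, lr

    def step(self, params):
        # the absolute minimum of work per iteration: one update
        return params - self.lr * self.grad_fn(params)

    def for_cost_fn(self, cost_fn, grad_fn):
        # easy copying of an instance for reuse with another cost function
        new = copy.copy(self)
        new.cost_fn, new.grad_fn = cost_fn, grad_fn
        return new

opt = RefactoredOptimizer(lambda x: x**2, lambda x: 2 * x, lr=0.25)
opt2 = opt.for_cost_fn(lambda x: x**4, lambda x: 4 * x**3)
```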