Add adaptive step size for finite-difference gradients on noisy devices

PennyLaneAI / pennylane

PennyLane is a cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Train a quantum computer the same way as a neural network.

https://pennylane.ai

Apache License 2.0


Add adaptive step size for finite-difference gradients on noisy devices #904

Open glassnotes opened 3 years ago

glassnotes commented 3 years ago

As discussed in #894, the default step size of 1e-7 in the finite-difference method produces extremely large values for the gradient when run on a noisy device in tape mode.
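A minimal NumPy sketch of why a tiny step size misbehaves under shot noise. It does not use PennyLane; the single-qubit expectation value cos(theta), the forward-difference rule, and the helper names `noisy_expval`, `forward_difference`, and `gradient_spread` are all assumptions made for illustration.

```python
import numpy as np

def noisy_expval(theta, shots, rng):
    """Estimate <Z> = cos(theta) from a finite number of +/-1 measurement outcomes."""
    p_plus = (1.0 + np.cos(theta)) / 2.0   # probability of measuring +1
    n_plus = rng.binomial(shots, p_plus)   # number of +1 outcomes among `shots`
    return (2.0 * n_plus - shots) / shots  # sample mean of the outcomes

def forward_difference(theta, h, shots, rng):
    """Forward-difference gradient estimate from two independent noisy evaluations."""
    return (noisy_expval(theta + h, shots, rng) - noisy_expval(theta, shots, rng)) / h

def gradient_spread(h, theta=0.5, shots=1000, repeats=50, seed=0):
    """Standard deviation of repeated gradient estimates for step size h."""
    rng = np.random.default_rng(seed)
    estimates = [forward_difference(theta, h, shots, rng) for _ in range(repeats)]
    return float(np.std(estimates))

if __name__ == "__main__":
    print("exact gradient:", -np.sin(0.5))
    for h in (1e-7, 1e-3, 1e-1):
        print(f"h = {h:g}: spread of estimates = {gradient_spread(h):.3g}")
```

The statistical error in the numerator is set by the number of shots and does not shrink with h, so dividing by a smaller h inflates the spread roughly as 1/h.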

A simple solution is to set a larger default value, as is currently done in non-tape mode.

An alternative solution is to implement a step size that adapts to the number of shots. The relationship between step size and mean-squared error of the resulting gradient is explored in arXiv:2008.06517. Figure 5 in particular provides a plot of optimal step size vs. shots, so it should be straightforward to fit that line and use it as an expression for step size.

josh146 commented 3 years ago

Thanks @glassnotes! We should also add it to the documentation somewhere, so that it doesn't come as a surprise to users. The best place would be a future quickstart page regarding differentiation methods, which is on our todo list to add.

> Figure 5 in particular provides a plot of optimal step size vs. shots, so it should be straightforward to fit that line and use it as an expression for step size.

I love this idea. I'm a bit worried, though: do we know whether Figure 5 generalizes to any circuit? It might depend on the specific structure of the circuit that was simulated.

Another thing to keep in mind is continuous-variable (CV) circuits; I don't believe anyone has performed a similar fit for them. So it's a bit up in the air what the best value should be for CV circuits. We could leave it at 1e-7 for now?

trbromley commented 3 years ago

To add, equation 38 is the relevant equation. To evaluate it exactly, we need to know sigma_0 (the single-shot variance, which requires some preprocessing) and the third derivative f3. We also need to be careful because the optimal h is a compromise between multiple parameters.
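For orientation, here is the standard textbook bias-variance argument that leads to this kind of expression. It is a sketch under stated assumptions and is not copied from the paper: a central-difference estimator, N independent shots for each of the two evaluations, the same single-shot variance at both points (written here as \sigma_0^2, an assumed convention), and f_3 denoting the third derivative at x. The constants may differ from equation 38.

```latex
\hat{g}(h) = \frac{\hat{f}(x+h) - \hat{f}(x-h)}{2h},
\qquad
\mathrm{bias} \approx \frac{f_3\, h^2}{6},
\qquad
\mathrm{var} \approx \frac{\sigma_0^2}{2 N h^2}

\mathrm{MSE}(h) \approx \frac{f_3^2\, h^4}{36} + \frac{\sigma_0^2}{2 N h^2}

\frac{d\,\mathrm{MSE}}{dh} = \frac{f_3^2\, h^3}{9} - \frac{\sigma_0^2}{N h^3} = 0
\quad\Longrightarrow\quad
h^{*} = \left( \frac{9\, \sigma_0^2}{N f_3^2} \right)^{1/6} \propto N^{-1/6}
```

The bias term favors a small h and the variance term favors a large one, which is why the optimum is a compromise that depends on sigma_0, f3, and N together.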

However, we could definitely look at a heuristic in which the step size scales as N^(-1/6), where N is the number of shots.
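One way such a heuristic could look, as a rough sketch. The function name `adaptive_step` and the reference values `h_ref` and `shots_ref` are hypothetical placeholders, not fitted to Figure 5 or taken from PennyLane; only the N^(-1/6) exponent comes from the discussion above.

```python
def adaptive_step(shots, h_ref=1e-1, shots_ref=1000):
    """Heuristic finite-difference step size that scales as shots**(-1/6).

    h_ref is the step size assumed to work well at shots_ref shots; both
    are placeholder values that would need to be calibrated.
    """
    if shots < 1:
        raise ValueError("shots must be a positive integer")
    # More shots means less statistical noise, so a smaller step is affordable.
    return h_ref * (shots_ref / shots) ** (1.0 / 6.0)

if __name__ == "__main__":
    for shots in (100, 1000, 10000, 100000):
        print(f"shots = {shots}: h = {adaptive_step(shots):.4f}")
```

Because the exponent is so small, the step size changes slowly: under this scaling, a 64-fold increase in shots only halves h.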

co9olguy commented 3 years ago

Making it easier for users to find out about the default, and to modify it as needed, is definitely a good thing to have. I honestly don't think there is a clear "best" answer that we should force upon people without the ability to easily change