VQE optimization history has unexpected plateaus

jiayu-shen commented 2 years ago

Environment

Qiskit version: 0.37.2
Qiskit Terra version: 0.21.2
Qiskit Aer version: 0.10.4
Python version: 3.9.0
Operating system: macOS Big Sur 11.6.4

What is happening?

When using the statevector simulator, the ADAM optimizer, and a callback function in VQE, the energy stored by the callback function displays unexpected plateaus. In the example below, every 9 adjacent iterations have very similar energies.

How can we reproduce the issue?

The code is modified from https://qiskit.org/documentation/tutorials/algorithms/04_vqe_advanced.html

from qiskit import Aer
from qiskit.opflow import X, Z, I
from qiskit.utils import QuantumInstance, algorithm_globals
from qiskit.algorithms import VQE
from qiskit.algorithms.optimizers import ADAM
from qiskit.circuit.library import TwoLocal
import matplotlib.pyplot as plt

H2_op = (-1.052373245772859 * I ^ I) + \
        (0.39793742484318045 * I ^ Z) + \
        (-0.39793742484318045 * Z ^ I) + \
        (-0.01128010425623538 * Z ^ Z) + \
        (0.18093119978423156 * X ^ X)

seed = 50
algorithm_globals.random_seed = seed
qi = QuantumInstance(Aer.get_backend('statevector_simulator'), seed_transpiler=seed, seed_simulator=seed)

ansatz = TwoLocal(rotation_blocks='ry', entanglement_blocks='cz')
optimizer = ADAM(maxiter=10)

intermediate_info = {
    'nfev': [],
    'parameters': [],
    'energy': [],
    'stddev': []
}

def callback(nfev, parameters, energy, stddev):
    intermediate_info['nfev'].append(nfev)
    intermediate_info['parameters'].append(parameters)
    intermediate_info['energy'].append(energy)
    intermediate_info['stddev'].append(stddev)

vqe = VQE(ansatz, optimizer=optimizer, quantum_instance=qi, callback=callback)
result = vqe.compute_minimum_eigenvalue(operator=H2_op)

Plotting intermediate_info['energy'] versus intermediate_info['nfev'] shows the problem (the plot image is not reproduced here). Every 9 adjacent iterations have very similar energies, which seems unexpected with the Adam optimizer. When the number of qubits of the Hamiltonian changes, the length of each plateau can also change (from 9 in the current example).

What should happen?

The expected optimization history of energy with an Adam optimizer should be smoother, without artificial plateaus.

Any suggestions?

I guess that some repeated evaluations are done in optimization, or the callback function is not working properly.

jiayu-shen commented 2 years ago

The CG optimizer also yields a similar issue.

The GradientDescent optimizer does not yield plateaus, but it also yields a repeated pattern.

woodsp-ibm commented 2 years ago

If you look at the plots in this tutorial https://qiskit.org/documentation/tutorials/algorithms/02_vqe_convergence.html you will see similar plateaus. They appear where gradient-based optimizers compute, say, a finite-difference gradient using the same objective function. The callback comes from the objective function and cannot tell how the optimizer is using it. Because the delta around the point is so small when the gradient is computed, at the scale of the plots the evaluations end up with a staircase-like look. (The tutorial has a paragraph mentioning gradients and the staircase effect.)

Adam uses a finite-difference gradient by default. TwoLocal defaults to reps=3, and with 2 qubits that is 2 parameters per layer (rep) plus 2 more in a final block, so 8 parameters in total. So I believe the 9 is 1 + 8, i.e. one evaluation at the current point and 8 around it to compute the gradient used to move to the next point (the finite difference, like scipy's, only takes a small epsilon step in one direction in each dimension).

And the number of points will vary with the size of the Hamiltonian (the number of qubits), given the same ansatz structure. Hopefully you can see from the details above where the 9 comes from.
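The 1 + 8 counting can be reproduced with a toy example. This is only a sketch under stated assumptions: the quadratic `objective`, the hand-written `forward_diff_gradient`, and the plain gradient-descent update are hypothetical stand-ins, not Qiskit's ADAM or VQE code. The `history` list plays the role of the energies collected by the VQE callback.

```python
# Illustrative only: a toy objective and a hand-written forward-difference
# gradient. None of these names come from Qiskit.

NUM_QUBITS = 2
REPS = 3  # TwoLocal default mentioned above
# One 'ry' parameter per qubit in each of the `reps` layers plus the final block.
N_PARAMS = NUM_QUBITS * (REPS + 1)

history = []  # stands in for the energies recorded by the VQE callback


def objective(params):
    """Toy stand-in for the energy evaluation; records every call."""
    value = sum((p - 0.5) ** 2 for p in params)
    history.append(value)
    return value


def forward_diff_gradient(fun, params, eps=1e-6):
    """One evaluation at the current point, then one per shifted parameter."""
    f0 = fun(params)
    grad = []
    for i in range(len(params)):
        shifted = list(params)
        shifted[i] += eps
        grad.append((fun(shifted) - f0) / eps)
    return grad


params = [0.0] * N_PARAMS
n_steps = 3
learning_rate = 0.1
for _ in range(n_steps):
    grad = forward_diff_gradient(objective, params)
    params = [p - learning_rate * g for p, g in zip(params, grad)]

evals_per_step = N_PARAMS + 1
print(evals_per_step)  # 9 recorded evaluations per optimizer step
print(len(history))    # 27 recorded evaluations for 3 steps
```

Within each block of 9 recorded values the energies differ only by an amount of order eps, which is what shows up as a flat step when the history is plotted.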

woodsp-ibm commented 2 years ago

As you gave a thumbs up to my prior comment, I will assume the question is solved, so I am closing this.

I will finally note, however, that if you use a gradient with VQE, the gradient computation does not go through its objective function, and the callback will reflect each iteration as you expected.
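The effect of a separately supplied gradient can be sketched with the same kind of toy setup. Again, this is an assumption-laden illustration: `objective`, `analytic_gradient`, and the update loop are hypothetical, and the sketch assumes the optimizer evaluates the objective exactly once per iteration, which may not hold for every Qiskit optimizer.

```python
# Illustrative only: the gradient is computed by its own function, so it never
# triggers the recording that happens inside `objective`.

history = []  # stands in for the energies recorded by the VQE callback


def objective(params):
    """Toy stand-in for the energy evaluation; records every call."""
    value = sum((p - 0.5) ** 2 for p in params)
    history.append(value)
    return value


def analytic_gradient(params):
    """Gradient obtained without calling `objective`, so nothing is recorded."""
    return [2.0 * (p - 0.5) for p in params]


params = [0.0] * 8
n_steps = 5
learning_rate = 0.1
for _ in range(n_steps):
    objective(params)                 # assumed: one recorded evaluation per step
    grad = analytic_gradient(params)  # not recorded
    params = [p - learning_rate * g for p, g in zip(params, grad)]

print(len(history))  # 5, one entry per iteration and no plateaus
```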

jiayu-shen commented 2 years ago

Thank you so much! Your explanation makes sense to me. Yes, the question was solved. I further checked the source code of ADAM and VQE. Without the gradient argument in VQE, the ADAM optimizer first evaluates at the center, then consecutively at n_params points, each shifted by eps along one direction of the parameter space. Each of these evaluations is passed to the VQE callback function. The last iteration in the plot is only at the center, since no further update is needed. That is why there is only one point in the last "step" of the staircase.
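Given that evaluation order, a per-step energy curve can be recovered from the recorded history by keeping only the first value of each block. The helper below is a hypothetical post-processing sketch, not part of Qiskit. It assumes the center evaluation always comes first in each block of n_params + 1 evaluations, and the energies in the usage example are made up for illustration.

```python
def per_step_energies(energies, n_params):
    """Keep the first (center) energy of each block of n_params + 1 evaluations.

    Assumes each optimizer step records the center point first, followed by
    n_params shifted points, and that a final center-only evaluation may follow.
    """
    block = n_params + 1
    return energies[::block]


# Made-up history: two full steps of 9 evaluations and a final center-only point.
recorded = [-1.0] * 9 + [-1.2] * 9 + [-1.3]
steps = per_step_energies(recorded, n_params=8)
print(steps)  # [-1.0, -1.2, -1.3]
```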

A fairly similar process happens with SPSA and QNSPSA.

Thank you for your note on the gradient. Since you mentioned it, I was also trying the VQE gradient argument explicitly with different types of optimizers. Sometimes the callback does not work as I expected: sometimes only the final result is passed to the callback. On the page you linked, https://qiskit.org/documentation/tutorials/algorithms/02_vqe_convergence.html, the last plot, "Energy Convergence using Gradient", seems to contain only one point (it is hard to eyeball), which is consistent with my tests.

Sometimes, when changing the type of the optimizer and the grad_method in Gradient, the callback does give all the evaluations.

Do I need to submit a separate issue regarding the gradient?

woodsp-ibm commented 2 years ago

That last plot used to look like a curve. Is there really a point in it? I noticed that it seemed empty and just assumed it did not plot for some reason. Hopefully it ran more than one point and stopped, then plotted that. The optimizer itself always works with the objective function for each iteration. There is also a callback on the scipy optimizers, so you could see what is happening using that. It is hard to imagine that the objective function is called without the callback being invoked. If you look further and there does seem to be a problem, then please feel free to raise another issue. A separate issue would be better, more targeted to whatever the problem seems to be.

I will note that algorithms like VQE are currently being rewritten to use the (runtime) primitives, and there are now new gradients under the algorithms folder in main that will be used instead of the older ones, as these too work with primitives. So things are going to change in this area anyway. They are changing already, and the plan is for this all to be in the next upcoming release (the existing code will be marked pending deprecation, then be deprecated in a subsequent release, and finally removed, leaving just the new primitive-based algorithms).
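One way to compare the two views mentioned above is to record both the objective calls and the optimizer-level callback when calling scipy directly. This is a minimal sketch on a toy quadratic, not on a VQE energy; `objective` and `record_iterate` are hypothetical names, while `scipy.optimize.minimize` and its `callback` argument are the real scipy API.

```python
import numpy as np
from scipy.optimize import minimize

objective_calls = []  # every objective evaluation, including gradient probes
iterates = []         # one entry per optimizer iteration


def objective(x):
    """Toy quadratic standing in for the energy; records every call."""
    value = float(np.sum((x - 0.5) ** 2))
    objective_calls.append(value)
    return value


def record_iterate(xk):
    """scipy calls this once per iteration with the current parameters."""
    iterates.append(np.copy(xk))


result = minimize(objective, x0=np.zeros(3), method="CG", callback=record_iterate)

# With no `jac` supplied, CG estimates the gradient by finite differences, so
# the objective is called more often than the per-iteration callback.
print(len(objective_calls), len(iterates))
```

Plotting the values seen by `record_iterate` instead of `objective_calls` would give one point per iteration, without the finite-difference plateaus.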

jiayu-shen commented 2 years ago

Thank you. There is actually a single point on that last plot. The plot does not set the marker, so it may not display. The center of the energy axis of the "empty" plot is at that final energy value.

I will raise another issue with more details.
