Large auto-correlation time (1900) after improving statistics

LeoGaspard commented 1 year ago

Dear TRIQS developers,

I am trying to use CT-HYB (version 3.1) to do a DMFT calculation on Ba2IrO4 including spin-orbit coupling.

Due to the sign problem (average sign ~0.6), the statistics on G(tau) are not very good:

After some iterations I decided to change the number of MC steps between measurements (length_cycle) and the number of measurements (n_cycles):

| Iteration | length_cycle | n_cycles (millions) |
|---|---|---|
| 1-17 | 400 | 18 |
| 18-20 | 800 | 18 |
| 21-22 | 800 | 180 |

There seemed to be less noise in the measured G(tau) after the last iteration.

However, the auto-correlation time given in the solver report became very large after iteration 19:

| Iteration | Auto-correlation time |
|---|---|
| 1-19 | Between 0.08 and 0.37 |
| 20 | 89.26 |
| 21 | 1845.02 |
| 22 | 1999.51 |

In all the iteration reports, the acceptance rates for the moves are comparable, the auto-correlation time is the only thing that significantly changed.

Such a jump seems strange, and I would like to understand why it happened, as something clearly went wrong at some point in the calculation.

The script and input files can be found in this gist

I would be grateful for any help to understand this issue. Best regards, Léo Gaspard

the-hampel commented 1 year ago

Hi @LeoGaspard ,

this is indeed weird. Can you plot the point-wise difference between the G(tau) of iterations 20 and 22? I assume that there is almost no difference? From what you report, it seems that the auto-corr time exploded only in the last two iterations, when you increased n_cycles.

Since you are solving a problem including SOC, I assume you are using the complex build of cthyb? Or are you working with a real Hamiltonian transformation somehow? This information would help to narrow the cause of the problem here.

P.S. : On the unstable branch we also added a comprehensive convergence tutorial: https://triqs.github.io/cthyb/unstable/guide/cthyb_convergence_tests.html in case you want to check if your warmup phase was long enough etc.

Best, Alex
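The point-wise comparison requested above can be sketched with plain NumPy arrays. This is only an illustration: the arrays `g_iter20` and `g_iter22` are synthetic stand-ins rather than the data from this issue, and the helper name `pointwise_difference` is hypothetical. With TRIQS Green's function objects, the underlying arrays would presumably come from their `.data` attribute, sampled on the same tau mesh.

```python
import numpy as np

def pointwise_difference(g_a, g_b):
    """Return |g_a - g_b| at every tau point together with its maximum."""
    g_a = np.asarray(g_a)
    g_b = np.asarray(g_b)
    if g_a.shape != g_b.shape:
        raise ValueError("both G(tau) arrays must share the same tau mesh")
    diff = np.abs(g_a - g_b)
    return diff, diff.max()

# Synthetic stand-in data (NOT the G(tau) from this issue):
# the same smooth curve with two different noise levels.
tau = np.linspace(0.0, 10.0, 101)
rng = np.random.default_rng(0)
g_iter20 = -0.5 * np.exp(-tau) + 1e-3 * rng.standard_normal(tau.size)
g_iter22 = -0.5 * np.exp(-tau) + 1e-4 * rng.standard_normal(tau.size)

diff, max_diff = pointwise_difference(g_iter20, g_iter22)
print(f"max |G_20 - G_22| = {max_diff:.2e}")
```

If the maximum difference is of the order of the statistical noise, the two iterations agree within error bars, which would suggest the measured G(tau) itself is not affected.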

LeoGaspard commented 1 year ago

Hi @the-hampel, here is the difference between G(tau) at iterations 20 and 22:

I made a mistake in the table with the length_cycle and n_cycle values; here is the correct one:

| Iteration | length_cycle | n_cycle (millions) |
|---|---|---|
| 1-15 | 400 | 18 |
| 16-20 | 800 | 18 |
| 21-22 | 800 | 180 |

What seems strange is that the auto-correlation time exploded at iteration 20, where no parameter changed: the parameters were the same from iteration 16 to 20.

I am indeed using the complex build of CTHYB (hybridisation_is_complex and local_hamiltonian_is_complex are True).

Thank you for the tutorial. I think that I might be using cycles that are too long; I will check on that. But I would be surprised if it was the reason for this sudden increase in auto-correlation time.

Best, Léo

the-hampel commented 1 year ago

Thanks for the information. For now I have three ideas for tracking down the cause:

1) Check that the warmup was long enough (see tutorial). Otherwise this could mess up the prediction of the correlation time, and when you finally increase n_cycle to a large value you get an accurate prediction for the first time.

2) I am not sure whether the autocorr time calculation has been carefully tested for the complex cthyb, where the sign becomes complex. As far as I remember, right now either the sign or the perturbation order is used for calculating the autocorr time (whichever predicts the larger time). Since the sign can be complex, this could perhaps break the estimation. @Wentzell maybe you have an idea?

3) I see from your first plot that the impurity is almost full, or to be more specific, that the orbital you are plotting is almost full. I could imagine that once the error goes down by increasing n_cycles, the impurity gets closer to being completely filled. Once it is filled, any prediction of the corrtime could be problematic, because moves will be less likely. However, you mentioned that the acceptance rate of moves does not go down, so I am not sure about this.

Best, Alex
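To make the first idea more concrete, here is a minimal sketch of a binning-based estimate of the integrated auto-correlation time, applied to a synthetic AR(1) series whose exact value is known. This is a textbook estimator under the convention tau_int = 0.5 * (R - 1), not necessarily what cthyb does internally; the function names `tau_from_binning` and `ar1_series` are hypothetical. It illustrates that the estimate is far too small unless the bins (and hence the series) are much longer than the true correlation time.

```python
import numpy as np

def tau_from_binning(series, bin_size):
    """Estimate tau_int = 0.5 * (bin_size * var(bin means) / var(series) - 1)."""
    x = np.asarray(series, dtype=float)
    n_bins = x.size // bin_size
    if n_bins < 2:
        raise ValueError("need at least two bins")
    # Average consecutive samples into bins, dropping any incomplete last bin
    bins = x[: n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
    ratio = bin_size * bins.var(ddof=1) / x.var(ddof=1)
    return 0.5 * (ratio - 1.0)

def ar1_series(n, phi, seed=0):
    """Synthetic correlated series x[i] = phi * x[i-1] + white noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + noise[i]
    return x

# For an AR(1) process the exact value is tau_int = phi / (1 - phi) = 9
series = ar1_series(2**17, phi=0.9)
for bin_size in (1, 4, 64, 256):
    print(bin_size, round(tau_from_binning(series, bin_size), 2))
```

The estimate grows with the bin size and only plateaus near the exact value once the bins are much longer than the correlation time, so a run that is too short can report a misleadingly small auto-correlation time.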

Wentzell commented 1 year ago

The behavior described is indeed strange and should be understood.

It's correct that the auto-correlation time is currently estimated only based on the perturbation order and the sign of the configuration, but we do not have much experience for complex examples.

The jump from iteration 19 to 20 is indeed very strange and should be understood first. Do you observe anything else changing between these iterations? How different is the solver input for both these iterations?

I also wonder if the large estimates you see for iterations 21-22 are at all tied to the larger number of cycles or would show up even for n_cycle=18M.

LeoGaspard commented 1 year ago

Here's the report at the end of iterations 19 and 20. Apart from the auto-correlation time, nothing seems to have changed dramatically. As they are in the same iteration batch (15-20), the input parameters (n_cycle, length_cycle, n_warmup, ...) are all the same.

| Iteration | 19 | 20 |
|---|---|---|
| Move set Insert two operators | 0.0474778 | 0.0477236 |
| Move set Remove two operators | 0.0474819 | 0.0477164 |
| Move set Insert four operators | 0.00574264 | 0.00575782 |
| Move set Remove four operators | 0.00574016 | 0.00576084 |
| Move Shift one operator | 0.205205 | 0.209008 |
| Average sign | 0.609396 | 0.596431 |
| Average order | 8.09799 | 8.40913 |
| Max asym | 0.175557 | 0.173324 |

The difference in $G_0(i\omega_n)$ is also quite small between these two iterations (G is a blockGF; I plot only the first 3x3 block):

I ran iterations 21 and 22 again, starting from iteration 20 but with n_cycle=18M instead of 180M. This time I get auto-correlation times of 0.478 and 0.116. The times of ~1900 at iterations 21 and 22 therefore seem tied to the large number of cycles. But the time of 89 for iteration 20 is not explained by that, as the calculation parameters are the same as for the previous 5 iterations.

I understand from the tutorials that my auto-correlation time is too small for the previous iterations. I will be looking into that, but it doesn't seem to explain the jump at iteration 20.

the-hampel commented 1 year ago

Hi @LeoGaspard, sorry for the long silence. I think right now we do not have a good idea of what goes wrong in the estimation of the auto-correlation time here. In principle the ratio of acceptance rate and average order should give you an estimate. Is this a typo:

Move set Insert two operators | 0.0474778 | 0.00477236

There is an additional 0 in iteration 20 acceptance rate for insert two operators. If this is happening, then the increase in autocorr time makes sense to me. If not there is something special in the complex case. Is this drop in acceptance rate consistently happening?
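As a rough illustration of the ratio mentioned above, the sketch below uses one possible reading of it, which is an assumption and not the formula cthyb uses: roughly `avg_order / acceptance_rate` proposed moves are needed to renew a configuration, and dividing by length_cycle expresses this in units of measurement cycles. The numbers are the iteration 19 and 20 values reported earlier in the thread (with the corrected acceptance rate for iteration 20), and the function name is hypothetical.

```python
def rough_autocorr_cycles(avg_order, acceptance_rate, length_cycle):
    """Back-of-the-envelope decorrelation estimate in units of cycles.

    Assumption: about avg_order accepted moves are needed to renew the
    configuration, i.e. avg_order / acceptance_rate proposed moves.
    """
    moves_to_decorrelate = avg_order / acceptance_rate
    return moves_to_decorrelate / length_cycle

# Values from the solver reports quoted in this thread; the acceptance rate
# is the one for "Insert two operators", and length_cycle was 800.
reports = {
    19: {"avg_order": 8.09799, "insert_two": 0.0474778},
    20: {"avg_order": 8.40913, "insert_two": 0.0477236},
}

estimates = {
    it: rough_autocorr_cycles(r["avg_order"], r["insert_two"], length_cycle=800)
    for it, r in reports.items()
}
for it, value in estimates.items():
    print(f"iteration {it}: ~{value:.2f} cycles")
```

Under this reading both iterations give about 0.2 cycles, in the same range as the 0.08 to 0.37 reported for iterations 1-19, so the acceptance rates and average order alone do not account for the value of 89 at iteration 20.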

LeoGaspard commented 1 year ago

Hi @the-hampel , Thank you for your reply, there was indeed a typo in my previous comment, I edited it. Both acceptance rates are indeed very close. The rates do not change so much after the 10th iteration :

On my side, I checked the convergence of the parameters as you suggested, I found that the jump in auto-correlation time indeed happens for too large length_cycle when I use spin-orbit coupling :

When I do not use spin-orbit coupling, a jump happens but it is to a very low value of auto-correlation time :

the-hampel commented 1 year ago

Thanks for the clarification and the detailed plots. So it seems that the determination of the autocorr time generally works with complex Hamiltonians / hybridizations (the curves for smaller cycles look as expected).

Since the autocorr time is calculated either from the sign or from the average order, maybe what happens is that it jumps from one estimator to the other. Whichever gives the larger value will be set as the autocorr time. Can you also plot the value of the sign (real and imaginary parts separately) and the average order, with the number of cycles on the x-axis? Both are printed in the standard output when the solver is finished. Maybe one of the two suddenly collapses or changes?

LeoGaspard commented 1 year ago

Here is the plot for the average sign. There is a sudden change in the imaginary part at iteration 21. There is also a change at iteration 20, but a similar change happens at iteration 17 without any change in the auto-correlation time.

The average perturbation order seems even more stable than the average sign :

the-hampel commented 1 year ago

Okay let me discuss this with @Wentzell . Maybe it is related to the imaginary sign change. I am not sure this is correctly handled in the complex estimation of cthyb.

Putting aside all these discussions about the autocorr time, your analysis of all these quantities indicates that there seems to be no problem in your measured G(tau), since the average order, sign, and acceptance rates all look stable. So it is safe to use the result as is. However, we will try to understand and fix the estimation of the autocorr time in the complex build. At least for me, there is no reason to believe that the QMC itself is not working as expected.

TRIQS / cthyb

Large auto-correlation time (1900) after improving statistics #155