MRChemSoft / mrchem

MultiResolution Chemistry
GNU Lesser General Public License v3.0

Failing response SCF test #487

Open stigrj opened 3 months ago

stigrj commented 3 months ago

Random segfault in the "Projecting occupied space" step of the response SCF test.

stigrj commented 2 months ago

@gitpeterwind I see you have written this comment in MRCPP/src/utils/ComplexFunction.cpp:

//For some unknown reason the h2_mag_lda test sometimes fails when schedule(dynamic) is chosen

I have traced the failing test li_pol_lda to the same OMP loop, and it fails even with schedule(static), so there seems to be a real issue here.

stigrj commented 2 months ago

The issue appears when compiling with pure OMP (no MPI). It is triggered randomly when running the test case as is, but seems to trigger every time if I change world_prec from 1e-3 to 1e-2.

gitpeterwind commented 2 months ago

There seems to be an obvious bug: the unguarded update on line 1804 (Sreal(orbVecBra[i], orbVecKet[j]) += S_temp(i, j);) should be rewritten in the same way as is done for the real case (20 lines higher):

                        // must ensure that threads are not competing
                        double &Srealij = Sreal(orbVecBra[i], orbVecKet[j]);
                        double &Stempij = S_temp(i, j);
#pragma omp atomic
                        Srealij += Stempij;

If you have a setup where you can test this easily, that would be nice (maybe even push a fix?).
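For readers less familiar with OpenMP, here is a minimal self-contained sketch of the pattern above. `Matrix`, `accumulate_blocks`, and the per-block index lists `braIdx`/`ketIdx` are hypothetical stand-ins (a toy row-major matrix replaces Eigen), and this is not the actual ComplexFunction.cpp loop. The point it illustrates: when different loop iterations can add into the same element of a shared matrix, a plain `+=` is an unsynchronized read-modify-write, whereas `#pragma omp atomic` makes each update indivisible, independent of whether `schedule(static)` or `schedule(dynamic)` is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy row-major dense matrix standing in for the Eigen matrices (assumption, not MRChem code).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double &operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Hypothetical helper: adds each block blocks[b] into S at rows braIdx[b] and columns
// ketIdx[b]. Different blocks may map to the same element of S, so concurrent
// iterations can update the same memory location.
void accumulate_blocks(Matrix &S, const std::vector<Matrix> &blocks,
                       const std::vector<std::vector<int>> &braIdx,
                       const std::vector<std::vector<int>> &ketIdx) {
    assert(braIdx.size() == blocks.size() && ketIdx.size() == blocks.size());
    const int nBlocks = static_cast<int>(blocks.size());
#pragma omp parallel for schedule(dynamic)
    for (int b = 0; b < nBlocks; b++) {
        const Matrix &S_temp = blocks[b];
        for (std::size_t i = 0; i < S_temp.rows; i++) {
            for (std::size_t j = 0; j < S_temp.cols; j++) {
                // Bind the target element to a reference so the atomic statement
                // has the plain form `x += expr`, as in the snippet above.
                double &Sij = S(braIdx[b][i], ketIdx[b][j]);
                const double Stempij = S_temp(i, j);
#pragma omp atomic
                Sij += Stempij;
            }
        }
    }
}
```

Compile with `-fopenmp` (GCC/Clang); without it the pragmas are ignored and the loop simply runs serially. Deleting the `#pragma omp atomic` line reintroduces a data race, and such races typically show up only intermittently.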

stigrj commented 2 months ago

Yes, I will test it later 🙂

stigrj commented 2 months ago

Hmm, but I'm actually running in the if (serial) branch, where the omp atomic is already in place 🤔 For some reason, though, I can no longer trigger the error, even without changing the code... I need to take a closer look tomorrow.

I remember having some issues with the += operator in Eigen before, though.

gitpeterwind commented 2 months ago

Sorry, I misread the if-else block, so my previous comment was irrelevant. These non-reproducible bugs are tricky. Still, I cannot see any unsafe part other than that +=. We could write it in a safer way, but I cannot do it right now.
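One possible "safer way", sketched under the same assumptions (a toy `Matrix` in place of Eigen; `accumulate_blocks_private`, `braIdx` and `ketIdx` are hypothetical names; not necessarily what the maintainers have in mind): each thread accumulates into its own private matrix, and the private copies are merged into the shared result inside `#pragma omp critical`. This removes shared writes from the hot loop entirely, at the cost of one extra S-sized matrix per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy row-major dense matrix standing in for the Eigen matrices (assumption, not MRChem code).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double &operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Hypothetical variant without atomics: every thread sums its blocks into a private
// matrix, and the private copies are merged into S one thread at a time.
void accumulate_blocks_private(Matrix &S, const std::vector<Matrix> &blocks,
                               const std::vector<std::vector<int>> &braIdx,
                               const std::vector<std::vector<int>> &ketIdx) {
    assert(braIdx.size() == blocks.size() && ketIdx.size() == blocks.size());
    const int nBlocks = static_cast<int>(blocks.size());
#pragma omp parallel
    {
        Matrix S_local(S.rows, S.cols); // private to this thread, zero-initialized
#pragma omp for schedule(dynamic)
        for (int b = 0; b < nBlocks; b++) {
            const Matrix &S_temp = blocks[b];
            for (std::size_t i = 0; i < S_temp.rows; i++) {
                for (std::size_t j = 0; j < S_temp.cols; j++) {
                    // No other thread can see S_local, so a plain += is safe here.
                    S_local(braIdx[b][i], ketIdx[b][j]) += S_temp(i, j);
                }
            }
        }
        // Merge step: the critical section serializes access to the shared S.
#pragma omp critical
        {
            for (std::size_t k = 0; k < S.data.size(); k++) {
                S.data[k] += S_local.data[k];
            }
        }
    }
}
```

Because the order of the final floating-point additions still depends on thread scheduling, results can differ in the last bits between runs; they just no longer depend on a race.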