QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

inconsistent energies or failures with charged cells #2530

Open camelto2 opened 4 years ago

camelto2 commented 4 years ago

Describe the bug
There seems to be a bug when defining charged cells for solids using develop. For a fundamental gap calculation or grand canonical ensemble, we have to add or remove electrons. However, I am getting inconsistent energies or outright failures. I am attaching a file with inputs/outputs of the various test cases.

The example is 8-atom Si, using restricted PBE orbitals, no Jastrow, and a fixed random seed. Therefore one would expect that whether I remove an electron from the up or the down channel, I would get the same energy. This is not what I see, and depending on the "size" in the sposet builder, it can result in outright failure. There are 4 tests, with outputs/errors/scalar.dats labeled 1-4. These test the cases where the up channel is the majority or minority spin channel, and, in each case, whether the sposet "size" is set to the size of the majority or the minority spin channel.

I would have expected the correct input to set the sposet size to the majority spin channel size, but those cases fail. If the sposet "size" equals the minority channel size, it runs but gives different energies.
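For concreteness, here is a minimal sketch of the sposet_builder-style input these tests use. The orbital file name and builder attributes (tilematrix, twistnum, source, meshfactor) are hypothetical placeholders, not the actual files from the attached reproducer:

```xml
<!-- Sketch only: href and builder attributes are hypothetical placeholders. -->
<sposet_builder type="bspline" href="Si8.pwscf.h5" tilematrix="1 0 0 0 1 0 0 0 1"
                twistnum="0" source="ion0" meshfactor="1.0">
  <!-- "size" is the number of orbitals built; the question is whether it should
       match the majority (16) or minority (15) spin channel. -->
  <sposet type="bspline" name="spo_ud" size="16"/>
</sposet_builder>
<determinantset>
  <slaterdeterminant>
    <!-- Both determinants reference the same shared sposet; the electron counts
         (15 up / 16 down, or vice versa) come from the particleset groups. -->
    <determinant id="updet"   group="u" sposet="spo_ud"/>
    <determinant id="downdet" group="d" sposet="spo_ud"/>
  </slaterdeterminant>
</determinantset>
```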

test 1: spin_up is the minority spin channel with 15 electrons. spin_dn is the majority spin channel with 16 electrons. The sposet size in the builder is set to 15. out_test1 shows there is a BLAS memory issue; however, this successfully runs and gives me a scalar.dat and an energy of -31.91(11). err_test1, which captures the error messages, is empty.

test 2: spin_up is the minority spin channel with 15 electrons. spin_dn is the majority spin channel with 16 electrons. The sposet size in the builder is 16, which is nominally what I want since the max number of orbitals needed is 16. out_test2 has a similar BLAS memory issue; however, this one fails and hangs indefinitely. Note the seg fault listed in err_test2. The code hangs and never even writes a scalar.dat for this test.

test 3: spin_up is the majority spin channel with 16 electrons. spin_dn is the minority spin channel with 15 electrons. The sposet size in the builder is 16, which is nominally what I want since the max number of orbitals needed is 16. out_test3 also has BLAS memory issues, and a less verbose segfault printed to err_test3. This case actually proceeds to write an empty .scalar.dat.

test 4: spin_up is the majority spin channel with 16 electrons. spin_dn is the minority spin channel with 15 electrons. The sposet size in the builder is set to 15. This is similar to test 1, where the size is set to the minority size. out_test4 also shows a BLAS memory issue; err_test4 is clean, however. This case runs to completion and gives me an energy of -30.515(89), which is quite different from case 1.

In every case I see BLAS memory errors. Choosing the majority spin channel size as the size for the sposet builder results in failures and segfaults. The energy differences are large, but I used a small number of walkers and small steps/blocks just to see if it could run. I'm testing larger runs to see if they actually give different energies at a production level.

To Reproduce
Steps to reproduce the behavior:

  1. git commit 11fe1869b2787a3289bc453dae30f35bcd5b5907
  2. spack install qmcpack@develop +complex
  3. spack load qmcpack@develop +complex
  4. run qmcpack

Expected behavior
I would have expected the sposet size to be set to the majority spin channel, but those cases fail. The cases that do run also should not be giving BLAS errors.

System:
spack info: -- linux-rhel7-broadwell / gcc@8.3.1 ---------------------------- qmcpack@develop~afqmc+complex~cuda~da~gui~mixed+mpi+phdf5~ppconvert+soa~timers build_type=Release cuda_arch=none boost@1.73.0+atomic+chrono~clanglibcpp~container~context~coroutine+date_time~debug+exception~fiber+filesystem+graph~icu+iostreams+locale+log+math~mpi+multithreaded~numpy~pic+program_options~python+random+regex+serialization+shared+signals~singlethreaded+system~taggedlayout+test+thread+timer~versionedlayout+wave cxxstd=98 patches=246508e052c44b6f4e8c2542a71c06cacaa72cd1447ab8d2a542b987bc35ace9,4dd507e1f5a29e3b87b15321a4d8c74afdc8331433edabf7aeab89b3c405d556 visibility=hidden bzip2@1.0.8+shared zlib@1.2.11+optimize+pic+shared fftw@3.3.8+mpi~openmp~pfft_patches precision=double,float openmpi@4.0~atomics~cuda~cxx~cxx_exceptions+gpfs~java~legacylaunchers~memchecker~pmi+runpath~sqlite3+static~thread_multiple+vt fabrics=none schedulers=none hdf5@1.10.6~cxx~debug~fortran~hl+mpi+pic+shared~szip~threadsafe api=none libxml2@2.9.10~python libiconv@1.16 xz@5.2.5 openblas@0.3.9~consistent_fpcsr~ilp64+pic+shared threads=none python@3.7.7+bz2+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4~uuid+zlib expat@2.2.9+libbsd libbsd@0.10.0 gdbm@1.18.1 readline@8.0 ncurses@6.2~symlinks+termlib gettext@0.20.2+bzip2+curses+git~libunistring+libxml2+tar+xz tar@1.32 libffi@3.3 openssl@1.1.1g+systemcerts sqlite@3.31.1+column_metadata+fts~functions~rtree

Additional context

Si8_cation.zip

prckent commented 4 years ago

Many thanks for including the reproducer and for developing the test cases.

For reference:

Can this example be run with the real build of the code, or does it require complex?

Do you know if it crashes with just one task or thread, e.g. no MPI, 1 thread? I would assume yes.

camelto2 commented 4 years ago

This should be able to be run with real since it is at the Gamma point, but I'm testing some grand canonical stuff, so I came across it using the complex code.

These runs were all done with 1 MPI task and multiple threads. I originally noticed the BLAS error message failure on a multiple task/thread run. I'll check the specific case of 1 task and 1 thread to see if it persists.

prckent commented 4 years ago

There are certainly problems to fix here.

camelto2 commented 4 years ago

I have also confirmed the energy difference with a larger run (more walkers/steps/blocks, etc.). Since this has no Jastrow and uses restricted orbitals, I would not expect a difference between choosing up vs. down as the majority spin channel.

                         LocalEnergy                Variance                 ratio
qmc series 11    -31.030589 +/- 0.000775    4.321024 +/- 0.016610    0.1393
qmc series 14    -31.123760 +/- 0.000979    4.215262 +/- 0.026166    0.1354

series 11 corresponds to a longer version of test 1 described above (i.e. up is the minority), and series 14 corresponds to a longer version of test 4 (dn is the minority).

camelto2 commented 4 years ago

> Do you know if it crashes with just one task or thread, e.g. no MPI, 1 thread? I would assume yes.

The "BLAS: Bad memory unallocation" message is no longer present when using 1 MPI task, 1 thread, and 1 walker per task.

camelto2 commented 4 years ago

Ok, so there are a few more tests I've run, all of which seem to indicate there is potentially an issue with the sposet builder.

First of all, I think the "BLAS: Bad memory unallocation" error/warning is specific to the OpenBLAS used when I installed with spack. I have rerun with a separate install that did not use spack (I used intel19), and I no longer see these BLAS errors/warnings.

I have rerun all the tests with the intel / non-spack install, and also run two additional tests where I build the wavefunction differently. Instead of using a separate sposet_builder (which is the default from nexus writing my files), I use determinantset to build the orbitals and give it the reference orbitals as href. In this case, I think the determinantset builds all the orbitals available in the h5 file, and then I can specify the number of occupied orbitals in each of the up and down determinants separately. With that input there are only two possibilities, up is the majority spin or it is the minority, and I don't specify anything about the sposet size. The results are as follows:

determinantset_builder_up_minority    -30.494360 +/- 0.009486    3.111982 +/- 0.048296    0.1021
determinantset_builder_up_majority    -30.498884 +/- 0.006297    3.109966 +/- 0.026973    0.1020
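For reference, a minimal sketch of this determinantset-style input; the h5 file name and builder attributes are hypothetical placeholders, and the determinant "size" values give the occupations for the case with 16 up / 15 down electrons:

```xml
<!-- Sketch only: href and builder attributes are placeholders. -->
<determinantset type="einspline" href="Si8.pwscf.h5" tilematrix="1 0 0 0 1 0 0 0 1"
                twistnum="0" source="ion0" meshfactor="1.0">
  <slaterdeterminant>
    <!-- The orbitals come straight from the h5 file; each determinant just
         states how many are occupied in its spin channel. -->
    <determinant id="updet"   size="16"/>
    <determinant id="downdet" size="15"/>
  </slaterdeterminant>
</determinantset>
```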

Those determinantset results agree, which is the expected behavior. Below are the results using sposet_builder and setting the size of the SPOs there.

sposet_builder_up=minority_sposize=minority    -32.032349 +/- 0.022298    5.454654 +/- 0.075275    0.1703
sposet_builder_up=minority_sposize=majority    -30.491509 +/- 0.006055    3.225182 +/- 0.050855    0.1058
sposet_builder_up=majority_sposize=majority    -30.499143 +/- 0.005784    3.177399 +/- 0.046943    0.1042
sposet_builder_up=majority_sposize=minority    -30.622095 +/- 0.006487    3.336455 +/- 0.077583    0.1090

Clearly, sposet_builder_up=minority_sposize=minority and sposet_builder_up=majority_sposize=minority do not agree in energy, and both likely should result in an app abort. In these cases I construct the SPOSet with size N, but then have a determinant for the majority spin of size (N+1) x (N+1). This should result in a failure, since the determinant requests an additional SPO that was never built in the builder. However, it happily runs, and gives different energies depending on which channel is chosen as the majority spin channel.
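A minimal sketch of this mismatched case (file name and builder attributes are hypothetical placeholders, and which group is the 16-electron one depends on the particleset): the shared sposet is built with only 15 orbitals, yet the 16-electron majority determinant references it, so a 16x16 Slater matrix is requested from a 15-orbital set.

```xml
<!-- Sketch only: href and builder attributes are placeholders. -->
<sposet_builder type="bspline" href="Si8.pwscf.h5" tilematrix="1 0 0 0 1 0 0 0 1"
                twistnum="0" source="ion0">
  <!-- Only 15 orbitals are built (the minority-channel size). -->
  <sposet type="bspline" name="spo_ud" size="15"/>
</sposet_builder>
<determinantset>
  <slaterdeterminant>
    <!-- The 16-electron majority determinant needs a 16th orbital that was
         never built; this should abort, but instead it runs and gives an
         inconsistent energy. -->
    <determinant id="updet"   group="u" sposet="spo_ud"/>
    <determinant id="downdet" group="d" sposet="spo_ud"/>
  </slaterdeterminant>
</determinantset>
```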

Also, sposet_builder_up=minority_sposize=majority and sposet_builder_up=majority_sposize=majority seem to agree with the determinantset-style input shown above. This is good and should be the expected behavior: I choose the SPOSet size = N and have a determinant of size (N-1) x (N-1), which should be allowed. However, both of these inputs have a non-deterministic failure. Sometimes I start the calculation and it segfaults before it can do anything, but if I simply rerun a few times with the same input, it suddenly makes it through the VMC and segfaults at the end. The numbers shown above are from rerunning a few times until it actually ran the VMC and segfaulted afterwards.

Something is clearly wrong with the sposet builder, and it seems that the safest approach (at least for variable-size up and down determinants using the same underlying orbitals) is to use determinantset to build the orbitals instead of a separate sposet_builder.

ye-luo commented 4 years ago

Clearly a red flag. Will take a look soon.

ye-luo commented 3 years ago

For the spline SPO, constructing an SPOSet with N orbitals and using it to evaluate M (!= N) orbitals is mostly not supported. For the polarized case, please build two sposets with distinct names, spo_u and spo_d, and use them accordingly in the Slater determinants. If you can still find the h5 file, I can play with it to add a few safeguards.
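A minimal sketch of the two-sposet input suggested here, assuming the same hypothetical orbital file and placeholder builder attributes as in the sketches above:

```xml
<!-- Sketch only: href and builder attributes are placeholders. -->
<sposet_builder type="bspline" href="Si8.pwscf.h5" tilematrix="1 0 0 0 1 0 0 0 1"
                twistnum="0" source="ion0">
  <!-- One sposet per spin channel, each sized to that channel's electron count. -->
  <sposet type="bspline" name="spo_u" size="16"/>
  <sposet type="bspline" name="spo_d" size="15"/>
</sposet_builder>
<determinantset>
  <slaterdeterminant>
    <determinant id="updet"   group="u" sposet="spo_u"/>
    <determinant id="downdet" group="d" sposet="spo_d"/>
  </slaterdeterminant>
</determinantset>
```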