QMCPACK / qmcpack

Main repository for QMCPACK, an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids, with full performance-portable GPU support.
http://www.qmcpack.org

Nexus: support supercell twists in PySCF workflows #5073

Open jtkrogel opened 5 days ago

jtkrogel commented 5 days ago

Proposed changes

This PR adds support for arbitrary supercell twists/twist grids in workflows involving PySCF and QMCPACK.

This PR is now ready for review.

What type(s) of changes does this code introduce?

Does this introduce a breaking change?

What systems has this change been tested on?

Laptop, Improv at ALCF

Checklist

Path out of WIP

jtkrogel commented 4 days ago

Ready to proceed.

ye-luo commented 4 days ago

could you suggest a reviewer?

jtkrogel commented 4 days ago

Yes, already flagged @anbenali for this.

ye-luo commented 4 days ago

> Yes, already flagged @anbenali for this.

Oh thanks. I missed it.

prckent commented 3 days ago

c4q does not complete successfully for me in either example case. Additionally, Nexus doesn't notice the error and keeps checking status, this time overnight. (I have seen this on other occasions, so my guess is that some changed error handling, error messages, or signals are not being caught by the workstation/"wsNN" infrastructure, perhaps with openmpi runs.)

These were run on nitrogen2 with the nightly test configuration for gcc "new"+openmpi, i.e. reasonably new versions of all software, including python (3.11.9), installed via spack. PySCF is 2.5.0 in this case. Note the broken scf.h5 link. Happy to poke further -- this could well be completely unrelated to Nexus and instead have something to do with the converter, a PySCF version dependency, etc.

nohup: ignoring input

_____________________________________________________

                     Nexus 2.1.0

        (c) Copyright 2012-  Nexus developers

                     Please cite:
  J. T. Krogel Comput. Phys. Commun. 198 154 (2016)
     https://doi.org/10.1016/j.cpc.2015.08.012
_____________________________________________________

Checking for Nexus dependencies on the current machine...

  Nexus dependencies available on current machine:
    python3      = 3.11.9      (required)
    numpy        = 1.26.4      (required)
    scipy        = 1.13.1      (optional)
    h5py         = 3.11.0      (optional)
    matplotlib   = (unknown)   (optional)
    pydot        = 1.4.2       (optional)
    spglib       = 2.0.2       (optional)
    seekpath     = 2.0.1       (optional)
    pycifrw      = (unknown)   (optional)

  Nexus dependencies recommended for full functionality:
    python3      = 3.6.0      (required)
    numpy        = 1.13.1     (required)
    scipy        = 0.19.1     (optional)
    h5py         = 2.7.1      (optional)
    matplotlib   = 2.0.2      (optional)
    pydot        = 1.2.3      (optional)
    spglib       = 1.9.9      (optional)
    seekpath     = 1.4.0      (optional)
    pycifrw      = 4.3.0      (optional)
    cif2cell     = 1.2.10     (optional)

  All required Nexus dependencies are met.
    Core workflow features should work.
    Some optional features may not.
    See below for more information.

  Some optional dependencies are missing or merit an update.
    These modules are not needed for core workflow operation.
    Optional features related to outdated modules may still work.
    Please install updated versions if problems are encountered.

  Optional dependencies that are missing:
    cif2cell   is missing.  Install 1.2.10 or greater.

  Optional dependencies benefitting from user check or update:
    matplotlib version is unknown.  Check for 2.0.2 or greater.
    pycifrw    version is unknown.  Check for 4.3.0 or greater.

Applying user settings 

  Pseudopotentials
    reading pp:  ../../pseudopotentials/C.BFD.upf 
    reading pp:  ../../pseudopotentials/C.BFD.xml 
    reading pp:  ../../pseudopotentials/H.BFD.upf 
    reading pp:  ../../pseudopotentials/H.BFD.xml 
    reading pp:  ../../pseudopotentials/O.BFD.upf 
    reading pp:  ../../pseudopotentials/O.BFD.xml 

Project starting 
  checking for file collisions 
  loading cascade images 
    cascade 0 checking in 
  checking cascade dependencies 
    all simulation dependencies satisfied 

  starting runs:
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
  elapsed time 0.0 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 0 
      writing input files  0 scf
   Entering ./runs/diamond_ta/scf 0 
      sending required files  0 scf 
      submitting job  0 scf 
    Entering ./runs/diamond_ta/scf 0 
      Executing:  
        export OMP_NUM_THREADS=16
        python3 scf.py 

  elapsed time 3.0 s  memory 319.44 MB 
  elapsed time 6.1 s  memory 1451.97 MB 
  elapsed time 9.1 s  memory 1606.60 MB 
  elapsed time 12.1 s  memory 2024.05 MB 
  elapsed time 15.1 s  memory 1619.60 MB 
  elapsed time 18.1 s  memory 1745.74 MB 
  elapsed time 21.1 s  memory 1402.56 MB 
  elapsed time 24.2 s  memory 1141.36 MB 
  elapsed time 27.2 s  memory 2076.82 MB 
  elapsed time 30.2 s  memory 1839.52 MB 
  elapsed time 33.2 s  memory 1645.86 MB 
  elapsed time 36.2 s  memory 1993.04 MB 
(many lines deleted)
  elapsed time 1223.2 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 0 
      copying results  0 scf 
    Entering ./runs/diamond_ta/scf 0 
      analyzing  0 scf 

  elapsed time 1226.3 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 1 
      writing input files  1 c4q 
    Entering ./runs/diamond_ta/scf 1 
      sending required files  1 c4q 
      submitting job  1 c4q 
    Entering ./runs/diamond_ta/scf 1 
      Executing:  
        export OMP_NUM_THREADS=1
        mpirun -np 1 convert4qmc -prefix c4q -orbitals scf.h5 

  elapsed time 1229.3 s  memory 117.18 MB 
    Entering ./runs/diamond_ta/scf 1 
      copying results  1 c4q 
    Entering ./runs/diamond_ta/scf 1 
      analyzing  1 c4q 

  elapsed time 1232.3 s  memory 117.18 MB 
  elapsed time 1235.4 s  memory 117.18 MB 
  elapsed time 1238.4 s  memory 117.18 MB 
  elapsed time 1241.4 s  memory 117.18 MB 
  elapsed time 1244.4 s  memory 117.18 MB 
  elapsed time 1247.4 s  memory 117.18 MB 
  elapsed time 1250.4 s  memory 117.18 MB 
 (many lines deleted)
  elapsed time 60547.2 s  memory 117.18 MB 
  elapsed time 60550.2 s  memory 117.18 MB 
  elapsed time 60553.2 s  memory 117.18 MB 
  elapsed time 60556.2 s  memory 117.18 MB 
  elapsed time 60559.3 s  memory 117.18 MB 
$ pwd; ls -l
.. /qmcpack/nexus/examples/qmcpack/rsqmc_pyscf/02_diamond_hf_qmc/runs/diamond_ta/scf
total 240
-rw-r--r-- 1 pk7 users   1021 Jul  2 17:26 c4q.err
-rw-r--r-- 1 pk7 users     40 Jul  2 17:26 c4q.in
lrwxrwxrwx 1 pk7 users      6 Jul  2 17:26 c4q.orbs.h5 -> scf.h5
-rw-r--r-- 1 pk7 users     79 Jul  2 17:26 c4q.out
-rw-r--r-- 1 pk7 users   1252 Jul  2 17:25 scf.err
-rw-r--r-- 1 pk7 users 139630 Jul  2 17:25 scf.out
-rw-r--r-- 1 pk7 users   1885 Jul  2 17:05 scf.py
-rw-r--r-- 1 pk7 users    360 Jul  2 17:05 scf.struct.xsf
-rw-r--r-- 1 pk7 users    175 Jul  2 17:05 scf.struct.xyz
-rw-r--r-- 1 pk7 users  69792 Jul  2 17:25 scf.twistnum_000.h5
drwxr-xr-x 2 pk7 users     52 Jul  2 17:26 sim_c4q
drwxr-xr-x 2 pk7 users     52 Jul  2 17:26 sim_scf
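
In the listing above, c4q.orbs.h5 points to scf.h5, while the only HDF5 file present in the directory is scf.twistnum_000.h5. A minimal sketch for flagging such dangling links in a run directory follows. The helper name find_dangling_links is hypothetical and is not part of Nexus.

```python
import os

def find_dangling_links(directory):
    """Return (link name, target) pairs for symlinks whose targets do not exist."""
    dangling = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() is true even for broken links, while exists() follows the link
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append((name, os.readlink(path)))
    return dangling
```

Calling find_dangling_links on the scf run directory would be expected to report the c4q.orbs.h5 link if its target is indeed missing.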

c4q.err

Could not open H5 file
[nitrogen2:3440447] *** Process received signal ***
[nitrogen2:3440447] Signal: Aborted (6)
[nitrogen2:3440447] Signal code:  (-6)
[nitrogen2:3440447] [ 0] /lib64/libc.so.6(+0x3e6f0)[0x7f162b23e6f0]
[nitrogen2:3440447] [ 1] /lib64/libc.so.6(+0x8b94c)[0x7f162b28b94c]
[nitrogen2:3440447] [ 2] /lib64/libc.so.6(raise+0x16)[0x7f162b23e646]
[nitrogen2:3440447] [ 3] /lib64/libc.so.6(abort+0xd3)[0x7f162b2287f3]
[nitrogen2:3440447] [ 4] convert4qmc[0x47a2f3]
[nitrogen2:3440447] [ 5] convert4qmc[0x41ab59]
[nitrogen2:3440447] [ 6] /lib64/libc.so.6(+0x29590)[0x7f162b229590]
[nitrogen2:3440447] [ 7] /lib64/libc.so.6(__libc_start_main+0x80)[0x7f162b229640]
[nitrogen2:3440447] [ 8] convert4qmc[0x422db5]
[nitrogen2:3440447] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 3440447 on node nitrogen2 exited on
signal 6 (Aborted).
--------------------------------------------------------------------------
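
The converter aborted on signal 6, yet the workflow kept polling. As a generic illustration only, and not a description of how Nexus tracks jobs, the sketch below shows how a driver script could classify the way a child process ended using the standard subprocess module. The function name run_and_report is hypothetical. A launcher such as mpirun typically reports a failed rank through its own nonzero exit status, so the last branch is the one most likely to apply in that case.

```python
import subprocess

def run_and_report(command):
    """Run a command and describe how it ended (illustrative, not Nexus code)."""
    result = subprocess.run(command, capture_output=True, text=True)
    code = result.returncode
    if code == 0:
        return True, 'completed normally'
    if code < 0:
        # On POSIX, a negative return code means the child was killed by a signal
        return False, 'terminated by signal {}'.format(-code)
    return False, 'exited with status {}'.format(code)
```

A driver that receives False here could stop waiting for the job instead of continuing to check its status.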

jtkrogel commented 1 day ago

For the Nexus test failure, please add two print statements after this line to investigate:

1452:   File "/home/pk7/projects/qmc/git_QMCPACK_prckent/qmcpack/nexus/tests/unit/test_pyscf_input.py", line 577, in test_write
1452:     assert(text_eq(text,ref_text))

# add these
print(ref_text)
print(text)
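
If the two printed texts are long, a unified diff may be easier to read than the full output. The sketch below uses only the standard difflib module; the helper name show_text_diff is hypothetical and is not part of the Nexus test suite. To be visible on failure, the diff has to be printed before the assert runs, since nothing after a failing assert is executed.

```python
import difflib

def show_text_diff(ref_text, text):
    """Return a unified diff between the reference and the generated text."""
    diff = difflib.unified_diff(
        ref_text.splitlines(),
        text.splitlines(),
        fromfile='ref_text',
        tofile='text',
        lineterm='',
    )
    return '\n'.join(diff)

# Possible usage inside the test, placed ahead of the assertion:
# print(show_text_diff(ref_text, text))
```

An empty string is returned when the two texts have identical lines.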

For the examples added with this PR, diamond_pp_hf_twistavg_prim.py (primitive cell twist averaging) should run cleanly (at least it did for me -- please post the converter output), while diamond_pp_hf_twistavg.py (supercell twist averaging) currently fails at the converter level due to changes needed to Anouar's savetoqmcpack.

These features still need bug fixes at the QMCPACK and QMCPACK-converter levels (I've already made Anouar aware). This PR implements the Nexus-side features needed to drive these workflows, but does not guarantee that QMCPACK and its converters function properly.

prckent commented 21 hours ago

Thanks Jaron -- the situation is clear to me now. @anbenali How far off are the updates to savetoqmcpack? I put PySCF 2.6.2 in spack, so I can easily test the latest version. It would be nice to have a patch so that the Nexus support can be tested, but perhaps there are puzzles to solve?