Multideterminants and orbitals stored in HDF5 -> No behavior when optimizing CI coeffs or orbitals coeffs.

QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support

http://www.qmcpack.org


Multideterminants and orbitals stored in HDF5 -> No behavior when optimizing CI coeffs or orbitals coeffs. #444

Open anbenali opened 6 years ago

anbenali commented 6 years ago

Hello, this is a discussion on how to move the multideterminant coefficients to HDF5.

The problem: XML cannot handle more than ~2M determinants, since the WFS file would have at least 2M lines. Just reading the file may take up to 30 s, and modifying a Jastrow parameter (at the end of the file) becomes tedious.

Solution: Store the determinants in H5. Simple, sweet, and clean. I will make the read parallel so there is no problem... However...

Issue: It might be valuable to look at the coefficients, and HDF5 is not practical for that. Also, when you optimize the determinant coefficients, you might need to look at the new weights of the determinants to see if they flipped sign. The weights are also often linked to specific Jastrows, so they should always be kept together. If we store the determinants, the coefficients, AND the Jastrows in an HDF5 file, then we lose the ability to modify the Jastrows afterwards...

Solution: Not clear what should be done... I was thinking of an implementation where we first just read the determinant coefficients and do not enable reoptimization of the determinants in the presence of Jastrows... Another option would be to have the determinants in the H5 file but the coefficients in the XML file with the Jastrows. I have clearly not settled on any solution, as I see advantages and flaws in all of them. So please advise.

Thanks

markdewing commented 6 years ago

Would it be feasible to write scripts or small executables to extract the coefficients, weights, jastrows, etc from the hdf file? This would allow everything to be stored in the hdf file and be kept together, but still be accessible.

jtkrogel commented 6 years ago

If coeffs are generally useful to read, QMCPACK could output them (optionally?) into a separate text file. Alternatively read access could be granted via tools/scripts as Mark suggests. If the most common use case is to scan for differences, presumably a well designed (plotting?) tool would be best.

I don't see a problem with only keeping the Jastrows in the input file, or at least I don't see an advantage to placing them in the h5.

If there is one and we do include them in the h5, then a simple solution for enabling their modification would be to allow any Jastrows that are present in the input file to take precedence over ones defined in the h5. Also, you could expand on the initialization tag, e.g. <jastrow ... init="rpa"/> goes to <jastrow ... init="h5"/>, to signal what you want taken from the h5 file.
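The precedence rule suggested above can be sketched in a few lines. This is only an illustration under stated assumptions: a plain dict stands in for the Jastrow data that would live in the h5 file, the XML layout is made up for the example, and `init="h5"` is the attribute proposed in this comment, not an existing QMCPACK option. The helper name `resolve_jastrows` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for Jastrow parameters stored in the wavefunction HDF5 file.
# A plain dict is used so the sketch runs without an HDF5 library.
H5_JASTROWS = {
    "J1": [0.10, 0.05, 0.01],
    "J2": [0.50, 0.20, 0.02],
}

# Hypothetical input: J1 asks to be taken from the h5, J2 is given explicitly.
INPUT_XML = """
<wavefunction>
  <jastrow name="J1" init="h5"/>
  <jastrow name="J2">0.45 0.18 0.03</jastrow>
</wavefunction>
"""


def resolve_jastrows(xml_text, h5_jastrows):
    """Return {name: (coefficients, source)} with the XML taking precedence."""
    resolved = {}
    root = ET.fromstring(xml_text)
    for node in root.iter("jastrow"):
        name = node.get("name")
        text = (node.text or "").strip()
        if text:
            # Coefficients written in the input file always win.
            resolved[name] = ([float(x) for x in text.split()], "xml")
        elif node.get("init") == "h5":
            # The user explicitly asked for the stored values.
            if name not in h5_jastrows:
                raise KeyError(f"Jastrow {name!r} requested from h5 but not stored there")
            resolved[name] = (list(h5_jastrows[name]), "h5")
        else:
            raise ValueError(f"Jastrow {name!r} has no coefficients and no init='h5'")
    return resolved


jastrows = resolve_jastrows(INPUT_XML, H5_JASTROWS)
for name, (coeffs, source) in sorted(jastrows.items()):
    print(name, source, coeffs)
```

In this sketch only Jastrows listed in the input are used; whether Jastrows present only in the h5 should be loaded silently is a separate design choice that the comment leaves open.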

anbenali commented 6 years ago

I like the idea of plotting the coefficients and just storing them in HDF5.

My main concern is how to force the user to use the same Jastrows that were used to optimize the determinant coefficients.

So far the H5 file will contain:

basisset
determinant (orbital coeffs)
Multideterminants (excitations + weight).

The basis set and the orbitals will be constant (or at least until Eric's orbital reoptimization is implemented), but the MSD may be optimized. I would like to avoid having two HDF5 files, one for basis set + orbitals and one for multideterminants.

But when optimizing the Jastrows and the MSD coeffs, we will need to output the optimized Jastrows and MSD coeffs. Do we write them to a new, separate HDF5 file? (That creates a lot of issues, as the original WF in H5 will contain other values and we will then need more tags in the XML.) Or do we just copy the basis set and the orbital coeffs into the new optimized files? This would solve the similar future problem when Eric's optimization is ready, but it would mean copying a new WF each time, which can create disk space issues for very large systems.

--

Anouar Benali, PhD Leadership Computing Facility Argonne National Laboratory Building 240 Office - 2127 9700 S Cass Av., Argonne Il, 60439 (630) 252-0058

jtkrogel commented 6 years ago

I think it's fine to just have the parts/coefficients that are optimized appear in the file produced at each opt step. A subsequent run would refer to the original WF h5 file and the optimized one. This is no different, after all, from what we already do for 3D bspline orbitals: the spline orbital h5 file is always referenced in the input and doesn't change, while the user has to refer to a specific opt.xml file following optimization. We would just be replacing opt.xml with opt.h5 in this case (with opt.h5 containing only the relevant changes from optimization).

On the other point you mention, I don't think the user should be "forced" to use any particular wavefunction component, but ones that are optimized together should remain together with the option provided to the user to override any particular part. In this case, placing Jastrow information in the opt.h5 file makes sense, with preference given to what the user requests in the xml.
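A minimal sketch of this layering, assuming three sources with a fixed priority (user XML, then opt.h5, then the original WF h5). Dicts keyed by dataset path stand in for the files; the paths, the `wf.h5` label, and all values are invented for illustration and do not reflect the actual QMCPACK file layout.

```python
from collections import ChainMap

# Original wavefunction file: complete, never modified by the optimizer.
original_wf = {
    "basisset": "placeholder-basis",
    "determinant/orbital_coeffs": [[1.0, 0.0], [0.0, 1.0]],
    "multidet/excitations": ["0011", "0101", "1001"],
    "multidet/ci_coeffs": [0.95, 0.20, -0.10],
}
# Written by one optimization step: only the quantities that changed.
opt_step = {
    "multidet/ci_coeffs": [0.93, 0.25, -0.12],
    "jastrow/J2": [0.45, 0.18, 0.03],
}
# Values the user typed into the XML input: highest priority.
user_xml = {
    "jastrow/J2": [0.40, 0.15, 0.00],
}

# Ordered from highest to lowest priority.
layers = [("xml", user_xml), ("opt.h5", opt_step), ("wf.h5", original_wf)]

# ChainMap searches its maps left to right, so earlier layers win.
wavefunction = ChainMap(*(data for _, data in layers))


def source_of(key):
    """Report which layer supplies the value for a dataset path."""
    for label, data in layers:
        if key in data:
            return label
    raise KeyError(key)


for key in sorted(wavefunction):
    print(f"{key:28s} <- {source_of(key)}")
```

Because the lookup is lazy, the original file is never copied, which sidesteps the disk-space concern about duplicating the basis set and orbitals for every optimization step.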

anbenali commented 6 years ago

Thanks Jaron and Mark. Ye made the same comment as Jaron, so I will go with that route. Once it is done, please feel free to comment and request changes.

prckent commented 6 years ago

Let's consider the use cases:

We have a molecular or solid state system represented by a set of orbitals and multiple determinants. The orbitals are described by an LCAO expansion with potentially lots of coefficients. The multideterminant expansion is described via lots of excitation and expansion coefficients. Potentially all of these coefficients might be changed. For small systems or method development we might want to have everything in XML as today. As larger systems are studied for practical reasons we need to move some or all of these to HDF5.

A reasonable "large" use case would be a CI expansion for a molecule with 10 million coefficients. This should all be HDF5. However we then need the ability to selectively load new (say) determinant expansion coefficients that have been changed in some way.

For a small case, such as a tutorial for Li2, we might want everything to be specified in XML.

The solution seems straightforward: an HDF5 file, just like our existing XML files, should be able to act as a source of orbitals, excitations, and multideterminant expansion coefficients. An HDF5 file can contain some or all of these coefficients. i.e. Use of HDF5 or XML is transparent and we don't force any particular workflow.

In the wavefunction definition it has to be possible to specify these different sources. An obvious convenience feature is that by default, everything is taken from a single HDF5 or XML file.
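One way to picture "a single default source with optional per-component overrides" is the sketch below. It is not a proposed input format: the component names, the file names, and the function `resolve_sources` are all hypothetical and only illustrate the defaulting behavior.

```python
# The three kinds of data named in this comment (names chosen for the sketch).
COMPONENTS = ("orbitals", "excitations", "ci_coeffs")


def resolve_sources(default_source, overrides=None):
    """Map each wavefunction component to the file it should be read from.

    Every component comes from `default_source` unless the user names a
    different file for it in `overrides`.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown wavefunction component(s): {sorted(unknown)}")
    return {name: overrides.get(name, default_source) for name in COMPONENTS}


# Small case: everything from one XML file.
small = resolve_sources("Li2.wfs.xml")

# Large case: everything from HDF5, except re-optimized CI coefficients.
large = resolve_sources("wf.h5", {"ci_coeffs": "opt.h5"})

print(small)
print(large)
```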

The way forward is to prototype how these different inputs would look and get comments.

prckent commented 6 years ago

@anbenali Please prototype some examples (inputs, not code), so that we can pick an optimal route.

anbenali commented 4 years ago

I am commenting again so this comes up!

Things have changed a bit, as can be seen, since there is no longer an all-XML option (or at least it is not desirable).

I am in favor of having MO-Coeffs, CI-Coeffs and Jastrows in one file (H5). But as much as we have learned to live without the CI-coeffs visible in the XML wfj file, I am not sure I want to hide the Jastrows. When reoptimizing CI-coeffs, they need to be matched with the Jastrows...

The current implementation allows reading optimized CI coeffs from a different H5 file. Imagine you have 1M determinants, you optimize 1000, and store the CI coeffs in a new H5 file. You can specify that path, and the first 1k determinants will be read from the new H5 file while the remaining ones will be read from the original H5 file. The issue is that the Jastrows are in a different XML file.
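The read pattern described here (first k coefficients from the optimized file, the rest from the original) can be sketched with plain lists. This is an illustration, not the QMCPACK reader; the function names and numbers are made up. The second helper addresses the sign-flip concern from the start of the thread.

```python
def merge_ci_coeffs(original, optimized):
    """Take the first len(optimized) coefficients from the optimized set
    and the remaining ones from the original expansion."""
    if len(optimized) > len(original):
        raise ValueError("optimized set is longer than the original expansion")
    return list(optimized) + list(original[len(optimized):])


def sign_flips(original, merged):
    """Indices of determinants whose coefficient changed sign."""
    return [i for i, (old, new) in enumerate(zip(original, merged)) if old * new < 0]


# Five determinants in the original file, three of them re-optimized.
original = [0.90, -0.30, 0.20, 0.10, -0.05]
optimized = [0.88, 0.31, 0.22]

merged = merge_ci_coeffs(original, optimized)
print("merged coefficients:", merged)
print("sign flips at indices:", sign_flips(original, merged))
```

The merge relies on the optimized determinants being exactly the leading entries of the original list, in the same order, which is the situation described in the comment.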

One solution would be to store the Jastrows in both the H5 file and the XML, and at the beginning of a run test the Jastrows in the H5 against those in the XML.

prckent commented 4 years ago

What are the requirements here? I think they are that we can save (checkpoint) the Jastrow, CI, and orbital coefficients at any point, and of course reload them in future. For example, during iterations of the optimizer so that the run can be restarted.

While we support it currently, reading coefficients of the same category from multiple files seems like a terrible idea. Why not store them all in one file along with the "optimize or not" flags? I don't recall how we got to the current status - speed & convenience ? - but why not simplify at this point?

I think we have code to write jastrows in XML but not HDF5. However they are small. Orbital and CI coefficients can be sizable. Why not Jastrows in XML using the existing code and orbital and CI coefficients (the Fermionic part) in HDF5? Label them by section and iteration number?

anbenali commented 4 years ago

My only fear is that we use the wrong Jastrow compared to what was reoptimized.

Imagine you reoptimize the CI, Jastrow, and orbitals; then in the H5 you get a good pair of CI coeffs and orbitals, but you make a mistake and use the wrong Jastrow... Also storing the Jastrow in the H5, while still using the XML and making sure the XML and H5 match, could fix the issue.
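A rough sketch of such a startup check, assuming the Jastrow parameters from both files have already been read into dicts of coefficient lists. The function name, the tolerances, and the data are assumptions made for the example.

```python
import math


def jastrow_mismatches(xml_jastrows, h5_jastrows, rel_tol=1e-9, abs_tol=1e-12):
    """Return a list of human-readable mismatches; an empty list means the
    Jastrows in the XML agree with the copy stored in the H5."""
    problems = []
    for name in sorted(set(xml_jastrows) | set(h5_jastrows)):
        if name not in xml_jastrows:
            problems.append(f"{name}: present in H5 but missing from XML")
            continue
        if name not in h5_jastrows:
            problems.append(f"{name}: present in XML but missing from H5")
            continue
        a, b = xml_jastrows[name], h5_jastrows[name]
        if len(a) != len(b):
            problems.append(f"{name}: {len(a)} coefficients in XML vs {len(b)} in H5")
        elif not all(math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
                     for x, y in zip(a, b)):
            problems.append(f"{name}: coefficients differ")
    return problems


h5_copy = {"J1": [0.10, 0.05], "J2": [0.45, 0.18, 0.03]}
good_xml = {"J1": [0.10, 0.05], "J2": [0.45, 0.18, 0.03]}
stale_xml = {"J1": [0.10, 0.05], "J2": [0.50, 0.20, 0.02]}

print(jastrow_mismatches(good_xml, h5_copy))
print(jastrow_mismatches(stale_xml, h5_copy))
```

A tolerance-based comparison is used because values written to XML as text may not round-trip bit for bit; a run could abort or warn whenever the returned list is non-empty.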

Now if we do this, we will still have one HDF5 file containing the original WF with the basis set, one H5 containing the optimized CI coeffs and orbitals, and one XML file containing the Jastrows... That seems like a lot... We could completely clone the H5 and overwrite the updated datasets... That would make the number of files needed identical to what we are already using.
