QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

the wavefunction xml file does not carry the full path of vp.h5 #3948

Open anbenali opened 2 years ago

anbenali commented 2 years ago

Is your feature request related to a problem? Please describe. When optimizing a parameter (regardless of which parameter), a vp.h5 file is created and a reference to the .s0.vp.h5 file is added to the opt.xml file. In cases where the wavefunction definition (the XML definition) is in a standalone file called from a qmc.in.xml file, the usual route is simply to overwrite the wfj.xml file with the best opt.xml file. However, the opt.xml file now carries the reference to the vp.h5 file, and no global path is given. The workarounds to keep the code from crashing are either to delete that line or to correct it by adding the absolute path.

Describe the solution you'd like

Additional context: This is an issue for automated runs where a script replaces the wfj.xml file with the opt.xml file.

prckent commented 2 years ago

Can you explain more about your use case? Why would we want to use an absolute path here? The file is written to the "current" directory and the reference refers to the current directory. The assumption is that if you move your wavefunction etc. to another directory that you will also move the vp file. Can you not copy or link this file? Does Nexus not yet know about vp files? (likely) If you had a new XML jastrow or restart file you would also have to move it (?).

anbenali commented 2 years ago

When running QMC, we usually have a project directory containing structure.xml, wfj.xml, wfj.h5 and the pseudopotentials (if any), plus two subdirectories, DMC and Optimization, to avoid having 30 files from the optimization cycles and to keep the VMC and DMC runs separate from the rest. (The number of subdirectories can grow if we run different optimizations, e.g. J12 and J123.) When you submit your job from the Optimization directory, all file locations are relative to where you submitted. Your opt.xml files will be generated in the Optimization directory, but they inherit the path to the H5 wavefunction from the wfj.xml file. In that case the path is relative: "ref=../wfj.h5". However, the path to vp.h5 will be "ref=vp.h5".

When I overwrite wfj.xml with an optimized wavefunction, "ref=../wfj.h5" is resolved correctly, but vp.h5 is not.

This case is even worse when bundling jobs. If you have 10 twists and one directory for the inputs to be submitted, all the opt.xml files will be in the same directory, and you will have to specify the path to vp.h5 manually for each one (or delete the line from each opt.xml file).

prckent commented 2 years ago

Sounds like being able to specify the preferred name (and hence location) of the vp file would solve your use case. Perhaps this is already doable?

markdewing commented 2 years ago

There is a flag, <parameter name="output_vp_override">no</parameter>, that should turn off generating the link to the vp.h5 file in the opt.xml file.

See more in this PR: #3640

ye-luo commented 2 years ago

This is an XML issue. In $PWD/../wfs.xml, you have to specify paths relative to $PWD, not to $PWD/.. (the directory that contains wfs.xml). Thus you write href="../wfs.h5" when the file is at $PWD/../wfs.h5. If you put a vp.h5 file at $PWD/.., the href should be ../vp.h5.
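A minimal sketch of the resolution rule described above, assuming (as the comment states) that a relative href is resolved against the directory the run is launched from ($PWD), not against the directory of the included XML file. The helper name `resolve_href` and the `/proj` layout are hypothetical and only illustrate the rule.

```python
import posixpath


def resolve_href(run_dir: str, href: str) -> str:
    """Resolve an href relative to the launch directory ($PWD), not to the
    directory of the XML file that contains it. Absolute hrefs are only
    normalized."""
    if posixpath.isabs(href):
        return posixpath.normpath(href)
    return posixpath.normpath(posixpath.join(run_dir, href))


if __name__ == "__main__":
    # Hypothetical layout: /proj/wfs.xml, /proj/wfs.h5, /proj/Opt/, /proj/DMC/
    print(resolve_href("/proj/Opt", "../wfs.h5"))  # the shared wfs.h5 one level up
    print(resolve_href("/proj/Opt", "vp.h5"))      # vp.h5 inside Opt/
    print(resolve_href("/proj/DMC", "vp.h5"))      # looks inside DMC/, not Opt/
```

The last call shows why a bare href="vp.h5" breaks once the same wfs.xml is included from a different run directory.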

Since we cannot predict where the file is meant to end up, the current directory is the preferred option, so we write href="vp.h5". There is one improvement we could add (maybe it has been implemented already): if the original input already had override_variational_parameters with an href, it may be better to preserve that href unchanged. Second, if the vp.h5 file named by the href doesn't exist, the code should stop gracefully.
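The two suggestions can be sketched as follows. This is an illustration of the proposed behavior in Python, not QMCPACK's (C++) implementation; `choose_vp_href` and `check_vp_file` are hypothetical names, and relative hrefs are assumed to resolve against the launch directory.

```python
import os
import tempfile
from typing import Optional


def choose_vp_href(original_href: Optional[str], default_name: str) -> str:
    """Suggestion 1: keep an href the user already wrote; otherwise fall
    back to a bare file name, i.e. the current directory."""
    return original_href if original_href else default_name


def check_vp_file(run_dir: str, href: str) -> str:
    """Suggestion 2: stop with a clear message, instead of failing later,
    when the referenced vp.h5 file does not exist. Relative hrefs are
    resolved against run_dir (the launch directory)."""
    path = os.path.normpath(os.path.join(run_dir, href))
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"href={href!r} resolves to {path!r}, which does not exist"
        )
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        href = choose_vp_href(None, "qmc.s002.vp.h5")
        try:
            check_vp_file(run_dir, href)
        except FileNotFoundError as err:
            print("stopping:", err)
```

Checking up front means the error names both the href as written and the path it resolved to, which is the information needed to fix a moved or copied input.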

ye-luo commented 2 years ago

On second thought, I kind of like what Anouar wanted: put the absolute path of the vp.h5 file.

prckent commented 2 years ago

We need to think long term here. Considering that ever more will go in these files and the XML will be less and less relevant, I think the preference is simply to be able to specify the name and path, like all our other files. name="../../place/myfilename.h5".

ye-luo commented 2 years ago

Sorry, I consider href="../../place/myfilename.h5" even worse. Both relative and absolute paths work; the confusion comes from what the reference directory is.

I think what matters is the default we output: how to make the workflow require minimal editing of the href line.

ye-luo commented 2 years ago
Opt/opt.xml
DMC/dmc.xml
wfs.xml
wfs.h5

Even if wfs.h5 is in the same directory as wfs.xml, you still have to use ../wfs.h5, because include="../wfs.xml" is expanded inside the Opt or DMC directory, so wfs.h5 is one level up. During a WFOpt run under Opt/, s002.opt.xml and s002.vp.h5 are produced in Opt/, and s002.opt.xml contains href="s002.vp.h5". If you copy s002.opt.xml to ../wfs.xml, a continued WFOpt run under Opt/ will still pick up s002.vp.h5.

The problem occurs when making a DMC run under DMC/: href="s002.vp.h5" cannot be found there because the file is under Opt/. One fix is to edit wfs.xml and change the reference to href="../Opt/s002.vp.h5".
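A sketch of that edit done by a script instead of by hand, assuming the vp reference appears as an `override_variational_parameters` element with an `href` attribute and that relative hrefs resolve from the launch directory. `rebase_vp_href` is a hypothetical helper, and the XML below is a stripped-down stand-in rather than a complete QMCPACK wavefunction.

```python
import posixpath
import xml.etree.ElementTree as ET


def rebase_vp_href(xml_text: str, written_in: str, run_in: str) -> str:
    """Rewrite each relative override_variational_parameters href so that a
    file written relative to `written_in` is still found when the run is
    launched from `run_in`. Both directories must be absolute paths."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("override_variational_parameters"):
        href = elem.get("href")
        if href is None or posixpath.isabs(href):
            continue  # absolute hrefs already work from any directory
        target = posixpath.normpath(posixpath.join(written_in, href))
        elem.set("href", posixpath.relpath(target, run_in))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    opt_xml = (
        '<wavefunction name="psi0">'
        '<override_variational_parameters href="s002.vp.h5"/>'
        "</wavefunction>"
    )
    # s002.vp.h5 was written under /proj/Opt; the DMC run starts in /proj/DMC.
    print(rebase_vp_href(opt_xml, "/proj/Opt", "/proj/DMC"))
```

Run once per copied file, this also covers the bundled-twist case, since each opt.xml gets the same mechanical rewrite.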

I would say that for multi-directory runs, the tool or person generating the workflow is responsible for making all the hrefs work. When files are being copied or moved, there is nothing QMCPACK can be smart about.

Using absolute paths can help. For example, edit DMC/dmc.xml to change href="../wfs.xml" to href="../Opt/s002.opt.xml", or copy Opt/s002.opt.xml to wfs.xml and set href="full_path_to_vp.h5"; then the DMC run may work.

jptowns commented 2 years ago

Perhaps unrelated, but I frequently use symlinks to create a "wfn.h5" file in the run directory of a QMC calculation. Would something similar solve this problem too? This puts the onus on the user to specify exactly which vp.h5 file to use, and would relieve QMCPACK from handling the idiosyncrasies of any particular platform.
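A sketch of the symlink approach, assuming a POSIX filesystem and an input whose href reads `vp.h5`. `link_vp_file` and the Opt/DMC layout are hypothetical; the point is that the user picks the file and the input never changes.

```python
import os
import tempfile


def link_vp_file(source: str, run_dir: str, link_name: str = "vp.h5") -> str:
    """Point run_dir/link_name at the chosen vp.h5 file via a symlink, so an
    input containing href="vp.h5" picks up exactly that file."""
    link_path = os.path.join(run_dir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a link left over from an earlier run
    elif os.path.exists(link_path):
        # Never delete real data: refuse if a regular file is in the way.
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(os.path.abspath(source), link_path)
    return link_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as proj:
        opt_dir = os.path.join(proj, "Opt")
        dmc_dir = os.path.join(proj, "DMC")
        os.makedirs(opt_dir)
        os.makedirs(dmc_dir)
        vp = os.path.join(opt_dir, "s002.vp.h5")
        open(vp, "wb").close()  # empty placeholder for a real optimizer output
        link = link_vp_file(vp, dmc_dir)
        print(link, "->", os.readlink(link))
```

Linking to an absolute target keeps the link valid regardless of where the run is launched, while the XML itself keeps the portable bare name.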

jtkrogel commented 2 years ago

QMCPACK could be extended to understand source and destination type paths in addition to filenames. In this case, the user could provide a destination path (to an existing directory) for the outputted opt files to be written.

Nexus doesn't care about this, but some users (e.g. Anouar) apparently do. It does add flexibility without demanding that QMCPACK know anything. IMO, the default should always be to read/write from/to the current directory.
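A minimal sketch of how such a destination option could behave, keeping the current directory as the default. It assumes the `<project id>.s<series>.<suffix>` naming pattern for output files; `output_path` and its `dest_dir` parameter are hypothetical and not an existing QMCPACK option.

```python
import os
from typing import Optional


def output_path(project_id: str, series: int, suffix: str,
                dest_dir: Optional[str] = None) -> str:
    """Build an output file path. With no destination, write to the current
    directory; otherwise require the destination to be an existing
    directory (QMCPACK would not need to know anything else about it)."""
    name = f"{project_id}.s{series:03d}.{suffix}"
    if dest_dir is None:
        return name
    if not os.path.isdir(dest_dir):
        raise NotADirectoryError(f"destination {dest_dir!r} is not an existing directory")
    return os.path.join(dest_dir, name)


if __name__ == "__main__":
    print(output_path("qmc", 2, "opt.xml"))     # default: current directory
    print(output_path("qmc", 2, "vp.h5", "."))  # explicit, existing destination
```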

On absolute paths: they are too fragile to be used for anything by default.