QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

What should be in the example input files generated by the molecular converter? #448

Open anbenali opened 6 years ago

anbenali commented 6 years ago

Hello,

The converter now automatically generates an input file containing:

  1. Ref for ptcl.xml
  2. Ref for wfs.xml
  3. Hamiltonian (by default, assumes BFD ECP, but there is a warning in the comments)
  4. Generic optimization block
  5. VMC block
  6. DMC block

I added comments before each block explaining what the block does. (I could not attach it as XML, so I had to convert it to PDF.)

Let me know what you think before I push the code.

sample.Input.pdf

jtkrogel commented 6 years ago

Good time to change the default ptcl/wfs files (e.g. "Gaussian-G2" refers to the G2 test set (?)).

Is the converter aware of whether the e.g. GAMESS run is all electron or PP? If so it should populate the electron-ion terms appropriately.

I would also suggest replacing "C.BFD.xml" with text suggesting that something is missing, e.g. "YOUR PP FILE FOR C", etc. Some might get the wrong impression that BFD files are the right thing to provide when they are not (despite the warning in comments). Alternatively, if the converter can be made intelligent enough to tell what was used in the prior calculation, it could actually populate these fields intelligently rather than requiring additional text editing by the user.

Suggest reducing targetwalkers to ~2000. We really need a better (e.g. more automatic) way of setting the number of walkers, as appropriate to the run (workstation vs cluster vs large supercomputer, GPUs, etc.).

prckent commented 6 years ago

The input files produced by the converter are among the first points of contact users have with QMCPACK. We should discuss a little about what the aims are and what is reasonable to achieve at this time. Being friendly+useful here is very important. I'll comment further later, hopefully after others have commented.

anbenali commented 6 years ago

Hello,

Yes, please let me know what you like or don't. Changes are trivial to make, but I cannot guess what the consensus is.

Good time to change the default ptcl/wfs files (e.g. "Gaussian-G2" refers to the G2 test set (?))

No idea. We could indeed drop the G2.

Is the converter aware of whether the e.g. GAMESS run is all electron or PP? If so it should populate the electron-ion terms appropriately.

Yes, it is. At this point this is not specific to GAMESS; it applies to every format we support. A variable ECP is passed and the Hamiltonian is generated accordingly.

I would also suggest replacing "C.BFD.xml" with text suggesting that something is missing, e.g. "YOUR PP FILE FOR C", etc. Some might get the wrong impression that BFD files are the right thing to provide when they are not (despite the warning in comments). Alternatively, if the converter can be made intelligent enough to tell what was used in the prior calculation, it could actually populate these fields intelligently rather than requiring additional text editing by the user.

The only warning is for the ECP name. I could remove the BFD default, but my guess was that new learners will try with BFD. If they are not using BFD and cannot guess that they need to specify the ECP somewhere, despite having a warning, they should probably not run QMCPACK and should attend our summer school instead. Having something automated to convert the PP from the output file is tedious and, to be honest, will imply a lot of work on the converter, which at the end of the day is just a converter. The user can make the effort of at least putting the right file in. If you think it is confusing, I would rather go with "PATH AND NAME OF ECP".

Suggest reducing targetwalkers to ~2000. We really need a better (e.g. more automatic) way of setting the number of walkers, as appropriate to the run (workstation vs cluster vs large supercomputer, GPUs, etc.)

Totally agree. I thought about it for 5 minutes, then I put a BS number for now. I will see what people think we should put rather than solving the problem myself. People have different goals (high accuracy with long runs or high accuracy with many samples). We cannot accommodate everyone by default, and I am biased.

Ideally we would have a tag, target accuracy, and the dryrun would allow the rest to be set.

Please keep commenting

Anouar

jtkrogel commented 6 years ago

Actually, Gaussian pseudopotential conversion is easier than for orbitals (simple evaluation on a linear grid, very few terms). This was not the original intent of my comment, but I think it is straightforward enough to implement (50-100 lines max), and enough of a benefit for easily building correct workflows, that we should do it.
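To illustrate the "simple evaluation on a linear grid" point, here is a minimal sketch. It assumes each angular-momentum channel is given in the usual Gaussian-expansion form, V(r) = sum_k A_k r^(n_k - 2) exp(-B_k r^2), as printed by most quantum chemistry codes. The function name and arguments are hypothetical, and writing the table in the format QMCPACK expects is not covered.

```python
import math


def tabulate_gaussian_ecp(terms, r_max=10.0, n_points=1000):
    """Evaluate V(r) = sum_k A_k * r**(n_k - 2) * exp(-B_k * r**2) on a linear grid.

    `terms` is a list of (A_k, n_k, B_k) tuples for one angular-momentum channel
    (assumed input layout, not taken from any particular code).
    The grid starts at the first step rather than r = 0 because terms with
    n_k < 2 diverge at the origin.
    """
    dr = r_max / n_points
    grid = [dr * (i + 1) for i in range(n_points)]
    values = [
        sum(a * r ** (n - 2) * math.exp(-b * r * r) for a, n, b in terms)
        for r in grid
    ]
    return grid, values
```

One call per channel (local plus each nonlocal l) would give the full set of tables; only a handful of terms enter each sum, which is why the cost is negligible.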

The DMC walker issue is straightforward enough that we should automate it (i.e. no user input should be required by QMCPACK itself, not the converter, to get reasonable behavior by default). Similarly for VMC.
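As a purely illustrative sketch of what an automatic default might look like (this is not QMCPACK behavior, and the function name and heuristic are assumptions): take the ~2000 walkers suggested above as a minimum population and round up so that every thread holds the same number of walkers.

```python
def default_target_walkers(mpi_ranks, threads_per_rank, min_population=2000):
    """Hypothetical heuristic for a default DMC target population.

    Returns at least `min_population` walkers, rounded up to a multiple of
    the total thread count so the load is balanced across threads.
    """
    total_threads = mpi_ranks * threads_per_rank
    if total_threads < 1:
        raise ValueError("need at least one thread")
    # Integer ceiling division: walkers each thread must hold.
    per_thread = max(1, -(-min_population // total_threads))
    return per_thread * total_threads
```

On a 16-thread workstation this keeps the population at 2000, while on a very large machine it grows to one walker per thread, so the same input behaves sensibly in both settings.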

markdewing commented 6 years ago

I second Jaron's request that, at least for all-electron calculations, the Hamiltonian section be completely filled out.

Will there be options to control what goes in the output? Currently there are -noJastrow and -add3BodyJ to control the wavefunction. It might be nice to have an option to create a file with just a VMC section, just optimization, or just VMC/DMC. The default number of walkers and blocks should tend to the small side. It's easy to do a short run, see how long it takes, and then scale up. Also, for new users, the feedback of the code finishing quickly is better (vs sitting there and them wondering if it is working or not).

There could be more comments on which parameters are the important ones to adjust in each section, along with a pointer to the right chapter in the manual.

The help output should also be expanded and improved, to give more guidance on what the options do, and what the user should do with the output. It could maybe be split into a short usage message, and a longer message with a -h or -help option. (Not necessarily for this check-in, though.)

anbenali commented 6 years ago

I second Jaron's request that, at least for all-electron calculations, the Hamiltonian section be completely filled out.

As mentioned in previous emails, this is already set.

Will there be options to control what goes in the output? Currently there are -noJastrow and -add3BodyJ to control the wavefunction. It might be nice to have an option to create a file with just a VMC section, just optimization, or just VMC/DMC.

I thought about this, but would rather create two separate files: one for the optimization and one for the VMC/DMC. Another option could be to not generate the files at all, via an option -noInput.

The default number of walkers and blocks should tend to the small side. It's easy to do a short run, see how long it takes, and then scale up. Also, for new users, the feedback of the code finishing quickly is better (vs sitting there and them wondering if it is working or not).

Sure for the VMC and DMC, as mentioned by Jaron. For the optimization, samples that are too small tend to break the optimization completely, and multiple NaNs show up. I would rather avoid those cases.

There could be more comments on which parameters are the important ones to adjust in each section, along with a pointer to the right chapter in the manual.

To be honest, even at this level of comments, too many comments make the file unreadable. We need to make sure there are not too many comments, but I like the idea of the reference to the book chapter. Maybe what we need is a default "verbose" mode where all comments are in, and no comments if the option -NoVerbose is added. As an advanced user I don't want to be polluted with details. I just want the input.
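One cheap way to get both modes would be to always generate the commented file and strip the comments on request. A minimal sketch, assuming comments sit on their own lines (the function name is hypothetical, and -NoVerbose is only a proposed option):

```python
import re

# An XML comment plus the indentation before it and the newline after it.
_COMMENT = re.compile(r"[ \t]*<!--.*?-->[ \t]*\n?", re.DOTALL)


def strip_comments(xml_text):
    """Return the input text with all XML comments removed."""
    return _COMMENT.sub("", xml_text)
```

The converter would then only need to maintain one template, and the quiet mode is a post-processing step.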

The help output should also be expanded and improved, to give more guidance on what the options do, and what the user should do with the output. It could maybe be split into a short usage message, and a longer message with a -h or -help option

Agreed. At this point we just want an input to do the science in a more secure way. If people want more info they can look at the manual.

prckent commented 6 years ago

Possible aims here

  1. Provide a set of inputs that could be run without further editing, not obtaining good statistics, but at least running to completion in a timely manner for small problems.
  2. Give example jastrow settings (1-2, 1-2-3 body variants).
  3. Highlight the important/recommended options for the main QMC algorithms.

We are not attempting to generate a set of idealized inputs that always work. That is a research project with many dependencies.

I suggest we call the pseudopotentials by a generic name that could actually be used, not “YOU_MUST_SPECIFY_THIS_FILE.xml”. e.g. Ti_pp.xml (Note that CASINO uses/used Ti_pp.data). If we later improve the converter to tabulate ECPs, this can be automated. We should not recommend any particular brand of pseudopotentials at this time.
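A minimal sketch of how the Hamiltonian could be filled in from the ECP flag, using generic per-element names such as Ti_pp.xml. The function is hypothetical; the tag and attribute names follow the layout commonly seen in QMCPACK inputs but are assumptions here and should be checked against the manual.

```python
import xml.etree.ElementTree as ET


def build_hamiltonian(elements, ecp):
    """Return a <hamiltonian> element for an all-electron or ECP run.

    Tag and attribute names are assumed, not verified against QMCPACK.
    """
    ham = ET.Element("hamiltonian", name="h0", type="generic", target="e")
    ET.SubElement(ham, "pairpot", name="ElecElec", type="coulomb",
                  source="e", target="e", physical="true")
    ET.SubElement(ham, "pairpot", name="IonIon", type="coulomb",
                  source="ion0", target="ion0")
    if ecp:
        pp = ET.SubElement(ham, "pairpot", name="PseudoPot", type="pseudo",
                           source="ion0", wavefunction="psi0", format="xml")
        for el in sorted(set(elements)):
            # Generic, brand-neutral file name; the user supplies the file.
            ET.SubElement(pp, "pseudo", elementType=el, href=f"{el}_pp.xml")
    else:
        # All-electron: bare electron-ion Coulomb term, nothing left to edit.
        ET.SubElement(ham, "pairpot", name="ElecIon", type="coulomb",
                      source="ion0", target="e")
    return ham
```

With this layout the all-electron case is complete as generated, and the ECP case runs as soon as files with the generic names are placed next to the input.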

My suggestions are that we create two sets of wavefunctions and several QMCPACK inputs.

  1. A determinant(s) only wavefunction and a QMCPACK VMC input that reads it. This could actually be run and used to verify that the correct HF/MCSCF/CI energy is obtained. I would like to encourage this practice with beginners.
  2. A wavefunction containing default one and two body jastrows, with a three body jastrow commented out. The QMCPACK inputs that read this could contain some combination of VMC, optimization and DMC. My initial suggestion is that we write vmc, opt+vmc, and dmc variants in separate files. This is the order that most investigations naturally follow. Since DMC is costly and a good wavefunction is needed, I do not think that the DMC run needs to work by default. E.g. It could refer to a wavefunction.opt.xml that does not exist, making it explicit that the user is expected to pick a good wavefunction.

As regards the content of the QMC inputs, we should have a short run by default. E.g. 20 blocks of 100 steps, or 5 optimization cycles of the one shift optimizer etc. For a small molecule these will be fast to run to completion. It might be a good idea if we make sure that the inputs are sensible for (say) a water molecule or one of our other tutorial examples.
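The separate vmc, opt+vmc, and dmc files with short default runs could be sketched as below. This is only an outline: file names, the helper functions, and the XML tag, attribute, and parameter names (including `MinMethod` and `OneShiftOnly`) are assumptions to be checked against the manual, and the Hamiltonian section is left out for brevity.

```python
import xml.etree.ElementTree as ET


def qmc_block(method, **params):
    """Build a <qmc> section with one <parameter> child per keyword."""
    block = ET.Element("qmc", method=method, move="pbyp")
    for name, value in params.items():
        p = ET.SubElement(block, "parameter", name=name)
        p.text = str(value)
    return block


def make_inputs(project):
    """Return {file name: XML text} for vmc, opt+vmc and dmc variants."""
    short = {"blocks": 20, "steps": 100}  # short run by default

    def opt_loop():
        loop = ET.Element("loop", max="5")  # 5 optimization cycles
        loop.append(qmc_block("linear", MinMethod="OneShiftOnly", **short))
        return loop

    variants = {
        "vmc": (f"{project}.wfs.xml", [qmc_block("vmc", **short)]),
        "opt_vmc": (f"{project}.wfs.xml", [opt_loop(), qmc_block("vmc", **short)]),
        # The DMC input deliberately points at a file the user must produce.
        "dmc": ("wavefunction.opt.xml", [qmc_block("dmc", **short)]),
    }
    files = {}
    for tag, (wfs, blocks) in variants.items():
        sim = ET.Element("simulation")
        ET.SubElement(sim, "project", id=f"{project}_{tag}", series="0")
        ET.SubElement(sim, "include", href=f"{project}.ptcl.xml")
        ET.SubElement(sim, "include", href=wfs)
        sim.extend(blocks)
        files[f"{project}.{tag}.in.xml"] = ET.tostring(sim, encoding="unicode")
    return files
```

Calling `make_inputs("h2o")` yields three small inputs that follow the natural order of an investigation, with the DMC one failing loudly until an optimized wavefunction is chosen.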

I do not think we should have lots of options to the converter. Perhaps enabling J3 and the determinant coefficient cutoff.