Complex, ordered multi-command input for Molpro

mattwelborn commented 4 years ago

Is your feature request related to a problem? Please describe. Molpro input files are very complex, and depend on an imperative programming structure. For example, CCSD can only be performed after a HF command. This leads to very complicated input files, such as the following:

memory,300,m

symmetry,nosym
geometry=geo.xyz

basis=cc-pVDZ

gthresh,ENERGY=1e-10
{df-rhf,;save,2100.2}

{ibba,antibonds=1,THRLOC_IB=1e-12;freezecore;orbital,2100.2}
{locali,boys;print,orbital;fock,2100.2;order,type=fock}

{df-lccsd(t),chgfrac=1.0,canblk=0;core,6}

I want a way to specify this input file inside an AtomicInput.

Describe the solution you'd like A format for specifying commands, in order. Each command must support args, kwargs, and directives. Directives themselves have args and kwargs. args and kwargs are not ordered. Directives are not ordered. (@sjrl please verify this last point).

One possible format would be:

keywords["commands"]: List[Commands]

class Directive: 
    name: str
    args: Optional[List[Union[str,int,bool,float]]]
    kwargs: Optional[Dict[str, Union[str,int,bool,float]]

class Command(Directive): 
    directives: Set[Directive]

Under this structure, the above input file would become:

method = "df-lccsd(t)"
basis = "cc-pVDZ"  # note that this replaces the basis command
molecule = Molecule.from_file("geo.xyz")
keywords = {
    "memory": 300,
    "symmetry": "nosym",
    "gthresh": "energy=1e-10",  # gthresh is weird, and this is a lame solution
    "commands": [
        {"name": "df-rhf", "directives": {{"name": "save", "args": ["2100.2"]}}},
        {
            "name": "ibba",
            "kwargs": {"antibonds": True, "thrloc_ib": 1e-12},
            "directives": {
                {"name": "freezecore"},
                {"name": "orbital", "args": ["2100.2"]},
            },
        },
        {
            "name": "locali",
            "args": ["boys"],
            "directives": {
                {"name": "print", "args": ["orbital"]},
                {"name": "fock", "args": ["2100.2"]},
                {"name": "order", "kwargs": {"type": "fock"}},
            },
        },
        {
            "name": "df-lccsd(t)",
            "kwargs": {"chgfrac": 1.0, "canblk": 0},
            "directives": {{"name": "core", "args": [6]},},
        },
    ],
}

Describe alternatives you've considered

Template files (current solution).
dunder representation of nested dictionaries. Will lead to extremely long strings, and does not have the necessary ordered property of commands.

@loriab @sjrl

sjrl commented 4 years ago

Directives are not ordered. (@sjrl please verify this last point).

That is correct, directives of a given command are not ordered. As a side tangent directives can be placed on new lines instead of the semicolon separation to partially alleviate the limit on the width of Molpro's input.

sjrl commented 4 years ago

I like this idea a lot! I only have a few questions and comments.

First I think we can continue to handle symmetry and memory through the existing AtomicInput (handled through molecule and build_input() respectively), which would only leave us with gthresh and commands. I like the commands structure because we can easily check for the presence of that keyword to replace the method writing in build_input().

gthresh is definitely a little trickier, but it is an example of Molpro directive that is specified without the need of a command. So we could potentially have a directives keyword that lives outside of commands and we could throw gthresh into that. How does that sound?

mattwelborn commented 4 years ago

I think directives is a good solution to gthresh. So keywords["directives"]: Optional[Set[Directive]]?

sjrl commented 4 years ago

So keywords["directives"]: Optional[Set[Directive]]?

Yup! So the only thing that this structure wouldn't support is the ability to set directives in between commands. In some complicated inputs (I believe @loriab has some examples) Molpro sets directives in between commands.

Here is a quick example for a PNO calculation in Molpro

{df-hf}
{local,distp=8,maxit_lmp2=1,thrpno_en_lmp2=0.998,thrpno_occ_lmp2=0,thrdist=0,fitmos=0,fitpao=0,thrpno_dist=0.998,keep_diag=0}
{cfit,locfit=0}
{pno-lmp2,distp=0}

sjrl commented 4 years ago

I guess in my above example, we could just use the commands structure and do something like

"commands": [
    {"name": "df-hf",},
    {
      "name": "local",
      "kwargs": {"distp": 8, "maxit_lmp2": 1, "thrpno_en_lmp2": 0.998, ...}
    },
    {
      "name": "cfit",
      "kwargs": {"locfit": 0}
    },
    {
      "name": "pno-lmp2",
      "kwargs": {"distp": 0}
    }
]

if you wanted to specify directives in between commands.

mattwelborn commented 4 years ago

We could explicitly say keywords["commands"]: List[Commands, Directives] just to make things abundantly clear.

Another question is whether method should do anything other than just provide a convenient summary keyword. For example, we could require that the last command's name is the same as method.

dgasmith commented 4 years ago

I would vote for method == commands[-1]["name"]. This would make the return_result make at least some sense.

mattwelborn commented 4 years ago

@sjrl Any weird molpro edge cases that will break this? I'm okay with not supporting proc, force, weirdly-placed GDIRECT, to name a few...

sjrl commented 4 years ago

Hmm I do wonder, how is this meant to interact with the driver call? If the driver is gradient does that mean force should still be added after all commands are written?

And for requiring method == commands[-1][name] this would mean we would have methods like df-mp2 or ump2, which have modifiers in front of the actual method. I believe we were trying to avoid this by having to specify keywords instead. But in the situation that commands is provided is it alright to relax on this rule?

To clarify one more point. Does the presence of commands mean that the input builder ignores all other keywords related to the method such as reference = unrestricted?

Otherwise everything looks good to me! I can't think of any other edge cases and I agree proc and GDIRECT can be left alone.

mattwelborn commented 4 years ago

Hmm I do wonder, how is this meant to interact with the driver call? If the driver is gradient does that mean force should still be added after all commands are written?

Yes. Would this also work for hessians?

And for requiring method == commands[-1][name] this would mean we would have methods like df-mp2 or ump2, which have modifiers in front of the actual method. I believe we were trying to avoid this by having to specify keywords instead. But in the situation that commands is provided is it alright to relax on this rule?

After looking at various programs (and especially GAMESS), @dgasmith and I have moved toward letting method just be whatever the program wants it to be called. So if Q-Chem calls density-fitted MP2 "RI-MP2", and Molpro calls it "df-rmp2", and entos calls it simply "mp2", then that's okay. It's too much work -- and potentially impossible -- to make the methods program-independent. The only place I would be careful is that I would like to normalize known aliases within a program. For example, "HF" and "RHF" do the same thing in Molpro.

To clarify one more point. Does the presence of commands mean that the input builder ignores all other keywords related to the method such as reference = unrestricted?

Non-command keywords would get translated to key=value at the top of the file. AFAIK reference is a special keyword present only in MolproHarness's build_input and would be removed in favor of specifying spin ansatz in the command name.

Otherwise everything looks good to me! I can't think of any other edge cases and I agree proc and GDIRECT can be left alone.

👍

sjrl commented 4 years ago

Would this also work for hessians?

I believe so, although I don't have much experience with hessian calculations in Molpro.

After looking at various programs (and especially GAMESS), @dgasmith and I have moved toward letting method just be whatever the program wants it to be called. So if Q-Chem calls density-fitted MP2 "RI-MP2", and Molpro calls it "df-rmp2", and entos calls it simply "mp2", then that's okay. It's too much work -- and potentially impossible -- to make the methods program-independent. The only place I would be careful is that I would like to normalize known aliases within a program. For example, "HF" and "RHF" do the same thing in Molpro.

Okay gerat! That all sounds good to me.

Non-command` keywords would get translated to key=value at the top of the file. AFAIK reference is a special keyword present only in MolproHarness's build_input and would be removed in favor of specifying spin ansatz in the command name.

That is correct reference is something I added back when ump2 was not preferred. I'm happy to have that removed!

mattwelborn commented 4 years ago

Okay, it looks like we've reached consensus on this.

loriab commented 4 years ago

Yeah, sorry, I went to work on these new ideas last week, then got distracted by other projects. I don't have a final proposal, so here are some points.

definitely agree that need another construct for ordered commands
After looking at various programs (and especially GAMESS), @dgasmith and I have moved toward letting method just be whatever the program wants it to be called. So if Q-Chem calls density-fitted MP2 "RI-MP2", and Molpro calls it "df-rmp2", and entos calls it simply "mp2", then that's okay. It's too much work -- and potentially impossible -- to make the methods program-independent. The only place I would be careful is that I would like to normalize known aliases within a program. For example, "HF" and "RHF" do the same thing in Molpro.

I'll interject that I see this as two issues. I totally agree that it's not qcng/qcsk's role to make the methods program-independent -- that is, mp2 in prog A is same mtd/answer as mp2 in prog B. I do see it as a proper role to defend the data layout of QCSchema that only method info appears in model.method and that routing info (e.g., tce) or algorithm info (e.g., ri) be slotted in the grab bag that is keywords (or in an extras prefix/suffix field of model maybe if that's convenient for reconstituting).

here's another molpro input to consider

GTHRESH,ZERO=1.e-14,ONEINT=1.e-14,TWOINT=1.e-14,ENERGY=1.e-8,ORBITAL=1.e-8,GRID=1.e-8

basis={
set,orbital; default,aug-cc-pVTZ
set,jkfit;   default,aug-cc-pVTZ/jkfit
set,dflhf;   default,aug-cc-pVTZ/jkfit
set,mp2fit;  default,aug-cc-pVTZ/mp2fit
}

gdirect
ga=1101.2; gb=1102.2
ca=2101.2; cb=2102.2

dummy,14,15,16,17,18,19,20,21,22,23,24,25,26
{df-hf,basis=jkfit,locorb=0; start,atdens; save,$ga}
{df-ks,lhf; dftfac,1.0; start,$ga; save,$ca}
eehfa=energy; sapt;monomerA

dummy,1,2,3,4,5,6,7,8,9,10,11,12,13
{df-hf,basis=jkfit,locorb=0; start,atdens; save,$gb}
{df-ks,lhf; dftfac,1.0; start,$gb; save,$cb}
eehfb=energy; sapt;monomerB

{sapt,SAPT_LEVEL=3;intermol,ca=$ca,cb=$cb,icpks=0,fitlevel=3,nlexfac=0.0,cfac=0.0
dfit,basis_coul=jkfit,basis_exch=jkfit,cfit_scf=3}
eedisp=E2disp

Below is my rough expression of the above in "dunder" format. An obvious weakness is the handling of options associated with repeated commands.



                # commands: [
                #   gthresh,
                #   gdirect,
                #   {ga: 1101.2},
                #   {gb: 1102.2},
                #   {ca: 2101.2},
                #   {cb: 2102.2},
                #   dummy,
                #   df-hf,
                #   df-ks,
                #   {eehfa: energy}
                #   sapt

GTHRESH ,ZERO=1.e-14 # gthreshzero: 1.e-14 ,ONEINT=1.e-14 # gthreshoneint: 1.e-14 ,TWOINT=1.e-14 # gthreshtwoint: 1.e-14 ,ENERGY=1.e-8 # gthreshenergy: 1.e-8 ,ORBITAL=1.e-8 # grhreshorbital: 1.e-8 ,GRID=1.e-8 # gthreshgrid: 1.e-8

basis={ set,orbital; default,aug-cc-pVTZ set,jkfit; default,aug-cc-pVTZ/jkfit set,dflhf; default,aug-cc-pVTZ/jkfit set,mp2fit; default,aug-cc-pVTZ/mp2fit }

gdirect ga=1101.2; gb=1102.2 ca=2101.2; cb=2102.2

dummy # dummy: [14, 15, 16, ... 26] ,14 ,15 ,16,17,18,19,20,21,22,23,24,25,26

{df-hf ,basis=jkfit # see below ,locorb=0 ; start ,atdens ; save ,$ga}

{df-ks # see below ,lhf ; dftfac ,1.0 ; start ,$ga ; save ,$ca}

eehfa=energy; sapt;monomerA

dummy,1,2,3,4,5,6,7,8,9,10,11,12,13

{df-hf ,basis=jkfit # df-hfbasis: jkfit ,locorb=0 # df-hflocorb: 0 ;start ,atdens # df-hfstart: atdens # or [atdens] ;save ,$gb} # df-hfsave: $gb

{df-ks ,lhf # df-ks ;dftfac
,1.0 # df-ks__dftfac: 1.0 # or [1.0] ;start ,$gb # df-ksstart: $gb ;save ,$cb} # df-ks___save: $cb

eehfb=energy; sapt;monomerB

{sapt ,SAPT_LEVEL=3 # saptsaptlevel: 3 ;intermol ,ca=$ca # saptintermolca: $ca ,cb=$cb # saptintermolcb: $cb ,icpks=0 # saptintermolicpks: 0 ,fitlevel=3 # saptintermolfitlevel: 3 ,nlexfac=0.0 # saptintermolnlexfac: 0.0 ,cfac=0.0 # saptintermolcfac: 0.0 ;dfit # saptdfitbasis_coul: jkfit ,basis_coul=jkfit # sapt_dfitbasis_exch: jkfit ,basisexch=jkfit # saptdfit__cfit_scf: 3 ,cfit_scf=3}

eedisp=E2disp



* Sorry to be pushing back a bit against the perfectly good repr of complex inputs proposed in this issue. I worry that it violates the below from the infamous #70. By expressing the command and full options set all together, it makes it hard to specify or change a single option using a syntax that works across a variety of programs.

> 2. Keywords should be independent and granular such that they're approximately 1:1 with
other programs, not 5:1. That is, no single method option should cover convergence, algorithm, and target roots all wrapped together. Also, separate, yet mutually exclusive, options sets are a last resort (e.g., three independent boolean options for rhf/uhf/rohf).

loriab commented 4 years ago

from above

"commands": [
{"name": "df-hf",},
{
"name": "local",
"kwargs": {"distp": 8, "maxit_lmp2": 1, "thrpno_en_lmp2": 0.998, ...}
},
{
"name": "cfit",
"kwargs": {"locfit": 0}
},
{
"name": "pno-lmp2",
"kwargs": {"distp": 0}
}
]

I presume the user in your plan enters model.method = 'pno-lmp2', and the df-hf, cfit, and pno-lmp2 commands+directives get filled in by the write_input in the harness. If the user wants to tweak maxit_lmp2 (or some other piece they're methodologically allowed to tweak), do they have a whole

{
      "name": "local",
      "kwargs": {"maxit_lmp2": 2}
    },

section? Is the harness prepared to handle that that's not the full directive but a delta?

sjrl commented 4 years ago

I think the plan for the above use of the commands keyword is to completely replace the model.method when writing the input file. So a kind of replacement for current template implementation for the Molpro Harness. And for the very good point you bring up with issue https://github.com/MolSSI/QCEngine/issues/70 I don't see how we can satisfy this goal and also have templates. But perhaps I am missing something?

The delta example you provide I believe can be made a separate task where the commands keyword is not provided, and then we rely on write_input on knowing how to handle the method pno-lmp2 and what keywords it supports.

MolSSI / QCEngine

Complex, ordered multi-command input for Molpro #198