SCM-NV / qmflows

This library tackles the construction and efficient execution of computational chemistry workflows
Other
45 stars 9 forks source link

Parsing output files #24

Closed ridderl closed 7 years ago

ridderl commented 7 years ago

The properties.json files should have the following format:

{property: {
    "parser": "awk" or "python" or "kfreader",
    "file_ext": "out" or "dat" or "hess" etc,
    "function": "some awk script" or "name_of_a_function_in_parser.py" or ["section", "property"] in t21
}}
ridderl commented 7 years ago

I'm implementing the special keyword inithess for the ADF package. My idea was to write a inithess.t21 to the ADFjob's plamsfolder with just the hessian property stored in it, which can be read by ADF. The problem is that the plamsfolder doesn't exist yet when the "handle_specific_keyword" is run.

My current solution is to write a temporary tape21 file to the parent plams working folder (=builtins.config.jm.workdir). Better ideas are welcome ....

felipeZ commented 7 years ago

I have added a Parser for a molecule in Gamess together with some properties

felipeZ commented 7 years ago

In gamess I'm defining the getattr method using importlib. The idea is that we load dynamically the module and the function used for parsing. These function and module names are stored in the properties dictionary as follows:

{
    "frequencies": {"module": "qmworks.parsers.gamess_parser", "function": "parse_frequencies", "file": "dat_file"},
    "hessian": {"module": "qmworks.parsers.gamess_parser", "function": "parse_hessian", "file": "dat_file"},
    "gradient": {"module": "qmworks.parsers.gamess_parser", "function": "parse_gradient", "file": "dat_file"},

Shall we extend this idea to Orca and the other packages?

ridderl commented 7 years ago

Do we need the new key "module" ? We still want to enable parsing also by AWK or kfreader, right? Maybe you can use the key "parser" instead of "module", and the value can be just "games_parser". The mechanism could be that if the parser is not "awk" or "kfreader" etc., it will then try to import the given parser from qmworks.parsers and call the given function.

felipeZ commented 7 years ago

the importlib mechanism is more general, in the sense that it loads whatever python function or method from python. For instance in the case of AWK, we can use

{ 
  "some_prop" :  {"module": "qmworks.packages.packages", "class":  "Results", "method":  " awk_file, "args":  "some_args"}
}
ridderl commented 7 years ago

OK, I see your point about a general mechanism. We could still make it somewhat more human-readable if we require all the code for parsing to be part of qmworks.parsers. That would be good practice anyway and then we can skip that part in the value for "module". So we could put awk_file in qmworks.parsers.generic. The same holds for the kfreader. So something like:?

{
   "some_prop": {"parser": "generic", "function": "awk_file", "args": ["/some_string/ {print $4}"]}
}
felipeZ commented 7 years ago

That's a good idea. We can move the awk_file parser and the rest of the reader to the parser module and make an interface for them.

felipeZ commented 7 years ago

Following Lars' Suggestion the Properties in Gamess (and in the future for the rest of the packages) looks like

{
    "frequencies": {"parser": "gamess_parser", "function": "parse_frequencies", "file": "dat"},
    "hessian":     {"parser": "gamess_parser", "function": "parse_hessian",     "file": "dat"},
    "gradient":    {"parser": "gamess_parser", "function": "parse_gradient",    "file": "dat"},
    "dipole":      {"parser": "gamess_parser", "function": "parse_dipole",      "file": "dat"},
    "molecule":    {"parser": "gamess_parser", "function": "parse_molecule",    "file": "out"},
    "total_energy": {"parser": "gamess_parser", "function": "parse_total_energy", "file": "out"}
}
felipeZ commented 7 years ago

The new mechanism to retrieve the properties is now implemented in the package module. The get_property method of the class Results is implemented as:

    def get_property(self, prop):
        """
        Look for the optional arguments to parse a property, which are stored
        in the properties dictionary.
        """
        # Read the JSON dictionary than contains the parsers names
        ds = self.prop_dict[prop]

        # extension of the output file containing the property value
        file_ext = ds['file_ext']

        # If there is not work_dir returns None
        work_dir = lookup(self.archive, 'work_dir')

        # Plams dir
        plams_dir = self.archive['plams_dir'].path

        # Search for the specified output file in the folders
        file_pattern = '{}.{}'.format(self.job_name, file_ext)

        output_files = concatMap(partial(find_file_pattern, file_pattern),
                                 [plams_dir, work_dir])
        if output_files:
            file_out = output_files[0]
            fun = getattr(import_parser(ds), ds['function'])
            # Read the keywords arguments from the properties dictionary
            kwargs = lookup(ds, 'kwargs')
            kwargs['plams_dir'] = plams_dir
            return ignored_unused_kwargs(fun, [file_out], kwargs)
        else:
            msg = "There is not output file called: {}.\n".format(file_pattern)
            raise FileNotFoundError(msg)

First the output file is searched both at the plams_dir or the work_dir. Then using the importlib library the parser function is loaded during the execution. Finally the parser function (called fun) is called using the output file and optional arguments. The ignored_unused_kwargs ignores all the optional keyword arguments that are not used by the parser.

felipeZ commented 7 years ago

Now the definition of the method __getattr__ is defined in the Results class.

felipeZ commented 7 years ago

ADF, DFTB, ORCA, GAMESS passed test with new parsing scheme. Now both the __getattr__ and get_property methods are implemented in the Results class

ridderl commented 7 years ago

The gamess test crashed on my laptop. It looks like the structure to be read should be methanol.xyz, not ion_methanol.xyz. Fixed that, but the test still fails. I think it is not a good idea to check final coordinates. The calculation may end up with slightly different xyz coordinates and still be correct. Better to test on final energy, or dipole or so. Also, I would run without solvent, this makes the test slower.

felipeZ commented 7 years ago

I took it directly from the gamess test set

 $contrl scftyp=rhf runtyp=optimize nzvar=12 $end
 $system timlim=2 mwords=2 $end
 $pcm    solvnt=water $end
 $basis  gbasis=n31 ngauss=6 ndfunc=1 $end
 $guess  guess=huckel $end
 $zmat   izmat(1)=1,1,2,  1,2,3,  1,3,4,  1,3,5,  1,3,6,
            2,1,2,3,  2,2,3,4,  2,2,3,5,  2,2,3,6,
            3,1,2,3,4,  3,1,2,3,5,  3,1,2,3,6 $end
 $statpt opttol=1d-5 $end
 $data
Methanol in PCM water...starting at gas phase geom
Cs

H 1.0   -1.0616171503   0.8036449245   0.0000000000
O 8.0   -0.6870131482  -0.0653470836   0.0000000000
C 6.0    0.7093551399   0.0291827007   0.0000000000
H 1.0    1.0836641283   0.5408321444   0.8835398105
H 1.0    1.0975386849  -0.9797829903   0.0000000000

For me it is working in my local computer.

ridderl commented 7 years ago

On my side it fails and it makes sense it does: the molecules should be either charged or be a radical. Or it should have an extra hydrogen. Let me fix it in master and then I will merge into develop.

felipeZ commented 7 years ago

Have you seen what is the error? With gamess you can't rerun a calculation. You need first to remove the *dat file in the scratch.

ridderl commented 7 years ago

Yes I checked that. The error was that the number of electrons (17) didn't match with a neutral molecule.

felipeZ commented 7 years ago

Have you check the input that you are generating with QMworks? Do you remember the discussion that we have about symmetry in Plams? Point is that Gamess use the symmetry to generate the input molecule. in this case the symmetry label is Cs because one of the hydrogen can be generated by rotations. I have added support for symmetry in plams as we have discussed here: https://github.com/SCM-NV/PLAMS/pull/5

Problably you have a Plams version that does not take into account the symmetry so you are using C1, therefore you are missing an hydrogen.

It was my fault to call the initial structure ion_methanol, I should have call it methanol_Cs.

ridderl commented 7 years ago

The input had 3 hydrogens (like in the output you provided above). It should have 4. Is plams supposed to add that hydrogen?

felipeZ commented 7 years ago

the input generate by Plams looks like:

 $data
title
Cs

H 1    -1.0616171503  0.8036449245  0.0000000000
O 8    -0.6870131482 -0.0653470836  0.0000000000 
C 6     0.7093551399  0.0291827007  0.0000000000 
H 1     1.0836641283  0.5408321444  0.8835398105 
H 1     1.0975386849 -0.9797829903  0.0000000000 

I am sure that you are using a C1 symmetry group. The newer version of Plams support symmetry and Gamess takes care of generating the other hydrogen.

ridderl commented 7 years ago

?? The input above doesn't seem complete ...

felipeZ commented 7 years ago

In Gamess if you have a molecule with a given symmetry different of C1, you need only to provide one part of the molecule, the rest is automatically constructed by gamess

felipeZ commented 7 years ago

According to Gamess manual:

NOSYM = 0 the symmetry specified in $DATA is used
                    as much as possible in integrals, SCF,
                    gradients, etc. (this is the default)

              = 1 the symmetry specified in the $DATA input
                    is used to build the molecule, then
                   symmetry is not used again. Some GVB
                   or MCSCF runs (those without a totally
                   symmetric charge density) require you
                   request no symmetry.
ridderl commented 7 years ago

OK, now it works for me too.

ridderl commented 7 years ago

Small point: gamess has generic property total_energy where other packages have energy. Is it ok to use 'energy' for games too?

ridderl commented 7 years ago

Same for hess in orca, is hessian in other packages

felipeZ commented 7 years ago

Yes, I think that we should keep the name that is used most. Shall we change the name of the parsers or only the properties?