exa-analytics / exatomic

A unified platform for theoretical and computational chemists
https://exa-analytics.github.io/exatomic
Apache License 2.0

Centralized quantum-code input generator and output reader #132

Open herbertludowieg opened 5 years ago

herbertludowieg commented 5 years ago

I think we could really benefit from a class that takes a universe as input and generates inputs for any quantum code, or takes file paths and reads the respective outputs. Something I have been playing around with for an output reader is:

import os
import re

import numpy as np
import pandas as pd
from exatomic import gaussian

class VA:
    prog = {'gauss': gaussian.Fchk, 'gaussian': gaussian.Fchk}
    def get_gradients(self):
        path = self.path
        grad_path = path+"gradient/"
        files = os.listdir(grad_path)
        gradient = []
        for file in files:
            if file.endswith(".fchk"):
                ed = self.soft(grad_path+file)
                ed.parse_gradient()
                df = ed.gradient
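                # raw string so that \d is not treated as a string escape
                fdx = list(map(int, re.findall(r'\d+', file)))
                if len(fdx) != 1:
                    # NotImplemented is a constant, not an exception class
                    raise NotImplementedError("Cannot determine integer from list to place label on gradient dataframe")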
                df['file'] = np.tile(fdx, len(df))
            else:
                continue
            gradient.append(df)
        self.gradients = pd.concat([grad for grad in gradient]).reset_index(drop=True)
        self.gradients.sort_values(by=['file', 'label'], inplace=True)
        self.gradients.reset_index(drop=True, inplace=True)
        self.force_vector = self.gradients[['Z', 'label', 'symbol', 'frame', 'file']].copy()
        self.force_vector['vector'] = np.linalg.norm(self.gradients[['fx', 'fy', 'fz']].values, axis=1)

    def __init__(self, path, soft, temp=None, *args, **kwargs):
        self.soft = self.prog[soft]
        self.path = path
#         self.soft = soft
#         self.path = path

I tried using a predefined dictionary: the user passes a string, which is looked up as a key in the dict. If the key is not found, we could raise a NotImplementedError to exit safely. Another approach I tried was to have the user pass the output parser to use directly. Both worked fine in this implementation. This code is for a very specific example, but we could probably generalize it so that it looks in a user-defined directory for the files.
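
The two approaches described here (string key versus passing the parser directly) can coexist in a single resolver. The following is only a minimal sketch: `FakeFchk` and `FakeMolcasOutput` are stand-ins so the snippet runs without exatomic installed, and `PARSERS` and `resolve_parser` are hypothetical names, not part of the exatomic API.

```python
class FakeFchk:
    """Stand-in for a parser class such as gaussian.Fchk (hypothetical)."""
    def __init__(self, path):
        self.path = path


class FakeMolcasOutput:
    """Stand-in for a parser class such as molcas.Output (hypothetical)."""
    def __init__(self, path):
        self.path = path


# Registry mapping user-facing strings to parser classes.
PARSERS = {'gauss': FakeFchk, 'gaussian': FakeFchk, 'molcas': FakeMolcasOutput}


def resolve_parser(soft):
    """Return a parser class from a string key or from a class passed directly."""
    if isinstance(soft, str):
        try:
            return PARSERS[soft.lower()]
        except KeyError:
            # Unknown program name: fail loudly and list what is supported.
            raise NotImplementedError(
                "No parser registered for {!r}; known keys: {}".format(
                    soft, sorted(PARSERS))) from None
    if callable(soft):
        # The user supplied the parser itself, so use it unchanged.
        return soft
    raise TypeError("soft must be a string key or a parser class")
```

With something like this, `__init__` could call `self.soft = resolve_parser(soft)` and accept either form of input.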

tjduigna commented 5 years ago

This is a good issue. I think both approaches you tried are valid, but I am in favor of the first one, with a string keyword and a dict of the appropriate classes. If I were to neglect all of the pieces specific to the VA class, I think an MWE could look something like:

from exatomic import gaussian
from exatomic import molcas

def output_router(fpaths, soft, as_unis=False):
    outputs = {'gaussian': gaussian.Output,
                 'molcas': molcas.Output}
    if as_unis:
        return [outputs[soft](fpath).to_universe() for fpath in fpaths]
    return [outputs[soft](fpath) for fpath in fpaths]

def input_router(unis, soft, inp_kws=None):
    inputs = {'gaussian': gaussian.Input,
                'molcas': molcas.Input}
    if inp_kws is None:
        inp_kws = {}
    return [inputs[soft].from_universe(uni, **inp_kws) for uni in unis]

Alternatively, I could see it being more useful to just pass a directory path (and I neglect use of pathlib here but recommend it generally), e.g.:

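import os

from exatomic import gaussian
from exatomic import molcas

def parse_output_dir(dirpath, soft):
    # same outputs dictionary as in output_router above
    outputs = {'gaussian': gaussian.Output,
                 'molcas': molcas.Output}
    parser = outputs[soft] # look up once, so an unknown key is not swallowed below
    fls = [os.path.join(dirpath, fl) for fl in os.listdir(dirpath)]
    outs = []
    for fl in fls:
        try: # certainly not all files will be parsed
            outs.append(parser(fl))
        except Exception: # make this as sophisticated as you want
            pass
    return outs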
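
For reference, a pathlib-based variant of the same idea might look like the sketch below. It is an illustration under assumptions, not exatomic code: the parser is passed in as a callable (for example a class taken from the `outputs` dictionary), the `pattern` argument and the `(parsed, skipped)` return value are hypothetical additions, and failures are recorded instead of being silently dropped.

```python
from pathlib import Path


def parse_output_dir(dirpath, parser, pattern='*'):
    """Try `parser` on every matching file; return (parsed, skipped)."""
    parsed, skipped = [], []
    for fl in sorted(Path(dirpath).glob(pattern)):
        if not fl.is_file():
            continue  # ignore subdirectories
        try:
            parsed.append(parser(str(fl)))
        except Exception as exc:  # a parser may fail on unrelated files
            skipped.append((fl.name, exc))
    return parsed, skipped
```

Keeping the skipped files and their exceptions makes it easier to tell a file that is genuinely not an output apart from a parser bug. A narrower pattern such as `'*.out'` avoids most of the failures in the first place.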

And now that I've written this out, I am reminded of some of the tuning code: https://github.com/exa-analytics/exatomic/blob/815b921256d9c064bf974f6fc1227734435d9dc1/exatomic/algorithms/delocalization.py#L213

This is also highly specific and only really pertains to a systematic file naming scheme and only aims to complete a single task. But it might give you some ideas of what not to do 😄