geem-lab / pnictogen

⚛️ A Python library and a command-line tool that creates input files for computational chemistry packages
https://pypi.org/project/pnictogen/
MIT License
18 stars 4 forks source link

Which data model for molecule is correct/best? #5

Closed schneiderfelipe closed 5 years ago

schneiderfelipe commented 6 years ago

Initially, I thought about sub-classing pybel.Molecule but eventually ended up passing the object to the template engine as is. Since now we want to add support for other parsers (such as cclib), I believe this extra layer of abstraction is unavoidable.

The point is, should we sub-class pybel.Molecule?

berquist commented 6 years ago

Originally I was going to try and have two separate configs: one that writes using pybel.Molecule attributes, and one that writes using cclib attributes. Each config would contain a "prototypical" boilerplate for that parser. The (current) ones for pybel are simple, because it has an input generator already, though not customizable. The logic to go from a ccdata to a template will have to live somewhere no matter what path is chosen. You could take every attribute present in the resulting ccdata and inject it into a pybel.Molecule, but that would be confusing compared to a subclass.

berquist commented 5 years ago

Another tricky thing is that pybel can read multiple molecules/fragments from a file, and cclib cannot yet. Do you ever use this capability?

berquist commented 5 years ago

For reference, I took all the attributes of a pybel molecule and a ccdata instance from a geometry optimization and

  1. checked the attributes unique to the ccdata instance,
  2. checked the attributes unique to the pybel molecule, and
  3. checked the attributes both had in common.
In [42]: attrs_ccdata.difference(attrs_pbmol)
Out[42]:
{'OPT_DONE',
 'OPT_NEW',
 'OPT_UNCONVERGED',
 'OPT_UNKNOWN',
 '_attributes',
 '_attrlist',
 '_dictsofarrays',
 '_intarrays',
 '_listsofarrays',
 'arrayify',
 'atomcharges',
 'atomcoords',
 'atomnos',
 'closed_shell',
 'converged_geometries',
 'coreelectrons',
 'geotargets',
 'geovalues',
 'getattributes',
 'grads',
 'homos',
 'listify',
 'metadata',
 'moenergies',
 'moments',
 'mult',
 'natom',
 'nbasis',
 'nelectrons',
 'new_geometries',
 'nmo',
 'optdone',
 'scfenergies',
 'scftargets',
 'scfvalues',
 'setattributes',
 'typecheck',
 'unconverged_geometries',
 'unknown_geometries',
 'writecml',
 'writejson',
 'writexyz'}

In [43]: attrs_pbmol.difference(attrs_ccdata)
Out[43]:
{'OBMol',
 '__iter__',
 '_cinfony',
 '_exchange',
 '_gettitle',
 '_repr_html_',
 '_repr_svg_',
 '_settitle',
 'addh',
 'atoms',
 'calccharges',
 'calcdesc',
 'calcfp',
 'clone',
 'conformers',
 'convertdbonds',
 'data',
 'dim',
 'draw',
 'energy',
 'exactmass',
 'formula',
 'localopt',
 'make3D',
 'molwt',
 'removeh',
 'residues',
 'spin',
 'sssr',
 'title',
 'unitcell'}

In [45]: attrs_pbmol.intersection(attrs_ccdata)
Out[45]:
{'__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'charge',
 'write'}

Even though the attributes are mostly orthogonal, I wonder if the solution is as simple as below. Extra logic would be needed to handle charge and mult. It isn't quite right because Molecule isn't a subclass of anything. Some of these choices might depend on if you want to read a file with both packages or just one, if you want to present a unified interface, etc.

class Molecule:

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.ccdata = None
        self.pbmol = None

    def update(self):
        attrs_ccdata = set()
        attrs_pbmol = set()
        if self.ccdata:
            attrs_ccdata = set([attr for attr in dir(self.ccdata)
                                if attr[0] != '_'])
            for attr_ccdata in attrs_ccdata:
                # set: obj, name, value
                # get: obj, name
                setattr(self, attr_ccdata,
                        getattr(self.ccdata, attr_ccdata))
        if self.pbmol:
            attrs_pbmol = set([attr for attr in dir(self.pbmol)
                               if attr[0] != '_'])
            for attr_pbmol in attrs_pbmol:
                setattr(self, attr_pbmol,
                        getattr(self.pbmol, attr_pbmol))
schneiderfelipe commented 5 years ago

Hi Eric,

I've been working on a related project where I needed to created a layer of abstraction similar to the one needed here (through the Atoms class). I made pnictogen depend on it.

This solves the cclib support by integrating cclib and pybel in a unified interface. I never wanted different templates to work with different backends, so the current approach makes me happy. Furthermore, new backends can be added in no time.

The approach relies somewhat on the cclib-to-pybel bridge, which brings some minor drawbacks:

This effectively delegates all the code around molecule manipulation and such to the pyrrole package. I'd be happy to accept contributions there as well!

schneiderfelipe commented 5 years ago

I'm closing this, feel free to reopen it if needed.