Create OpenBabelReader to convert OpenBabel OBMol to MDAnalysis AtomGroup

lunamorrow commented 1 month ago

The first step of this OpenBabel converter will be to convert OpenBabel OBMols to MDAnalysis AtomGroups. This will enable the indirect parsing of over 100 file types into a format that MDAnalysis tools can analyse.

The OpenBabelReader will take an OBMol and correctly convert it to an AtomGroup. This class will need to account for the different attributes present in OBMol objects created from different file types, and will use the OpenBabel Python wrappers to access those attributes easily. The resulting AtomGroup can be analysed as is, or assigned to a Topology or a Residue/Segment.

During the creation of this converter class, I will be reaching out to active OpenBabel contributors to gain advice and input about how best to develop it.

For more information and a suggested implementation, please see the GSoC Project.

hmacdope commented 1 month ago

@lunamorrow @cbouy you have "to an AtomGroup" here.

What you probably want is to convert to a Universe, no? See the example in the RDKit reader here: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/converters/RDKit.py#L35C1-L47C53

Direct to AtomGroup is probably not what you want.
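To see why a Universe is the natural target: when you call `mda.Universe(some_object)`, MDAnalysis has to work out which parser and reader understand that object. The RDKit reader linked above does this with a `_format_hint` static method that returns True when it is handed an RDKit molecule. The toy registry below only sketches that dispatch idea in plain Python, without importing MDAnalysis or OpenBabel; `FakeOBMol`, `OpenBabelParserSketch`, `PDBParserSketch` and `pick_handler` are hypothetical names, not real API.

```python
class FakeOBMol:
    """Hypothetical stand-in for openbabel.OBMol (no OpenBabel import)."""

    def __init__(self, atoms):
        self.atoms = atoms


class OpenBabelParserSketch:
    format = "OPENBABEL"

    @staticmethod
    def _format_hint(thing):
        # Mirrors the RDKit pattern: report whether *thing* is an object this
        # class understands. A real version would try to import openbabel and
        # check isinstance(thing, openbabel.OBMol), returning False on ImportError.
        return isinstance(thing, FakeOBMol)


class PDBParserSketch:
    format = "PDB"

    @staticmethod
    def _format_hint(thing):
        # File-based formats are usually recognised from a filename instead.
        return isinstance(thing, str) and thing.lower().endswith(".pdb")


def pick_handler(thing, handlers):
    """Return the format of the first handler whose hint accepts *thing*."""
    for handler in handlers:
        if handler._format_hint(thing):
            return handler.format
    raise ValueError(f"no handler for {type(thing).__name__}")


handlers = [PDBParserSketch, OpenBabelParserSketch]
print(pick_handler(FakeOBMol(atoms=[]), handlers))  # OPENBABEL
print(pick_handler("protein.pdb", handlers))        # PDB
```

The point of the design is that the caller never names the format: the object itself selects the classes, which is what makes `mda.Universe(obmol)` possible once a parser and a reader both recognise an OBMol.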

hmacdope commented 1 month ago

Also important here: RDKitReader is a subclass of MemoryReader.

exs-cbouy commented 1 month ago

Just to make sure we are all on the same page, in case there's any misunderstanding about the goal of the different classes that are set up for converters:

The Parser creates a topology from the "foreign" object (here an OpenBabel mol). The class should inherit from TopologyReaderBase and define a parse method that returns a Topology with all the atom-level and residue-level attributes. For historical reasons, the foreign object is available under the attribute self.filename.
The Reader reads a trajectory. In the case of an OpenBabel mol, that means parsing the coordinates from each conformer. Because OBabel is not really meant to process huge files, it's fine to assume everything will fit in memory, hence the use of MemoryReader as a base class.
The Reader and Parser combined will automagically allow you to create a Universe with u = mda.Universe(obmol).
The Converter does the opposite of the above, i.e. it converts an AtomGroup or Universe to a foreign object. You can directly inherit from ConverterBase and define a convert method, which can then be automagically used with obmol = my_atomgroup.convert_to.openbabel(<optional parameters>).
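The division of labour above can be sketched in plain Python as a toy model. This is not MDAnalysis code: a dict stands in for an OBMol, `parse` plays the role of the Parser's parse method (topology attributes only), `read_frames` plays the role of the MemoryReader input (one coordinate set per conformer), and `convert` plays the role of the Converter's convert method. The dict layout and all function names are hypothetical.

```python
# Hypothetical stand-in for an OBMol holding two conformers of a water molecule.
mol = {
    "atoms": [
        {"name": "O1", "element": "O", "resname": "HOH", "resid": 1},
        {"name": "H1", "element": "H", "resname": "HOH", "resid": 1},
        {"name": "H2", "element": "H", "resname": "HOH", "resid": 1},
    ],
    "conformers": [
        [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
        [(0.0, 0.0, 0.1), (0.96, 0.0, 0.1), (-0.24, 0.93, 0.1)],
    ],
}


def parse(foreign):
    """Parser role: topology only (atom-level and residue-level attributes)."""
    atoms = foreign["atoms"]
    return {
        "names": [a["name"] for a in atoms],
        "elements": [a["element"] for a in atoms],
        "resnames": [a["resname"] for a in atoms],
        "resids": [a["resid"] for a in atoms],
    }


def read_frames(foreign):
    """Reader role: coordinates only, one frame per conformer."""
    return [list(conf) for conf in foreign["conformers"]]


def convert(topology, frames):
    """Converter role: the reverse trip, back to the foreign layout."""
    atoms = [
        {"name": n, "element": e, "resname": rn, "resid": ri}
        for n, e, rn, ri in zip(topology["names"], topology["elements"],
                                topology["resnames"], topology["resids"])
    ]
    return {"atoms": atoms, "conformers": [list(f) for f in frames]}


top = parse(mol)
frames = read_frames(mol)
print(len(top["names"]), len(frames))  # 3 2
assert convert(top, frames) == mol     # lossless round trip for this toy data
```

Keeping topology and coordinates in separate steps is what lets MDAnalysis combine them into one Universe, and the Converter only has to reassemble them in the other direction.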

Hope this helps!

lunamorrow commented 1 month ago

Ahhh ok, thanks @hmacdope and @exs-cbouy. I was planning to have the Parser make a Universe and the Reader an AtomGroup, but I see the redundancy now. What you've said makes sense @exs-cbouy, as I need both the topology and the positions/trajectory to create a Universe. I just had a quick look at the documentation, and it appears that MemoryReader is for topologies with a trajectory, while SingleFrameReaderBase is for topologies with just one set of positions. The only trajectory format accepted by OpenBabel seems to be xtc, which MDAnalysis already reads. I assume it is best practice to inherit from MemoryReader anyway, so that the converter can capture all possible information? I'll change that over now.

I suspect it would be best for me to start on the Parser before the Reader too. What would you suggest @exs-cbouy, seeing as you have done this before?

exs-cbouy commented 1 month ago

I haven't used OpenBabel much, but I'm guessing it can store coordinates for each conformer on the same molecule object (like RDKit does), in which case MemoryReader makes sense (since you won't always have a single set of coordinates for a given molecule).
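If an OBMol does hold several conformers, the Reader's job largely reduces to stacking them into a (n_frames, n_atoms, 3) block, which matches MemoryReader's default `order='fac'` (frame, atom, coordinate) layout. The sketch below uses plain lists of (x, y, z) tuples in place of the OpenBabel calls, and `stack_conformers` is a hypothetical helper. It also checks that every conformer has the same atom count, since a trajectory cannot change size between frames.

```python
def stack_conformers(conformers):
    """Stack per-conformer coordinate lists into an (n_frames, n_atoms, 3) block.

    Hypothetical helper: each conformer is a list of (x, y, z) tuples.
    """
    if not conformers:
        raise ValueError("molecule has no conformers")
    n_atoms = len(conformers[0])
    frames = []
    for i, conf in enumerate(conformers):
        if len(conf) != n_atoms:
            raise ValueError(f"conformer {i} has {len(conf)} atoms, expected {n_atoms}")
        if any(len(xyz) != 3 for xyz in conf):
            raise ValueError(f"conformer {i} has a point without 3 coordinates")
        # Copy into fresh float lists so frames never share state with the input.
        frames.append([[float(c) for c in xyz] for xyz in conf])
    return frames


confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
]
block = stack_conformers(confs)
print(len(block), len(block[0]), len(block[0][0]))  # 3 2 3
```

With NumPy available, `np.asarray(block, dtype=np.float32)` would turn this into a (3, 2, 3) array of the kind a MemoryReader subclass could be given.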

Yes, I would suggest doing the Parser before the Reader. I don't remember whether you really need the Reader to start playing around and constructing a Universe from an OpenBabel mol, but in the worst case you could just use dummy coordinates in the Reader to begin with.
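The dummy-coordinate fallback can be as small as one frame of zeros with the right atom count, so that a Universe can be built while the Parser is still being developed. `dummy_frames` is a hypothetical helper, and zeros are only a placeholder: any analysis that depends on positions is meaningless until real coordinates are read.

```python
def dummy_frames(n_atoms, n_frames=1):
    """Placeholder trajectory: n_frames frames with every atom at the origin."""
    if n_atoms < 0 or n_frames < 1:
        raise ValueError("need n_atoms >= 0 and n_frames >= 1")
    # Build each point separately so frames and atoms never share list objects.
    return [[[0.0, 0.0, 0.0] for _ in range(n_atoms)] for _ in range(n_frames)]


frames = dummy_frames(4)
print(len(frames), len(frames[0]))  # 1 4
```

For comparison, MDAnalysis's `Universe.empty(n_atoms, trajectory=True)` attaches a similar zero-filled, single-frame in-memory trajectory, which may be enough for early experiments.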

hmacdope commented 1 month ago

To clarify this further @lunamorrow, by trajectory here we just mean "any set of coordinate data", which must be present in ANY format, not just formats with more than one frame or a traditional MD format like xtc. For example, using the MemoryReader you can make a trajectory from a raw numpy array. You will conceptually do at least the same, but after extracting the data from Obabel.
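To make the "any set of coordinate data" point concrete: a single set of coordinates is simply a trajectory with one frame. The sketch below promotes either one (n_atoms, 3) coordinate set or a list of frames into the (n_frames, n_atoms, 3) shape. With NumPy and MDAnalysis installed, the resulting array could then be handed to MemoryReader, as in the documented pattern `mda.Universe(topology, coords, format=MemoryReader)`. The helper name `as_trajectory` is hypothetical.

```python
def as_trajectory(coords):
    """Return coordinates as a list of frames, each a list of [x, y, z].

    Accepts either one coordinate set (atoms x 3) or several (frames x atoms x 3).
    """
    def is_point(p):
        # A point is exactly three numbers; a frame is a sequence of points.
        return len(p) == 3 and all(isinstance(c, (int, float)) for c in p)

    if coords and all(is_point(p) for p in coords):
        coords = [coords]  # single coordinate set: wrap it as a 1-frame trajectory
    return [[[float(c) for c in p] for p in frame] for frame in coords]


single = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
traj = as_trajectory(single)
print(len(traj), len(traj[0]))  # 1 2
```

A single-frame molecule and a multi-conformer one then go through the same code path, which is why a MemoryReader base class covers both cases.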

lunamorrow commented 1 month ago

I'm guessing it can store coordinates for each conformer on the same molecule object

Yes, it appears so; I will double-check their API to be safe.

Yes I would suggest doing the Parser before,

Great, I'll get going on that first then.

To clarify this further, @lunamorrow by trajectory here we just mean "any set of coordinate data" which must be present in ANY format, not just that with more than one frame or a traditional MD format like xtc. For example, using the MemoryReader you can make a trajectory from a raw numpy array. You will conceptually at least do the same but after extracting the data from Obabel.

Thanks for the clarification @hmacdope! I didn't know you could just feed in a numpy array too, that is really cool.

MDAnalysis / mda-openbabel-converter, issue #5