crcollins / molml

A library to interface molecules and machine learning.
MIT License

Interaction with RDKit? #5

Closed HenriqueCSJ closed 4 years ago

HenriqueCSJ commented 4 years ago

Sorry, this is more of a basic question. Can MolML read molecules obtained as RDKit objects (or in other molecular formats like Mol2 or Mol V3000)?

crcollins commented 4 years ago

By default, MolML accepts a few simple inputs: tuples of elements, numbers, coordinates, and connectivity, or filenames for .xyz, .out, or .mol2 files.

In principle, any format is acceptable for MolML as you can define arbitrary loading functions for data. A very simple example of this can be found here: https://github.com/crcollins/molml/blob/master/examples/simple.py#L85

Basically, if you give input_type a function that takes in your object and returns a LazyValues object, then you can use anything.

from molml.utils import LazyValues
from molml.features import CoulombMatrix

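# get_elements() and get_coords() stand in for whatever accessors
# your own molecule objects provide.
def read_data(obj):
    return LazyValues(elements=obj.get_elements(), coords=obj.get_coords())

# obj1, obj2, obj3 are placeholders for your own molecule objects.
objs = [obj1, obj2, obj3]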
feat = CoulombMatrix(input_type=read_data)
feat.fit_transform(objs)
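
For RDKit specifically, the same pattern might look roughly like the sketch below. It is not part of MolML. The names `extract_elements_coords` and `read_rdkit` are made up for illustration, and it assumes the molecule already has a 3D conformer. Only standard RDKit accessors are used (`GetAtoms`, `GetSymbol`, `GetIdx`, `GetNumConformers`, `GetConformer`, `GetAtomPosition`). Whether MolML prefers `coords` as a NumPy array rather than nested lists is not checked here.

```python
def extract_elements_coords(mol, conf_id=-1):
    """Return (elements, coords) from an RDKit-style Mol (hypothetical helper).

    `mol` is duck-typed: any object exposing the RDKit Mol/Atom/Conformer
    accessors used below will work.
    """
    if mol.GetNumConformers() == 0:
        raise ValueError("molecule has no conformer; generate 3D coordinates first")
    conf = mol.GetConformer(conf_id)
    elements = []
    coords = []
    for atom in mol.GetAtoms():
        # GetAtomPosition returns a point with .x, .y, .z attributes.
        pos = conf.GetAtomPosition(atom.GetIdx())
        elements.append(atom.GetSymbol())
        coords.append([pos.x, pos.y, pos.z])
    return elements, coords


def read_rdkit(mol):
    """Loader to pass as input_type, mirroring read_data above (a sketch)."""
    # Imported lazily so the extraction helper can be used without MolML.
    from molml.utils import LazyValues

    elements, coords = extract_elements_coords(mol)
    return LazyValues(elements=elements, coords=coords)
```

With this in place, `CoulombMatrix(input_type=read_rdkit)` should accept a list of RDKit molecules in the same way as the `read_data` example. A molecule built from SMILES has no coordinates until you embed it, for example with `AllChem.EmbedMolecule` after `Chem.AddHs`.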

Hope that helps.

HenriqueCSJ commented 4 years ago

Thank you very much! I'll give it a try today.