MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

rdkit interoperability #2468

Closed richardjgowers closed 1 year ago

richardjgowers commented 4 years ago

This is mostly an idea for a GSoC project. There's a bunch of cool stuff in RDKit which isn't even close to being in MDA (and vice versa), so rather than reinvent wheels, it would be cool to be able to do:

import MDAnalysis as mda
from MDAnalysis import convert_to

u = mda.Universe('bleh')

rdkit_mol = convert_to('rdkit', u)

and

from rdkit import Chem
from MDAnalysis import convert_to

rdkit_mol = Chem.MolFromSmiles('CCOC')

u = convert_to('mda', rdkit_mol)

Which would expand upon the converters idea that @lilyminium has got rolling with parmed.

The data structures are going to be very different, and rdkit is quite picky about what it lets you load, but it would be cool to get something going.

RMeli commented 4 years ago

This would be very useful indeed! The following blog post might be relevant for further discussion: Why the RDKit isn't available on PyPi.

richardjgowers commented 4 years ago

Yeah rdkit via conda is the only way to stay sane tbh. So this project might end up in a side package/non default submodule which is optionally installed and does some monkey patching to _CONVERTERS.
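The registration idea can be sketched in plain Python. This is only an illustration: the `_CONVERTERS` dictionary, the `register_converter` decorator and the `toy` converter below are hypothetical stand-ins, not the actual MDAnalysis internals. It shows how an optionally installed side package could add itself to a name-to-function registry when it is imported.

```python
from typing import Any, Callable, Dict

# Hypothetical registry; the name mirrors the _CONVERTERS mentioned above,
# but this is not MDAnalysis's actual implementation.
_CONVERTERS: Dict[str, Callable[[Any], Any]] = {}


def register_converter(name: str) -> Callable:
    """Decorator that files a conversion function under a case-insensitive name."""
    def decorator(func):
        _CONVERTERS[name.upper()] = func
        return func
    return decorator


def convert_to(name: str, obj: Any) -> Any:
    """Look up the converter registered under `name` and apply it to `obj`."""
    try:
        converter = _CONVERTERS[name.upper()]
    except KeyError:
        raise ValueError(
            f"No converter named {name!r}; available: {sorted(_CONVERTERS)}"
        ) from None
    return converter(obj)


# A side package could register itself at import time like this.
@register_converter("toy")
def _to_toy(obj):
    # Stand-in for a real conversion (e.g. building an RDKit molecule).
    return {"converted": obj}
```

With this layout the core package never imports the optional dependency itself; only the module that defines the converter does.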

IAlibay commented 4 years ago

At least on our end, rdkit integration would be very much appreciated. It would definitely make tools that depend on both packages like lintools (which we still have aims to revive properly) a lot easier.

orbeckst commented 4 years ago

I like the idea of expanding on converters – working towards API interoperability instead of files. It's the future.

We might have to review our policy on package dependencies in the core. Maybe it's ok for the majority of users to get a well-defined failure if they try to do something with a specialized reader/converter, especially if they are told what to do if they want to install the package. I am thinking along the lines of mocking missing optional packages in such a way that only users who want to use exactly that functionality get a failure.

Perhaps the policy can be changed to "packages that are used in a single converter or reader can be optional". Or we make converters a submodule with its own policy (but it would be convenient to also be able to include readers/parsers with exotic dependencies).
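One way such a "well-defined failure" could look is sketched below, using only the standard library. The helper `require_optional` and the converter `to_rdkit` are hypothetical names invented for this example; the point is that the optional package is imported only when the feature is actually used, and the error says what to install.

```python
import importlib


def require_optional(module_name: str, feature: str, install_hint: str):
    """Import `module_name` on demand, or raise an ImportError that says what to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional package {module_name!r}. "
            f"Install it with: {install_hint}"
        ) from err


def to_rdkit(obj):
    """Hypothetical converter: RDKit is only imported when this function is called."""
    chem = require_optional(
        "rdkit.Chem",
        "The RDKit converter",
        "conda install -c conda-forge rdkit",
    )
    # A real converter would build a molecule here; this sketch just hands both back.
    return chem, obj
```

Users who never call `to_rdkit` are unaffected by a missing RDKit installation, which matches the policy proposed above.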

orbeckst commented 4 years ago

Do you know anyone from the RDKit community who might want to co-mentor for GSoC? That would be extremely valuable.

RMeli commented 4 years ago

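RDKit usually takes part in GSoC under the OpenChemistry organization (https://www.openchemistry.org/). You can have a look at the RDKit Project Ideas (http://wiki.openchemistry.org/GSoC_Ideas_2019#RDKit_Project_Ideas) for past mentors. They have a Slack channel (http://openchemistry.slack.com) and @greglandrum is on it. Maybe worth a try?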

richardjgowers commented 4 years ago

I use rdkit enough at work that I can find my way around it, but if Greg wants to give his blessing that’s fine too :)


greglandrum commented 4 years ago

This would be cool! I'd be happy to try and help with answering RDKit-related questions. It's hard to commit to co-mentoring, though: until we see what projects actually end up being funded, I won't know how busy I am.

haroldgrosjean commented 4 years ago

Hello,

I am investigating protein-ligand interactions with MD simulations, and the idea of combining RDKit with MDAnalysis is extremely exciting.

Currently, I am using ODDT to do that (https://oddt.readthedocs.io/en/latest/rst/oddt.html#module-oddt.interactions). You provide the receptor and the ligand, and it returns a list of descriptors such as donor atom, acceptor atom, atom types, etc. The problem with this approach is that ODDT requires PDB files (which are then transformed into RDKit/pybel objects), meaning that one has to convert each frame into a PDB before processing, which is slow and bad practice.

One could feed the trajectory into the new RDKit wrapper and use ODDT in the loop to get the data. This implies that the wrapper must be able to take in the protein atoms and transform them appropriately into an RDKit object. Is that something that is envisaged? It would be of great use to a lot of people. Such functionality would also allow us to hardcode more complex rules and study/measure less frequent interactions that are not captured by ODDT, such as anion-pi interactions, but are also important in protein-ligand binding. It would also combine extremely well with Native contacts analysis https://www.mdanalysis.org/docs/documentation_pages/analysis/contacts.html.

Many thanks,

Harold

j-wags commented 4 years ago

Thanks for doing this work! This is really exciting from Open Force Field's perspective. We've also been struggling to perceive bond orders from elements+connectivity (and to know when we can or can't safely guess), so the careful writeups and test cases in this project have been particularly cool to see :-)

orbeckst commented 4 years ago

@haroldgrosjean please have a look at https://www.mdanalysis.org/2020/08/29/gsoc-report-cbouy/#demo — it sounds to me that this will cover your use case. Note that not everything is working yet because not all of @cbouy 's PRs are merged yet but it will come.

@cbouy might be able to say more — the MDA/RDKit project has really made big leaps since you posted your comment in July (sorry for the long silence).

EDIT: Also have a look at @cbouy 's blog, especially https://cedric.bouysset.net/blog/2020/08/07/rdkit-interoperability

orbeckst commented 3 years ago

Hi Harold,

I think this would be a good question for the mailing list.

I’m not sure if SMARTS selections are already in develop. In the meantime you could check the .elements attribute of the atomgroup, which should exist when you have an RDKit molecule.
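A minimal sketch of that element check, in plain Python with made-up data. The function `halogen_contacts` and the example lists are hypothetical; with MDAnalysis the inputs would presumably come from the `.elements` and `.positions` attributes of the two atom groups. Note that this only tests element membership, which covers a pattern like [F,Cl,Br,I] but is not general SMARTS matching.

```python
import math

HALOGENS = {"F", "Cl", "Br", "I"}


def halogen_contacts(elements_1, positions_1, positions_2, max_dist):
    """Return (i, j, distance) for each pair within max_dist where atom i of group 1 is a halogen."""
    contacts = []
    # Filter on element first so distances are only computed for halogens.
    halogen_idx = [i for i, el in enumerate(elements_1) if el in HALOGENS]
    for i in halogen_idx:
        for j, pos_2 in enumerate(positions_2):
            d = math.dist(positions_1[i], pos_2)
            if d <= max_dist:
                contacts.append((i, j, d))
    return contacts


# Made-up data standing in for group_1.elements, group_1.positions and group_2.positions.
elements = ["C", "Cl", "O"]
pos_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
pos_2 = [(1.0, 3.0, 0.0), (9.0, 9.0, 9.0)]
contacts = halogen_contacts(elements, pos_1, pos_2, max_dist=3.5)
```

Checking the element before computing any distance avoids the per-pair pattern test in the inner loop, which is where most of the time would otherwise go.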

Oliver

On 12 Nov 2020 at 09:30, Harold Grosjean notifications@github.com wrote:

Hello,

I have started to use the RDKit wrapper to code my contact analysis. I was wondering if there is any way to check whether a given atom matches a SMARTS pattern?

For example:

for atom_1 in group_1:
    for atom_2 in group_2:
        distance = distances.distance_array(atom_1.position, atom_2.position)
        if distance <= contact_max_dist:
            if atom_1 is 'smarts [F,Cl,Br,I]':  # pseudocode
                do something  # pseudocode

This would considerably speed up my code and I am sure it would be of use to other people.

Many thanks in advance.

Harold


cbouy commented 3 years ago

I was going to answer but I'll wait for the mailing list thread so that more people can find the answer 😃 And yes, it's possible with some tweaks to your code example.

cbouy commented 3 years ago

I'm not authorized to answer the discussion :( @orbeckst Should I post my answer here and you repost it on the thread, or can you authorize me to reply on the mailing list ?

orbeckst commented 3 years ago

Normally, you need to subscribe to the mailing list and when you reply the first time, an admin will remove the hold on your subscription (to make sure you aren't a spammer). However, I added you directly to https://groups.google.com/g/mdnalysis-discussion with your b...@gmail address. Please try again.

Jay-sanjay commented 1 year ago

Is the issue still open?