ISISNeutronMuon / MDANSE

MDANSE: Molecular Dynamics Analysis for Neutron Scattering Experiments
https://www.isis.stfc.ac.uk/Pages/MDANSEproject.aspx
GNU General Public License v3.0
20 stars 4 forks source link

[ENHANCEMENT] Molecule bonding determination and labelling using rdkit #350

Closed ChiCheng45 closed 1 month ago

ChiCheng45 commented 5 months ago

Is your feature request related to a problem? Please describe. Currently, MDANSE only determines whether something is an molecule or bulk structures based on the input file of the converter e.g. when a PDB file is used. It would be useful to determine whether an atom belongs to a molecule or bulk structure during the conversion process. We could then improve things like the atom selection and the 3D viewer.

Describe the solution you'd like Add some molecule determination to the converter process for input files which do not have bonding information. There is some new rdkit functions that we can use. See https://github.com/rdkit/rdkit/discussions/5759 and https://github.com/rdkit/UGM_2022/blob/main/Notebooks/Landrum_WhatsNew.ipynb. We could label each molecule by its canonical smiles and group them together for atoms selection and other purposes.

Additional context Here is some example code I created from the examples in the links above.

from rdkit import Chem

raw_mol = Chem.MolFromXYZBlock(
"""19

C    0.0645055554    1.4843171326    0.3723315122
C   -0.001915467     0.0516984051   -0.1729357038
C   -1.4001624807   -0.5304376801   -0.1487075838
C   -1.9909553844   -1.0379429662    1.1601625215
C   -1.7182271444   -1.9962888369    0.0152208454
C   -2.905416575    -2.2627721735   -0.8489696604
C   -3.3347497536   -1.1525477782   -1.4661914556
C   -2.4353480328   -0.0114763902   -1.1388331005
O   -2.4853518918    1.1093228549   -1.595794586
H    1.0876615149    1.8699729825    0.3263064541
H   -0.5836533924    2.142053447    -0.2101812431
H   -0.2626541721    1.5210712578    1.4170099662
H    0.3710335675    0.0343653078   -1.2041622283
H    0.662791197    -0.6008609283    0.4071642198
H   -1.3142481944   -1.0584396879    2.0096409673
H   -3.0252533013   -0.8199288984    1.4038330438
H   -0.9412329872   -2.7436632889    0.1278385726
H   -3.37329736 -3.2394582332   -0.9072563749
H   -4.2004762777   -1.0463782959   -2.1061677066"""
)

from rdkit.Chem import rdDetermineBonds
bond_moll = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(bond_moll,charge=0)
print(Chem.CanonSmiles(Chem.MolToSmiles(bond_moll)))

Which gives the output. CC[C@]12C[C@H]1C=CC2=O

ChiCheng45 commented 5 months ago

A solution to #296?

MBartkowiakSTFC commented 5 months ago

This looks better than the solution I tried last time. Should work

MBartkowiakSTFC commented 1 month ago

This has been implemented in MoleculeFinder in #392