MolecularAI / reaction_utils

Utilities for working with datasets of chemical reactions, reaction templates and template extraction.
https://molecularai.github.io/reaction_utils/
Apache License 2.0

Why check only these four atom properties? #14

Closed: queliyong closed this issue 9 months ago

queliyong commented 9 months ago

Hi, when I read the code in rxnutils/chem/template.py, I was confused that only these four properties (AROMATIC, CHARGE, HYDROGEN and DEGREE) are corrected when computing template fingerprints. What happens if these properties are not corrected? Also, as far as I remember, computing reaction fingerprints is not nearly as complex as computing template fingerprints. Thank you very much.

SGenheden commented 9 months ago

Hello, let me try to explain the background to this code; perhaps that clears up some of the confusion. RDKit has code to compute fingerprints for reaction objects, but it does not work for reactions created from RDChiral-derived SMARTS patterns. The reason is that the atom specifications in these SMARTS patterns are a concatenation of several atomic properties (element, aromaticity, degree, charge, etc.), and only the first part of this specification, i.e. the element, is used to create the QueryAtom objects that make up the molecules in the reaction and therefore contribute to the built-in fingerprint calculation. We therefore set out to write code that accounts for all the atomic information in the SMARTS pattern. The ECFP fingerprint starts from atomic invariant properties, and it is these properties that we try to extract and correct from the QueryAtom object and the SMARTS pattern. I hope this explanation makes sense.
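To make the point about concatenated atom specifications concrete, here is a minimal pure-Python sketch (not the rxnutils code in rxnutils/chem/template.py, and not using RDKit) that splits an RDChiral-style bracket atom such as [c;H0;D3;+0:1] into its parts. The four properties from the question correspond to the SMARTS primitives it reads: the lowercase (aromatic) element symbol, H (hydrogen count), D (degree) and the +/- charge. The names parse_atom_spec, template_atoms and atom_invariants are hypothetical, and the parser only handles the simple ;/& separated primitives shown here.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class AtomSpec(NamedTuple):
    """Properties read from one bracketed SMARTS atom (None = not specified)."""
    element: str
    aromatic: bool
    hydrogens: Optional[int]
    degree: Optional[int]
    charge: Optional[int]
    atom_map: Optional[int]


# Bracket atoms such as [c;H0;D3;+0:1] inside a (reaction) SMARTS string
_BRACKET_ATOM = re.compile(r"\[[^\]]+\]")
# Leading element: atomic number, two-letter aromatic, aliphatic, one-letter aromatic
_ELEMENT = re.compile(r"^(#\d+|se|as|[A-Z][a-z]?|[bcnops])")
_HYDROGENS = re.compile(r"^H(\d*)$")
_DEGREE = re.compile(r"^D(\d*)$")
_CHARGE = re.compile(r"^([+-])(\d*)$")


def parse_atom_spec(token: str) -> AtomSpec:
    """Hypothetical parser for one RDChiral-style bracket atom."""
    body = token.strip().lstrip("[").rstrip("]")
    atom_map = None
    if ":" in body:  # the atom-map number is the last ":"-separated field
        body, map_str = body.rsplit(":", 1)
        atom_map = int(map_str)
    parts = re.split(r"[;&]", body)
    match = _ELEMENT.match(parts[0])
    if match is None:
        raise ValueError(f"Unsupported atom specification: {token}")
    element = match.group(1)
    # Lowercase symbols denote aromatic atoms in SMARTS
    aromatic = element[0].islower()
    hydrogens = degree = charge = None
    # Anything after the element in the first part (e.g. "H" in "OH") is a primitive too
    rest = parts[0][match.end():]
    primitives = ([rest] if rest else []) + parts[1:]
    for prim in primitives:
        h, d, c = _HYDROGENS.match(prim), _DEGREE.match(prim), _CHARGE.match(prim)
        if h:
            hydrogens = int(h.group(1) or 1)  # bare "H" means one hydrogen
        elif d:
            degree = int(d.group(1) or 1)
        elif c:
            magnitude = int(c.group(2) or 1)  # bare "+"/"-" means +1/-1
            charge = magnitude if c.group(1) == "+" else -magnitude
        elif prim == "a":
            aromatic = True
        # Other primitives (ring membership, recursive SMARTS, ...) are ignored here
    return AtomSpec(element, aromatic, hydrogens, degree, charge, atom_map)


def template_atoms(smarts: str) -> List[AtomSpec]:
    """Parse every bracket atom in a template SMARTS string, left to right."""
    return [parse_atom_spec(tok) for tok in _BRACKET_ATOM.findall(smarts)]


def atom_invariants(spec: AtomSpec) -> Tuple:
    """Hypothetical ECFP-style starting invariants; aromaticity is a separate flag."""
    return (spec.element.capitalize(), spec.aromatic, spec.degree, spec.hydrogens, spec.charge)


if __name__ == "__main__":
    template = "[c;H0;D3;+0:1]-[N;H1;D2;+0:2]>>[c:1]-[N:2]"
    for spec in template_atoms(template):
        print(spec.atom_map, atom_invariants(spec))
```

Properties that the SMARTS leaves unspecified come back as None, so in the example template the right-hand atoms [c:1] and [N:2] carry only element, aromaticity and map number. A real fingerprint would still need to hash such invariant tuples and iteratively combine them with neighbour information, as ECFP does; this sketch stops at the extraction step.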

queliyong commented 9 months ago

Got it. Thanks a lot.