Xundrug / MolGpKa

The graph-convolutional neural network for pka prediction
MIT License
61 stars 25 forks source link

Atom index reports hydrogen atom instead of reaction center for acids #10

Open lkshiel opened 10 months ago

lkshiel commented 10 months ago

Hi, I have been enjoying using this tool locally to predict pKas, however I noticed that the acid pKa atom index is off. For instance, rather than reporting the atom index for the oxygen atom (10), it reports the atom index for the hydrogen (19) attached to the oxygen. (See below). Note, I have only noticed this uses for acid pKas. 11-17-2023 11-52-35 AM 11-17-2023 11-53-54 AM

Upon further investigation, I believe this is happening because the match_acid() function returns the item from the matches list based on the value specified in the ‘Index’ column of the ‘smarts_pattern.txv’ document. See line 44 of ‘ionization_group.py’. The values in the ‘Index’ column appear to be off by one, causing the wrong atom index to be returned. This is likely because python lists have a 0-start index rather than a 1-start index, and the value in the ‘Index’ column need to be fixed to reflect that. I have tried to show this in the screen grab below by returning the list of atom indices in matches, the ‘Index’ value reported in the smarts_pattern document, and the item extracted from matches based on the specified value from ‘Index’, to demonstrate what I am talking about. 11-17-2023 12-18-43 PM 11-17-2023 12-05-02 PM

Thanks! Lindsay

xiaolinpan commented 10 months ago

Hi lkshiel, thank you for your comment. For acid, we marked hydrogen of acidic center in the graph representation to predict the pKa. You can get the oxygen atom index by the hydrogen. RDKit provide a function to find neighbor atoms: atom.GetNeighbors().