Xundrug / MolGpKa

The graph-convolutional neural network for pKa prediction
MIT License

Missing tetrazole acidic nitrogen SMARTS pattern #2

Open rubbs14 opened 2 years ago

rubbs14 commented 2 years ago

Hi, first of all, congratulations on a great job! I forked the repo and have been playing around with some test sets. While testing some N-heterocycles, I noticed that the prediction function ignores the acidic nitrogen in tetrazole.

I added a SMARTS definition for the acidic nitrogen to the tsv file and uploaded it to my fork. The definition is:

[nH&!$(n@[cR2])]

This will only match the aromatic hydrogen-bearing nitrogen in tetrazoles. With the pattern added, the model predicts a pKa of 7.1 for both 1H- and 2H-tetrazole, which is not correct. I think it would be interesting to retrain the model with this definition included.
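For anyone who wants to reproduce the tsv change, here is a minimal sketch of appending a pattern row to a tab-separated file. It is not taken from the MolGpKa code base: the function name `append_smarts_row`, the assumption that the SMARTS string sits in the second column, and every value in `TETRAZOLE_ROW` other than the SMARTS itself are hypothetical. Check the real file's column layout before using anything like this.

```python
from pathlib import Path

# Hypothetical row: only the SMARTS string comes from this issue. The label,
# the atom index and the acid/base flag are placeholders, and the column
# order is an assumption about the pattern file, not its documented layout.
TETRAZOLE_ROW = ["Tetrazole NH", "[nH&!$(n@[cR2])]", "0", "A"]


def append_smarts_row(tsv_path, row, smarts_column=1):
    """Append one tab-separated row unless its SMARTS is already listed.

    Returns True if the row was written and False if it was skipped.
    """
    path = Path(tsv_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    smarts = row[smarts_column]

    # Skip the append when an existing row already carries the same SMARTS.
    for line in existing.splitlines():
        fields = line.split("\t")
        if len(fields) > smarts_column and fields[smarts_column] == smarts:
            return False

    with path.open("a", encoding="utf-8") as handle:
        # Start on a fresh line if the file does not end with a newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write("\t".join(row) + "\n")
    return True
```

Calling `append_smarts_row(path_to_pattern_file, TETRAZOLE_ROW)` a second time returns False, so the pattern is not duplicated. The sketch only edits the file. It does not check that the SMARTS is valid or that it matches the intended atoms.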

pykao commented 2 years ago

Hi @rubbs14, I think you should retrain the MolGpKa model.