Hi, first of all, congrats on a great job!
I forked the repo and was playing around with some test sets. While testing some N-heterocycles, I noticed that the acidic nitrogen in tetrazole was ignored by the prediction function.
I added the SMARTS definition for the acidic nitrogen to the tsv file and uploaded it to my fork. The definition is:
[nH&!$(n@[cR2])]
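This matches the aromatic hydrogen-bearing nitrogen in tetrazoles. The recursive part excludes an [nH] that is ring-bonded to a ring-fusion carbon (as in indole), so other aromatic [nH] atoms, such as the one in pyrrole, can also match unless they are already covered elsewhere.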
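For anyone who wants to check the pattern quickly, here is a minimal sketch using RDKit (assuming it is installed). The molecule list and the helper name `matched_atoms` are my own illustration and are not part of the repo; pyrrole and indole are included only to see how the pattern behaves outside tetrazoles.

```python
from rdkit import Chem

# SMARTS from this post: aromatic N carrying one H that is NOT
# ring-bonded to an aromatic carbon belonging to two rings.
ACIDIC_NH = Chem.MolFromSmarts("[nH&!$(n@[cR2])]")

# Illustrative molecules chosen for this sketch (not the repo's test sets).
SMILES = {
    "1H-tetrazole": "c1nnn[nH]1",
    "2H-tetrazole": "c1nn[nH]n1",
    "pyrrole": "c1cc[nH]c1",
    "indole": "c1ccc2c(c1)cc[nH]2",
}


def matched_atoms(smiles):
    """Return the atom indices in `smiles` that match ACIDIC_NH."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Each match is a 1-tuple because the pattern has a single atom.
    return [idx for (idx,) in mol.GetSubstructMatches(ACIDIC_NH)]


results = {name: matched_atoms(smi) for name, smi in SMILES.items()}

for name, atoms in results.items():
    print(f"{name}: matched atom indices {atoms}")
```

An empty list means the pattern found no site in that molecule, which is a quick way to see which ring systems the new definition would expose to the model.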
With the definition added, the model predicts a pKa of 7.1 for both 1H- and 2H-tetrazole, which is not correct (the experimental pKa of tetrazole is about 4.9).
I think it would be worthwhile to retrain the model with this definition included.