coleygroup / molpal

active learning for accelerated high-throughput virtual screening
MIT License

how to handle tautomers? #32

Closed likun1212 closed 2 years ago

likun1212 commented 2 years ago

Hi

Multiple tautomers can be generated after ligand preparation. How should I deal with them? Should I add all of them to the pool library?

Tautomers normally have different fingerprints.

connorcoley commented 2 years ago

Rather than having multiple tautomers as distinct library members, I imagine the best solution would be to have a canonicalization pipeline that standardizes representations so this isn't an issue, e.g., using RDKit's standardization with MolVS or round-tripping to/from InChI (if appropriate).
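A minimal sketch of what such a canonicalization step could look like, assuming RDKit is installed and using its `rdMolStandardize` module (which is derived from MolVS). The helper names `rdkit_canonical_tautomer` and `dedupe_pool` are hypothetical and not part of molpal; this only illustrates the idea and is not how molpal itself handles its pool.

```python
from typing import Callable, Dict, Iterable, List, Optional


def rdkit_canonical_tautomer(smiles: str) -> Optional[str]:
    """Return the SMILES of RDKit's canonical tautomer, or None if parsing fails.

    Hypothetical helper for illustration; assumes RDKit is available.
    """
    # Imported lazily so dedupe_pool can also be used with another canonicalizer.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # RDKit's default cleanup (e.g. normalization and reionization).
    mol = rdMolStandardize.Cleanup(mol)
    # Pick one deterministic representative among the enumerated tautomers.
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    return Chem.MolToSmiles(mol)


def dedupe_pool(
    smiles_list: Iterable[str],
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical_tautomer,
) -> Dict[str, List[str]]:
    """Group input SMILES by their canonical form.

    Each key is one standardized library member; the value lists the
    input SMILES (e.g. tautomers) that collapsed onto it.
    """
    groups: Dict[str, List[str]] = {}
    for smi in smiles_list:
        key = canonicalize(smi)
        if key is None:
            continue  # skip entries the canonicalizer could not handle
        groups.setdefault(key, []).append(smi)
    return groups
```

The keys of `dedupe_pool(prepared_smiles)` could then be written out as the pool library, so that each compound appears once regardless of how many tautomers ligand preparation produced. The canonical tautomer chosen this way is a consistent representative, not necessarily the dominant form in solution.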

likun1212 commented 2 years ago

Thanks for your reply, this is really helpful.

I am not an expert on machine learning. Naively, I thought that providing more information (tautomers) would be a good thing for training the surrogate model.

So I gather that is not the case. Thank you!