MolecularAI / aizynthfinder

A tool for retrosynthetic planning
https://molecularai.github.io/aizynthfinder/
MIT License

Incorrect labels in USPTO templates data #133

Closed m-mokaya closed 9 months ago

m-mokaya commented 11 months ago

Hello,

I have noticed that the USPTO template library downloaded with the 'download_public_data' command does not contain any valid template classification data. Instead, every row is labelled '0.0 Unrecognised'.

Further to this, when you try to pull the correct data from previous .hdf5 template files, the template_hash, template_code or retro_template entries do not match. Is this intentional? Thanks.

SGenheden commented 11 months ago

For legal reasons, we were unable to provide NextMove classes for the latest USPTO template-based model. We wanted to create a reproducible dataset and model using only open-source tools, and NextMove classification falls outside that scope.

For the second question, the old and new template models are not comparable. They were extracted using slightly different methods and the template hashes are computed in completely different fashions. So you cannot pull the classification from the old model into the new one.
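For anyone who wants to confirm this on their own files, here is a minimal sketch of the check. It assumes each template table has been loaded into a list of row dictionaries (for example from a pandas DataFrame via `to_dict("records")`). The column name `classification`, the helper names and the toy values are assumptions for illustration and are not taken from the aizynthfinder code. Only `template_hash` and the label '0.0 Unrecognised' come from the discussion above.

```python
from collections import Counter


def label_counts(rows, column="classification"):
    """Count how often each label occurs (column name is an assumption)."""
    return Counter(row[column] for row in rows)


def shared_values(old_rows, new_rows, column):
    """Return the values of `column` that occur in both template tables."""
    old_values = {row[column] for row in old_rows}
    new_values = {row[column] for row in new_rows}
    return old_values & new_values


# Toy rows standing in for the old and new template tables.
# All hashes and the "1.2.3 Example class" label are made up.
old_rows = [
    {"template_hash": "a1", "classification": "1.2.3 Example class"},
    {"template_hash": "b2", "classification": "0.0 Unrecognised"},
]
new_rows = [
    {"template_hash": "c3", "classification": "0.0 Unrecognised"},
    {"template_hash": "d4", "classification": "0.0 Unrecognised"},
]

# A single label in the new table mirrors what the issue reports.
print(label_counts(new_rows))

# An empty set means no hash is shared, so labels cannot be joined on it.
print(shared_values(old_rows, new_rows, "template_hash"))
```

If the set of shared hashes is empty for the real files, joining the old classification onto the new templates by `template_hash` will not yield any labels, which is consistent with the explanation above.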

SGenheden commented 9 months ago

Closing due to inactivity