MolecularAI / aizynthtrain

Tools to train synthesis prediction models
Apache License 2.0

regarding min_template_occurrence setting for small reaction samples #3

Closed by yangxfei 11 months ago

yangxfei commented 1 year ago

Hello @SGenheden, I have a question about the min_template_occurrence setting. By default it is set to 3. However, for small reaction samples, after cleaning the reaction data, the group size for a given template_hash can be 1. In this case, could I set min_template_occurrence=1? What is the main problem with setting min_template_occurrence to 1? And if the group size for a template_hash is 1, does that mean the reaction is less common than reactions whose group size is 3 or more? Thanks.

SGenheden commented 1 year ago

The reason for requiring at least 3 is that you want samples for both training and validation. If a template has just 1 sample, that sample ends up in either the training set or the validation set, never both. Furthermore, templates with just 1 sample are typically rare because they were extracted from poor reaction data, e.g. data with poor atom-mapping. Thus, using only templates with more than 1 sample also reduces noise.
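To illustrate the effect of the threshold, here is a minimal sketch in plain Python. It is not the aizynthtrain implementation: the function name `filter_by_template_occurrence`, the dictionary layout, and the toy data are assumptions made for illustration. Only `template_hash` and `min_template_occurrence` come from the discussion above.

```python
from collections import Counter


def filter_by_template_occurrence(reactions, min_template_occurrence=3):
    """Keep reactions whose template_hash occurs at least the given number of times.

    Hypothetical helper for illustration; not part of aizynthtrain.
    """
    # Count how many reactions share each template hash.
    counts = Counter(rxn["template_hash"] for rxn in reactions)
    return [
        rxn
        for rxn in reactions
        if counts[rxn["template_hash"]] >= min_template_occurrence
    ]


# Toy data: template "a" has three reactions, template "b" has only one.
reactions = [
    {"id": 1, "template_hash": "a"},
    {"id": 2, "template_hash": "a"},
    {"id": 3, "template_hash": "a"},
    {"id": 4, "template_hash": "b"},
]

# With the default threshold of 3, the singleton template "b" is dropped.
kept_default = filter_by_template_occurrence(reactions)

# With a threshold of 1, every reaction is kept, including the singleton.
kept_all = filter_by_template_occurrence(reactions, min_template_occurrence=1)
```

With a threshold of 1, the reaction for template "b" is kept, but it can land in only one of the training or validation sets, which is the problem described above.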

SGenheden commented 11 months ago

Closing due to inactivity