connorcoley / rexgen_direct

Template-free prediction of organic reaction outcomes
GNU General Public License v3.0

Q: get_product_smiles highly imperfect #3

amineebenamor closed this issue 5 years ago

amineebenamor commented 5 years ago

Hello, in mol_graph_direct_useScores.py you wrote: "note: get_product_smiles is HIGHLY imperfect, but that's not a huge deal. training tries to pick the right bonds. The evaluation script has a more robust function to get product_smiles." Could you explain why get_product_smiles is imperfect? I looked for the more robust function for getting product SMILES but couldn't find it in your code. Could you tell me where it is? Thank you for your help!

connorcoley commented 5 years ago

The more robust function is edit_mol in rexgen_direct/scripts/eval_by_smiles.py. This version applies some additional pre- and post-processing fixes so that the graph-edit representation of reactions has higher fidelity (i.e., it can describe a larger proportion of the training examples). The function used in mol_graph_direct_useScores.py does not include all of these fixes. It serves only as a quick check of whether the true product SMILES is already in the list of candidates, so that training does not end up with excessive numbers of duplicate candidates.
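To illustrate the kind of quick duplicate check described above, here is a minimal pure-Python sketch. It is not the code from rexgen_direct. It assumes reactants are stored as a bond dictionary keyed by unordered pairs of atom-map numbers and that each candidate is a list of hypothetical `(i, j, new_order)` bond edits, with `0.0` meaning "break this bond". The helpers `apply_edits`, `graph_key`, and `candidates_including_true` are illustrative names only. The sketch also assumes the true product is appended only when it is missing from the candidate list. Because it compares atom-mapped bond graphs instead of canonical SMILES, and it ignores hydrogens, charges, and aromaticity, it is deliberately "imperfect" in the same spirit as a quick check.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: unordered atom-map pair -> bond order.
BondGraph = Dict[FrozenSet[int], float]
# Hypothetical graph edit: (atom map i, atom map j, new bond order); 0.0 breaks the bond.
Edit = Tuple[int, int, float]
GraphKey = FrozenSet[Tuple[FrozenSet[int], float]]


def apply_edits(reactants: BondGraph, edits: Iterable[Edit]) -> BondGraph:
    """Return a copy of the bond graph with the given bond edits applied."""
    product = dict(reactants)
    for i, j, order in edits:
        key = frozenset((i, j))
        if order == 0.0:
            product.pop(key, None)  # break the bond if it exists
        else:
            product[key] = order  # form a new bond or change its order
    return product


def graph_key(graph: BondGraph) -> GraphKey:
    """Hashable, order-independent key used to compare two products."""
    return frozenset(graph.items())


def candidates_including_true(
    reactants: BondGraph,
    candidate_edit_sets: Iterable[List[Edit]],
    true_edits: List[Edit],
) -> Tuple[List[List[Edit]], bool]:
    """Deduplicate candidates by resulting product; append the true edits only if missing.

    Returns the unique candidate edit sets and whether the true product was
    already among the candidates.
    """
    seen: Set[GraphKey] = set()
    unique: List[List[Edit]] = []
    for edits in candidate_edit_sets:
        key = graph_key(apply_edits(reactants, edits))
        if key not in seen:  # skip edit sets that give an identical product graph
            seen.add(key)
            unique.append(list(edits))
    true_in_list = graph_key(apply_edits(reactants, true_edits)) in seen
    if not true_in_list:
        unique.append(list(true_edits))
    return unique, true_in_list
```

For example, with a three-atom chain `{frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0}`, the candidates `[(2, 3, 0.0)]` and `[(2, 3, 0.0), (1, 2, 1.0)]` produce the same product graph, so only one is kept. Keying on atom-map numbers avoids SMILES canonicalization, but it misses duplicates that differ only by symmetry-equivalent atoms. A canonical SMILES comparison (for instance with RDKit) would catch those.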