How to decide on values of NK and NK0?

connorcoley / rexgen_direct

Template-free prediction of organic reaction outcomes

GNU General Public License v3.0

How to decide on values of NK and NK0? #24

Open gnsrivastava opened 3 years ago

gnsrivastava commented 3 years ago

Hello Dr Coley,

In your scripts, NK and NK0 are set to 20 and 10, respectively. Both are used for reporting accuracies during training, and NK also sets the number of edits included in the output file during inference. Should I keep the same NK and NK0 values for my data? Could you elaborate on how you chose them?

Gopal

PS: Some of my biological reactions involve roughly 50 or more bond changes in total.

connorcoley commented 3 years ago

These values are somewhat arbitrary. Increasing NK (the number of different bond changes considered) will improve the coverage from the first step, but it will also make the number of candidates after enumeration much larger.

If there are 50 or more bond changes in some of the reactions you're interested in, I'd probably suggest that this isn't the right tool. I'm not sure what reactions you're working with, but it's unlikely they are single-step with contiguous reaction centers.
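
As a rough illustration of what "coverage" means here, the sketch below assumes a reaction counts as covered at K when every true bond change appears among the top-K predicted bond changes. The function name and the tuple layout (atom index, atom index, new bond order) are hypothetical and are not taken from the repository.

```python
def coverage_at_k(ranked_edits, true_edits, k):
    """Return True if every true bond change is within the top-k predictions.

    Assumption: edits are hashable tuples (atom_a, atom_b, new_bond_order),
    and ranked_edits is sorted from most to least likely.
    """
    top_k = set(ranked_edits[:k])
    return set(true_edits) <= top_k


# Hypothetical ranked predictions for one reaction
ranked = [(1, 2, 1.0), (3, 4, 0.0), (2, 5, 2.0), (6, 7, 1.0)]
# Hypothetical ground-truth bond changes for the same reaction
truth = [(3, 4, 0.0), (2, 5, 2.0)]

covered_top2 = coverage_at_k(ranked, truth, 2)  # second true edit is ranked 3rd
covered_top3 = coverage_at_k(ranked, truth, 3)  # both true edits are in the top 3
```

With a larger K more reactions become covered, which is why raising NK helps the first step while increasing the work for the second.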

YH-88 commented 3 years ago

Hi. I have some chemical reactions where the total number of bond changes is <= 20. Should kmax be set to 20? In addition, suppose NK0=25 and NK=35 are set when training the WLN model, and NK values ranging from 40 to 100 are used when testing it. Is this the right way to determine the values of NK and NK0?

connorcoley commented 3 years ago

Those changes would theoretically work, but I'm afraid that the combinatorial enumeration will make the number of candidates generated after the first step impractically large. I would suggest testing this with a very small batch size before committing to this approach.
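
To get a feel for the growth, here is a back-of-the-envelope sketch. It assumes candidates are built by choosing between 1 and max_core_size of the top-NK bond changes, so the count is bounded by a sum of binomial coefficients. The function name and the value max_core_size=5 are illustrative assumptions, and the real enumeration would discard chemically invalid combinations, so treat the result as an upper bound only.

```python
from math import comb


def max_candidates(nk, max_core_size):
    """Upper bound on candidates: choose 1..max_core_size edits out of nk."""
    return sum(comb(nk, k) for k in range(1, max_core_size + 1))


# Illustrative values only: NK=20 versus NK=100 with up to 5 edits combined
small = max_candidates(20, 5)
large = max_candidates(100, 5)

print(f"NK=20:  at most {small} candidates per reaction")
print(f"NK=100: at most {large} candidates per reaction")
```

Under these assumptions the bound grows from about 2 x 10^4 to about 8 x 10^7 per reaction, which is why a trial run with a very small batch is a sensible first check.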

YH-88 commented 3 years ago

I got it. Thanks a lot.

gnsrivastava commented 3 years ago

Hello Dr Coley, when I train rank_diff_wln on my data, I get the following warning: "warning! could not recover true smiles from gbonds:". Could you tell me what "true smiles" means?

I apologize for the trivial questions.

Gopal

connorcoley commented 3 years ago

The true SMILES is whatever the dataset provides as the ground-truth answer (for example, in data/test.txt.proc).

YH-88 commented 3 years ago

Hello Dr Coley,

I only have the SMILES of the reactants. Can I use your trained model directly, with the reactant SMILES as input, to predict which products will be generated? Thanks a lot.

connorcoley commented 3 years ago

Yes, that's the intended use case. All you need to use the trained model is the reactant SMILES!

YH-88 commented 3 years ago

Thank you. But the input to the initial model is a reaction with atom mapping. How can we do atom mapping without the products and still get the reaction center?

connorcoley commented 3 years ago

You can use the fully trained model to predict outcomes by following the example at the end of rexgen_direct/rank_diff_wln/directcandranker.py

YH-88 commented 3 years ago

Thank you. I got it.