gnsrivastava opened this issue 3 years ago
These values are somewhat arbitrary. Increasing NK (the number of candidate bond changes considered) improves the coverage of the first step, but it also makes the number of candidates after enumeration much larger.
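To make the coverage side of this trade-off concrete: as I understand the metric, a reaction counts as covered by the first step when all of its true bond changes appear among the NK highest-scoring candidate bond changes. The sketch below illustrates that idea only. It is not code from the repository, and the function name `top_nk_coverage` and the `(atom_i, atom_j, new_bond_order)` tuple format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one bond change: (atom_map_i, atom_map_j, new_bond_order).
BondChange = Tuple[int, int, float]


def top_nk_coverage(
    scored: List[Dict[BondChange, float]],
    true_edits: List[Set[BondChange]],
    nk: int,
) -> float:
    """Fraction of reactions whose true bond changes all fall within the
    nk highest-scoring candidate bond changes (illustrative helper only)."""
    if not scored:
        return 0.0
    hits = 0
    for scores, truth in zip(scored, true_edits):
        # Rank candidate bond changes by predicted score, highest first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Covered only if every true bond change is inside the top nk.
        if truth <= set(ranked[:nk]):
            hits += 1
    return hits / len(scored)
```

For a reaction whose second true bond change is ranked third, coverage is 0 at NK=2 and 1 at NK=3. This is why raising NK helps the first step, at the cost of a larger candidate pool to enumerate.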
If some of the reactions you're interested in involve 50 or more bond changes, I'd suggest that this probably isn't the right tool. I'm not sure what reactions you're working with, but it's unlikely that they are single-step reactions with contiguous reaction centers.
Hi. I have some chemical reactions with a total of 20 or fewer bond changes. Should kmax be set to 20? In addition, I plan to set NK0=25 and NK=35 when training the WLN model, and to vary NK from 40 to 100 when testing it. Is this the right way to choose the values of NK and NK0?
Those changes would work in principle, but I'm afraid the number of candidates generated after the first step will be impractically large, because the combinatorial enumeration produces a huge number of candidates. I would suggest testing this with a very small batch size before committing to this approach.
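A rough way to see why NK=100 with kmax=20 is impractical is to count the combinations. The sketch below assumes, for illustration only, that every combination of 1 to kmax bond changes drawn from the top-NK list is enumerated. The real enumeration also filters out chemically invalid combinations, so this is an upper bound and not the repository's exact count. The function name `max_candidates` is hypothetical.

```python
from math import comb


def max_candidates(nk: int, kmax: int) -> int:
    """Upper bound on candidate outcomes when every combination of
    1..kmax bond changes from the top-nk list is enumerated, before
    any valence or duplicate-atom-pair filtering."""
    return sum(comb(nk, k) for k in range(1, kmax + 1))


if __name__ == "__main__":
    for nk, kmax in [(20, 5), (100, 5), (100, 20)]:
        print(nk, kmax, max_candidates(nk, kmax))
```

Under this assumption, (NK=20, kmax=5) gives 21,699 combinations per reaction and (NK=100, kmax=5) already gives 79,375,495. With kmax=20 the bound exceeds 10^20, which is why a small-batch test is worth running first.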
I got it. Thanks a lot.
Hello Dr Coley, when I train rank_diff_wln on my data, I get the following warning: "warning! could not recover true smiles from gbonds:". Could you tell me what "true smiles" means?
I apologize if this is a trivial question.
Gopal
The true SMILES is whatever is provided in the dataset as the ground-truth answer, for example, in data/test.txt.proc.
Hello Dr Coley,
I only have the SMILES of the reactants. Can I directly use your trained model with the reactant SMILES as input to predict which products will be formed? Thanks a lot.
Yes, that's the intended use case. All you need to use the trained model is the reactant SMILES!
Thank you. But the original model takes an atom-mapped reaction as input. How can we do atom mapping without the products, and how do we get the reaction center?
You can use the fully trained model to predict outcomes by following the example at the end of rexgen_direct/rank_diff_wln/directcandranker.py.
Thank you. I got it.
Hello Dr Coley,
In your scripts, you set NK and NK0 to 20 and 10, respectively. NK and NK0 are used for reporting accuracies during training, and NK sets the number of edits included in the output file during inference. Should I keep the same NK and NK0 values for my data? Could you elaborate on how you decided on these values?
Gopal
PS: Some of my biological reactions have a total of ~50 or more bond changes.