connorcoley / rexgen_direct

Template-free prediction of organic reaction outcomes
GNU General Public License v3.0

Error when trying to train in core-wln-global #23

Closed gnsrivastava closed 3 years ago

gnsrivastava commented 3 years ago

I am trying to retrain the model using core-wln-global with the default parameters, but I am getting an error that I don't understand. Can you please help me with this?

Thank you, Gopal

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/keras/anaconda3/envs/py3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/keras/anaconda3/envs/py3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "nntrain_direct.py", line 181, in read_data
    cur_bin, cur_label, sp_label = get_all_batch(zip(src_batch, edit_batch))
  File "/home/keras/work/AMR/5.RC_CLASS/KEGG_2021/EC1_REACTION/REACTION/core_wln_global/ioutils_direct.py", line 88, in get_all_batch
    l, sl = get_bond_label(r,e,max_natoms)
  File "/home/keras/work/AMR/5.RC_CLASS/KEGG_2021/EC1_REACTION/REACTION/core_wln_global/ioutils_direct.py", line 60, in get_bond_label
    rmap[x,y,z] = rmap[y,x,z] = 1
IndexError: index 6 is out of bounds for axis 1 with size 6

connorcoley commented 3 years ago

Are you using the same training dataset as in the repo or a custom one? Can you find the reaction SMILES and label of the example that is giving this error?

gnsrivastava commented 3 years ago

Thanks for the reply. I am using a custom dataset. The dataset consists of biochemical reactions from KEGG. What can I do to solve this problem?

Gopal

On Tue, Feb 2, 2021 at 7:30 PM Connor Coley notifications@github.com wrote:

Are you using the same training dataset as in the repo or a custom one?


connorcoley commented 3 years ago

There might be an issue with incomplete atom mapping in those KEGG reactions. Can you share an example?

gnsrivastava commented 3 years ago

Sure. I am putting some examples with and without atom mapping below. Each example is given as: without mapping ----> with mapping -------> with mapping and bond details.

  1. OC(=O)C(O)=O.O=O>>OO.O=C=O
     ----> [O:1]=[C:2]([OH:3])[C:6](=[O:7])[OH:8].[O:4]=[O:5]>>[O:1]=[C:2]=[O:3].[OH:4][OH:5]
     -------> [O:1]=[C:2]([OH:3])[C:6](=[O:7])[OH:8].[O:4]=[O:5]>>[O:1]=[C:2]=[O:3].[OH:4][OH:5] 2-3-2.0;2-6-0.0;4-5-1.0
  2. OC(=O)C(=O)C(CC(O)=O)C(O)=O>>OC(=O)CCC(=O)C(O)=O.O=C=O
     -------> [O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH:6]([C:7](=[O:8])[OH:9])[CH2:10][C:11](=[O:12])[OH:13]>>[O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH2:6][CH2:10][C:11](=[O:12])[OH:13].[C:7](=[O:8])=[O:9]
     -------> [O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH:6]([C:7](=[O:8])[OH:9])[CH2:10][C:11](=[O:12])[OH:13]>>[O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH2:6][CH2:10][C:11](=[O:12])[OH:13].[C:7](=[O:8])=[O:9] 7-9-2.0;6-7-0.0

I used prep_data.py to get the bond information as well. But when I use the data for training, I get the error.

Gopal

PS: I used RDT tool for the atom atom mapping.

connorcoley commented 3 years ago

Is this the reaction that's providing the error when you process it? I don't see any problems with these reaction SMILES that would be likely to cause an error

gnsrivastava commented 3 years ago

I took the idea from your last comment and split my data into small parts. With only the top 3 reactions, training started working, but after I added the last 2 reactions (given at the end), it gave the same error again.

It is working for these reactions:

[OH:1][OH:2]>>[O:1]=[O:2].[OH2:3] 1-2-2.0

[OH:1][OH:3].[Mn+2:2].[H+:4]>>[Mn+3:2].[OH2:1] 1-3-0.0

[O:1]=[P:2]([OH:3])([OH:4])[O:5][P:6](=[O:7])([OH:8])[O:9][CH2:10][CH2:11][C:12]=1[S:13][CH:14]=[N+:15]([C:16]1[CH3:17])[CH2:18][C:19]2=[CH:20][N:21]=[C:22]([N:23]=[C:24]2[NH2:25])[CH3:26].[O:27]=[C:28]([OH:29])[C:30](=[O:31])[CH3:32]>>[O:1]=[P:2]([OH:3])([OH:4])[O:5][P:6](=[O:7])([OH:8])[O:9][CH2:10][CH2:11][C:12]=1[S:13][C:14](=[N+:15]([C:16]1[CH3:17])[CH2:18][C:19]2=[CH:20][N:21]=[C:22]([N:23]=[C:24]2[NH2:25])[CH3:26])[CH:30]([OH:31])[CH3:32].[O:27]=[C:28]=[O:29] 28-29-2.0;14-30-1.0;28-30-0.0;30-31-1.0

But when I add

[O:1]=[C:2]([NH2:3])[C:4]1=[CH:5][CH:6]=[CH:7][N+:8](=[CH:9]1)[CH:10]2[O:11][CH:12]([CH2:13][O:14][P:15](=[O:16])([OH:17])[O:18][P:19](=[O:20])([OH:21])[O:22][CH2:23][CH:24]3[O:25][CH:26]([N:27]4[CH:28]=[N:29][C:30]5=[C:31]([N:32]=[CH:33][N:34]=[C:35]45)[NH2:36])[CH:37]([OH:38])[CH:39]3[OH:40])[CH:41]([OH:42])[CH:43]2[OH:44].[OH:45][NH2:46]>>[O:1]=[C:2]([NH2:3])[C:4]=1[CH2:5][CH:6]=[CH:7][N:8]([CH:9]1)[CH:10]2[O:11][CH:12]([CH2:13][O:14][P:15](=[O:16])([OH:17])[O:18][P:19](=[O:20])([OH:21])[O:22][CH2:23][CH:24]3[O:25][CH:26]([N:27]4[CH:28]=[N:29][C:30]5=[C:31]([N:32]=[CH:33][N:34]=[C:35]45)[NH2:36])[CH:37]([OH:38])[CH:39]3[OH:40])[CH:41]([OH:42])[CH:43]2[OH:44].[OH:45][N:46]=[N:47][OH:48].[H+:49] 46-47-2.0;4-5-1.0;6-7-2.0;47-48-1.0;5-6-1.0;4-9-2.0;7-8-1.0;8-9-1.0

and

[O:1]=[CH2:2].[OH2:3]>>[O:1]=[CH:2][OH:3].[OH:4][CH3:5] 2-3-1.0;4-5-1.0

it started giving the same error.
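The manual splitting described in this comment can be automated. The following is only a sketch: `find_failing_lines` and the `triggers_error` callback are hypothetical names, not part of rexgen_direct. It assumes the error is caused by individual lines, so that a chunk fails exactly when it contains at least one bad line. That may not hold here if the failure also depends on the other reactions in the same batch.

```python
def find_failing_lines(lines, triggers_error):
    """Return the lines that trigger the error, found by recursive halving.

    `triggers_error` is a user-supplied (hypothetical) callback that takes a
    list of data lines and returns True if processing them raises the error.
    """
    # An empty chunk, or a chunk that processes cleanly, contributes nothing.
    if not lines or not triggers_error(lines):
        return []
    # A failing chunk of one line is a culprit.
    if len(lines) == 1:
        return list(lines)
    # Otherwise split in half and search both halves.
    mid = len(lines) // 2
    return (find_failing_lines(lines[:mid], triggers_error)
            + find_failing_lines(lines[mid:], triggers_error))
```

In practice the callback would wrap whatever preprocessing step fails (for example, a call into the data-loading code inside a `try`/`except IndexError`).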

gnsrivastava commented 3 years ago

Thank you, Dr. Coley. I manually removed the problematic reactions. Now it's working. But I would still like to know if there is a way to deal with these problematic reactions.

Thank you Gopal

connorcoley commented 3 years ago

I think I do know why these are problematic now. This approach is designed to work on reactions where heavy atoms must come from reactant molecules; it looks like in a few of your example reactions, there are atoms in the product(s) that do not appear in the reactants. In your first example, the atom with map number 3 is new; in the fourth example, you have atoms 47 and 48 that are new; in the fifth example, atoms 4 and 5 are new. The preprocessing script identifies these as new bonds to predict. For example, in the last example, it detects that there should be a new single bond between atoms 4 and 5. When it tries to mark these on the reactants, it encounters an error since those atoms aren't present. The first example, even though you have atom 3 in the product and not in the reactants, runs without an error because there are no "new bonds" as perceived by the preprocessing script.
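The condition described above can be checked before training. This is a minimal sketch, not code from the repository: `map_numbers` and `check_reaction` are hypothetical helpers. It assumes each data line has the form `reactants>>products edits`, with edits written as `a1-a2-bondorder` separated by `;`, as in the examples in this thread. It reads atom-map numbers with a regular expression rather than a cheminformatics toolkit. It flags edits that reference atoms absent from the reactants. Whether such an edit actually raises the IndexError may also depend on the size of the other molecules in the batch.

```python
import re

# Matches the atom-map number in a mapped atom such as [CH2:10] or [Mn+2:2].
MAP_NUM = re.compile(r":(\d+)\]")

def map_numbers(smiles):
    """Return the set of atom-map numbers found in a mapped SMILES string."""
    return {int(n) for n in MAP_NUM.findall(smiles)}

def check_reaction(line):
    """Check one 'reactants>>products edits' line.

    Returns (new_atoms, bad_edits):
      new_atoms: map numbers present in the products but not in the reactants
      bad_edits: edits that reference an atom absent from the reactants
    """
    rxn, edits = line.strip().split(" ", 1)
    reactants, products = rxn.split(">>")
    r_atoms = map_numbers(reactants)
    new_atoms = sorted(map_numbers(products) - r_atoms)
    bad_edits = []
    for edit in edits.split(";"):
        a1, a2, _bond_order = edit.split("-")
        if int(a1) not in r_atoms or int(a2) not in r_atoms:
            bad_edits.append(edit)
    return new_atoms, bad_edits

# The first and fifth examples from this thread.
first = "[OH:1][OH:2]>>[O:1]=[O:2].[OH2:3] 1-2-2.0"
fifth = "[O:1]=[CH2:2].[OH2:3]>>[O:1]=[CH:2][OH:3].[OH:4][CH3:5] 2-3-1.0;4-5-1.0"
print(check_reaction(first))  # ([3], [])
print(check_reaction(fifth))  # ([4, 5], ['4-5-1.0'])
```

The first example has a new product atom but no edit that touches it, which is consistent with it running without an error. The fifth has an edit between atoms 4 and 5, neither of which is in the reactants. Lines with a non-empty `bad_edits` list are candidates to drop or to re-balance with the missing reactants.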

gnsrivastava commented 3 years ago

Thank you for the explanation. It was helpful. When I was processing the KEGG reactions to get SMILES, I might have left out the stoichiometry of the reactions.

Thank you again. Gopal