Closed gnsrivastava closed 3 years ago
Are you using the same training dataset as in the repo or a custom one? Can you find the reaction SMILES and label of the example that is giving this error?
Thanks for the reply. I am using a custom dataset. The dataset consists of biochemical reactions from KEGG. What can I do to solve this problem?
Gopal
On Tue, Feb 2, 2021 at 7:30 PM Connor Coley wrote:
> Are you using the same training dataset as in the repo or a custom one?
There might be an issue with incomplete atom mapping in those KEGG reactions. Can you share an example?
Sure. Below are some examples with and without atom mapping, in this format: without mapping ----> with mapping -------> with mapping plus bond details
OC(=O)C(O)=O.O=O>>OO.O=C=O ----> [O:1]=[C:2]([OH:3])[C:6](=[O:7])[OH:8].[O:4]=[O:5]>>[O:1]=[C:2]=[O:3].[OH:4][OH:5] -------> [O:1]=[C:2]([OH:3])[C:6](=[O:7])[OH:8].[O:4]=[O:5]>>[O:1]=[C:2]=[O:3].[OH:4][OH:5] 2-3-2.0;2-6-0.0;4-5-1.0
OC(=O)C(=O)C(CC(O)=O)C(O)=O>>OC(=O)CCC(=O)C(O)=O.O=C=O -------> [O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH:6]([C:7](=[O:8])[OH:9])[CH2:10][C:11](=[O:12])[OH:13]>>[O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH2:6][CH2:10][C:11](=[O:12])[OH:13].[C:7](=[O:8])=[O:9] -------> [O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH:6]([C:7](=[O:8])[OH:9])[CH2:10][C:11](=[O:12])[OH:13]>>[O:1]=[C:2]([OH:3])[C:4](=[O:5])[CH2:6][CH2:10][C:11](=[O:12])[OH:13].[C:7](=[O:8])=[O:9] 7-9-2.0;6-7-0.0
I also used prep_data.py to get the bond information, but when I use the data for training, I get an error. Gopal
PS: I used the RDT tool for the atom-atom mapping.
Are these the reactions that produce the error when you process them? I don't see any problems with these reaction SMILES that would be likely to cause an error.
I took the idea from your last comment and split my data into small parts. With only the top 3 reactions, training started working, but after I added the last 2 reactions (given at the end), it gave the same error again.
It works for the reactions below:
[OH:1][OH:2]>>[O:1]=[O:2].[OH2:3] 1-2-2.0
[OH:1][OH:3].[Mn+2:2].[H+:4]>>[Mn+3:2].[OH2:1] 1-3-0.0
[O:1]=[P:2]([OH:3])([OH:4])[O:5][P:6](=[O:7])([OH:8])[O:9][CH2:10][CH2:11][C:12]=1[S:13][CH:14]=[N+:15]([C:16]1[CH3:17])[CH2:18][C:19]2=[CH:20][N:21]=[C:22]([N:23]=[C:24]2[NH2:25])[CH3:26].[O:27]=[C:28]([OH:29])[C:30](=[O:31])[CH3:32]>>[O:1]=[P:2]([OH:3])([OH:4])[O:5][P:6](=[O:7])([OH:8])[O:9][CH2:10][CH2:11][C:12]=1[S:13][C:14](=[N+:15]([C:16]1[CH3:17])[CH2:18][C:19]2=[CH:20][N:21]=[C:22]([N:23]=[C:24]2[NH2:25])[CH3:26])[CH:30]([OH:31])[CH3:32].[O:27]=[C:28]=[O:29] 28-29-2.0;14-30-1.0;28-30-0.0;30-31-1.0
But when I add
[O:1]=[C:2]([NH2:3])[C:4]1=[CH:5][CH:6]=[CH:7][N+:8](=[CH:9]1)[CH:10]2[O:11][CH:12]([CH2:13][O:14][P:15](=[O:16])([OH:17])[O:18][P:19](=[O:20])([OH:21])[O:22][CH2:23][CH:24]3[O:25][CH:26]([N:27]4[CH:28]=[N:29][C:30]5=[C:31]([N:32]=[CH:33][N:34]=[C:35]45)[NH2:36])[CH:37]([OH:38])[CH:39]3[OH:40])[CH:41]([OH:42])[CH:43]2[OH:44].[OH:45][NH2:46]>>[O:1]=[C:2]([NH2:3])[C:4]=1[CH2:5][CH:6]=[CH:7][N:8]([CH:9]1)[CH:10]2[O:11][CH:12]([CH2:13][O:14][P:15](=[O:16])([OH:17])[O:18][P:19](=[O:20])([OH:21])[O:22][CH2:23][CH:24]3[O:25][CH:26]([N:27]4[CH:28]=[N:29][C:30]5=[C:31]([N:32]=[CH:33][N:34]=[C:35]45)[NH2:36])[CH:37]([OH:38])[CH:39]3[OH:40])[CH:41]([OH:42])[CH:43]2[OH:44].[OH:45][N:46]=[N:47][OH:48].[H+:49] 46-47-2.0;4-5-1.0;6-7-2.0;47-48-1.0;5-6-1.0;4-9-2.0;7-8-1.0;8-9-1.0
and
[O:1]=[CH2:2].[OH2:3]>>[O:1]=[CH:2][OH:3].[OH:4][CH3:5] 2-3-1.0;4-5-1.0
it started giving the same error again.
Thank you, Dr. Coley. I manually removed the problematic reactions, and now it's working. But I would still like to know if there is a way to deal with these problematic reactions.
Thank you. Gopal
I think I do know why these are problematic now. This approach is designed to work on reactions where heavy atoms must come from reactant molecules; it looks like in a few of your example reactions, there are atoms in the product(s) that do not appear in the reactants. In your first example, the atom with map number 3 is new; in the fourth example, you have atoms 47 and 48 that are new; in the fifth example, atoms 4 and 5 are new. The preprocessing script identifies these as new bonds to predict. For example, in the last example, it detects that there should be a new single bond between atoms 4 and 5. When it tries to mark these on the reactants, it encounters an error since those atoms aren't present. The first example, even though you have atom 3 in the product and not in the reactants, runs without an error because there are no "new bonds" as perceived by the preprocessing script.
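One way to find such reactions before training is to compare the atom-map numbers on both sides of each reaction. The sketch below is not part of rexgen_direct. It uses a regular expression rather than RDKit, the helper names are hypothetical, and it assumes every atom of interest is written in brackets with a map number. Mapped hydrogens such as [H+:49] are skipped, since only heavy atoms matter here.

```python
import re

# Matches a bracketed, atom-mapped SMILES atom such as [CH2:10] or [Mn+2:2].
# Group 1 is the element symbol, group 2 is the atom-map number.
MAPPED_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*?:(\d+)\]")


def heavy_atom_maps(smiles):
    """Return the map numbers of all mapped non-hydrogen atoms in a SMILES string."""
    return {int(num) for sym, num in MAPPED_ATOM.findall(smiles) if sym != "H"}


def new_product_atoms(rxn_smiles):
    """Return map numbers of heavy atoms present in the products but not the reactants."""
    parts = rxn_smiles.split(">")  # handles both "r>>p" and "r>agents>p"
    reactants, products = parts[0], parts[-1]
    return heavy_atom_maps(products) - heavy_atom_maps(reactants)


# Three of the reactions quoted in this thread.
examples = [
    "[OH:1][OH:2]>>[O:1]=[O:2].[OH2:3]",
    "[OH:1][OH:3].[Mn+2:2].[H+:4]>>[Mn+3:2].[OH2:1]",
    "[O:1]=[CH2:2].[OH2:3]>>[O:1]=[CH:2][OH:3].[OH:4][CH3:5]",
]

for rxn in examples:
    missing = sorted(new_product_atoms(rxn))
    status = "ok" if not missing else "new product atoms: %s" % missing
    print(status, "|", rxn)
```

Reactions flagged this way could be dropped from the training file, or corrected by adding the missing reactant molecules and remapping.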
Thank you for the explanation. It was helpful. When I was processing the KEGG reactions to get SMILES, I may have left out the reaction stoichiometry.
Thank you again. Gopal
I am trying to retrain the model using core_wln_global with default parameters, but I am getting an error that I don't understand. Can you please help me with this?
Thank you. Gopal
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/keras/anaconda3/envs/py3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/keras/anaconda3/envs/py3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "nntrain_direct.py", line 181, in read_data
    cur_bin, cur_label, sp_label = get_all_batch(zip(src_batch, edit_batch))
  File "/home/keras/work/AMR/5.RC_CLASS/KEGG_2021/EC1_REACTION/REACTION/core_wln_global/ioutils_direct.py", line 88, in get_all_batch
    l, sl = get_bond_label(r,e,max_natoms)
  File "/home/keras/work/AMR/5.RC_CLASS/KEGG_2021/EC1_REACTION/REACTION/core_wln_global/ioutils_direct.py", line 60, in get_bond_label
    rmap[x,y,z] = rmap[y,x,z] = 1
IndexError: index 6 is out of bounds for axis 1 with size 6
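An IndexError of this kind can be checked for before training by comparing each bond edit against the size of the label array. The sketch below makes two assumptions that the traceback suggests but does not confirm: that atom-map number k is stored at zero-based index k - 1, and that the array has a fixed number of slots per axis (called n_slots here). The function names are hypothetical and do not come from ioutils_direct.py.

```python
def parse_edits(edits):
    """Parse a bond-edit string such as '2-3-1.0;4-5-1.0' into (a, b, order) tuples."""
    parsed = []
    for item in edits.split(";"):
        if not item:
            continue  # tolerate a trailing ';' or an empty string
        a, b, order = item.split("-")
        parsed.append((int(a), int(b), float(order)))
    return parsed


def out_of_range_edits(edits, n_slots):
    """Return the edits whose larger zero-based atom index (map number - 1) is >= n_slots.

    ASSUMPTION: index = map number - 1, and n_slots is the size of each atom axis.
    """
    return [(a, b, o) for a, b, o in parse_edits(edits) if max(a, b) - 1 >= n_slots]


# With 6 slots, an edit that names map number 7 would need index 6, which does not exist.
print(out_of_range_edits("6-7-0.0;2-3-1.0", n_slots=6))
```

Under these assumptions, "index 6 is out of bounds for axis 1 with size 6" would correspond to an edit naming map number 7 when only 6 atom slots are available.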