Closed sverhoeven closed 6 years ago
After further debugging, it is stuck at
inchi = AllChem.MolToInchi(x)
where x is a mol with SMILES Cn1c(CNc2ccc(C(=N)N)cc2)nc2cc(C(=O)N(CCC(=O)O)c3cccc[n+]3C3OC(C(=O)O)C(O)C(O)C3O)cc(O)c21
After adding more logging, the point where it hangs moved to the constructor of TreeNode, at the AllChem.MolToInchi(mol) call
for the mol with SMILES Cn1c(CNc2ccc(C(=N)NC3OC(C(=O)O)C(O)C(O)C3O)cc2)nc2cc(C(=O)NC=CC(=O)O)ccc21
After upgrading rdkit from 2016.03.2-np111py27_1 to 2017.09.2.0-py27h080088d_1, it gets stuck at:
2018-01-19 12:39:31,443 - root - DEBUG - Node Cn1c(CO)nc2cc(C(=O)Nc3ccccn3)ccc21 metabolized
2018-01-19 12:39:31,444 - root - DEBUG - Applying rule'N-glucuronidation_(aromatic_=n-)' on Cn1c(CO)nc2cc(C(=O)Nc3ccccn3)ccc21
2018-01-19 12:39:31,445 - root - DEBUG - Products made
2018-01-19 12:39:31,447 - root - DEBUG - Product Cn1c(CO)nc2cc(C(=O)Nc3cccc[n+]3C3OC(C(=O)O)C(O)C(O)C3O)ccc21
I did some further testing and it is really something in knime that causes the problem. The following script (using the one metabolite that causes the problem) runs perfectly outside of knime, but if you put it in a knime python script node (ignoring the input table) it doesn't finish.
import sygma
from rdkit import Chem
scenario = sygma.Scenario([
    [sygma.ruleset['phase1'], 2],
    [sygma.ruleset['phase2'], 1]])
parent = Chem.MolFromSmiles("Cn1c(CNc2ccc(cc2)C(=N)N)nc3cc(ccc13)C(=O)N(CCC(=O)O)c4ccccn4")
metabolic_tree = scenario.run(parent)
metabolic_tree.calc_scores()
print metabolic_tree.to_smiles()
I also used logging now, writing out the total number of tree nodes after each application of a rule on a molecule (somewhere in tree.py). If I run the above script inside a KNIME node, it gets stuck at 839 metabolites in the tree. With just 2 steps of phase 1 it finishes successfully with only 152 metabolites in the tree. If I run with 3 steps of phase 1 and no phase 2, it gets stuck again at 692 metabolites in the tree. So, to me it seems that the problem is related to the amount of memory available to Python inside a KNIME node.
One more test: with 1 x phase1 + 1 x phase2 the script finishes correctly, with 148 metabolites. With 1 x phase1 and 2 x phase2 it gets stuck again at 519 metabolites.
When you cancel the execution of the node, the log starts printing again. The logs look the same as when a different molecule is used without getting stuck. To me it looks like the KNIME-Python integration is blocking the execution because some buffer is not being emptied.
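To illustrate the kind of blocking suspected here, below is a minimal sketch in plain Python 3 (not KNIME or SyGMa code). It assumes the hang is caused by a child process writing more to a pipe than the parent reads; the child command and the helper name `child_blocks_when_pipe_is_not_read` are made up for the demonstration. Whether the KNIME plugin actually behaves like this is not confirmed in this thread.

```python
import subprocess
import sys

# Hypothetical child: writes about 1 MiB to stderr, which is more than a
# typical OS pipe buffer can hold without the parent reading from it.
CHILD = "import sys; sys.stderr.write('x' * (1 << 20)); sys.stderr.flush()"


def child_blocks_when_pipe_is_not_read(timeout=2.0):
    """Start the child, wait without reading its stderr, then drain the pipe."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stderr=subprocess.PIPE)
    try:
        # The parent waits but never reads proc.stderr, so the child
        # blocks in write() once the pipe buffer is full.
        proc.wait(timeout=timeout)
        blocked = False
    except subprocess.TimeoutExpired:
        blocked = True
    # Draining the pipe unblocks the child and lets it exit.
    _, err = proc.communicate()
    return blocked, len(err), proc.returncode


if __name__ == "__main__":
    print(child_blocks_when_pipe_is_not_read())
```

The child only finishes once the parent starts reading, which would match the observation that the log resumes printing as soon as the node is cancelled.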
A fresh installation of KNIME 3.5.1 with the following plugins installed:
A miniconda2 installation using
git clone git@github.com:3D-e-Chem/sygma.git
cd sygma
conda env create -f /dev/null -n sygma
source activate sygma
conda install -c rdkit rdkit
python setup.py develop
pip install protobuf
which python
In the KNIME preferences, set the Python path to the Python of the conda sygma environment.
The SyGMa Metabolites node shows 30% progress, but then freezes. I see a Python process in top for a couple of seconds, but then it is gone. The KNIME console remains empty; after cancelling the node, a lot of RDKit warnings are printed.
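If the flood of RDKit warnings is indeed what fills the buffer, one possible mitigation (an assumption, not tested in this thread) would be to silence the RDKit loggers at the top of the script. The sketch below uses RDKit's `RDLogger.DisableLog`; the wrapper name `silence_rdkit_warnings` is made up, and the import guard is only there so the snippet also runs where RDKit is not installed.

```python
def silence_rdkit_warnings():
    """Disable all RDKit application logs; return False if RDKit is missing."""
    try:
        from rdkit import RDLogger
    except ImportError:
        # RDKit is not available in this environment, nothing to silence.
        return False
    # 'rdApp.*' covers the debug, info, warning and error channels.
    RDLogger.DisableLog('rdApp.*')
    return True


if __name__ == "__main__":
    print(silence_rdkit_warnings())
```

This hides the warnings instead of fixing how the output is consumed, so it would be a stopgap at best.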
Using a Python Script (1->1) node and the following Python code:
import sygma
import pandas as pd
import logging
logging.basicConfig(level=logging.DEBUG,
                    filename='/tmp/sygma.log',
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('knime4sygma')
scenario = sygma.Scenario([
    [sygma.ruleset['phase1'], 2],
    [sygma.ruleset['phase2'], 1]
])
parents_column_name = 'RDKit molecule'
parents = input_table[parents_column_name]
metabolites = []
for parent in parents:
    logger.warning('run start\n')
    metabolic_tree = scenario.run(parent)
    logger.warning('run done\n')
    metabolic_tree.calc_scores()
    metabolites += metabolic_tree.to_list(parent_column=parents_column_name)
metabolites_df = pd.DataFrame(metabolites)
output_table = pd.merge(input_table,
                        metabolites_df,
                        on=parents_column_name)
The node runs successfully, printing RDKit warnings to the KNIME console the whole time. The output table has 1257 rows.
An old KNIME installation version 3.3.1 with the following plugins:
Both the Sygma node and Python node show the same behavior of freezing.
It seems the org.knime.python plugin is misbehaving, while the org.knime.python2 plugin works.
I will upgrade the https://github.com/3D-e-Chem/knime-python-wrapper to use the org.knime.python2 plugin and afterwards make this plugin depend on it.
Sygma node v1.2.0 is available on https://3d-e-chem.github.io/updates; please test it with KNIME >= 3.5.
Problem indeed solved. Thanks!
I am running t-SNE using a Python script in KNIME, but it gets stuck at 30%.
I need urgent help. I think I am using pyout to get the KNIME workflow.
If you are already using the latest KNIME and KNIME-python extension then I don't know what more to do. I would raise the problem at https://forum.knime.com/c/knime-analytics-platform.
Replacing sygma node with Python node:
And adding more logging to sygma itself, running from KNIME gets stuck at `tree.py:metabolize_node()` with rule `'N-glucuronidation_(aromatic_=n-)'` on `Cn1c(CNc2ccc(C(=N)N)cc2)nc2cc(C(=O)N(CCC(=O)O)c3ccccn3)cc(O)c21`