3D-e-Chem / knime-sygma

KNIME nodes for Sygma
GNU General Public License v3.0

2 phase1 and 1 phase1 does not work #13

Closed sverhoeven closed 6 years ago

sverhoeven commented 6 years ago

Replacing the Sygma node with a Python node:

import sygma
import pandas as pd
import logging

logging.basicConfig(level=logging.DEBUG, 
                    filename='/tmp/sygma.log', 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('knime4sygma')

scenario = sygma.Scenario([
    [sygma.ruleset['phase1'], 2],
    [sygma.ruleset['phase2'], 1]
])

parents_column_name = 'RDKit molecule'
parents = input_table[parents_column_name]

metabolites = []
for parent in parents:
    logger.warning('run start\n')
    metabolic_tree = scenario.run(parent)
    logger.warning('run done\n')
    #metabolic_tree.calc_scores()
    #metabolites += metabolic_tree.to_list(parent_column=parents_column_name)

metabolites_df = pd.DataFrame(metabolites)
#output_table = pd.merge(input_table,
#                        metabolites_df,
#                        on=parents_column_name)
output_table = input_table.copy()

After adding more logging to sygma itself, the run from KNIME gets stuck in tree.py:metabolize_node() with `rule 'N-glucuronidation_(aromatic_=n-)' on Cn1c(CNc2ccc(C(=N)N)cc2)nc2cc(C(=O)N(CCC(=O)O)c3ccccn3)cc(O)c21`
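The "run start" / "run done" logging used above can be wrapped in a small helper, so that a hanging call shows up in the log as a "start" line without a matching "done" line. This is only a sketch: `log_step` is a hypothetical helper name, not part of sygma or RDKit, and the logger name is taken from the script above.

```python
import contextlib
import logging

logger = logging.getLogger('knime4sygma')


@contextlib.contextmanager
def log_step(name):
    """Log when a step starts and when it ends.

    If the wrapped call hangs, the log ends with 'start <name>'
    and no 'done <name>', which pinpoints the blocking call.
    """
    logger.debug('start %s', name)
    try:
        yield
    finally:
        # Also reached when the step raises an exception.
        logger.debug('done %s', name)


# Hypothetical usage inside the loop of the script above:
#
#     with log_step('scenario.run'):
#         metabolic_tree = scenario.run(parent)
```

Because a file handler configured through `logging.basicConfig(filename=...)` flushes after every record, the last line in /tmp/sygma.log should still be reliable when the process freezes.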

sverhoeven commented 6 years ago

After further debugging, it is stuck at

inchi = AllChem.MolToInchi(x)

where x is a molecule with SMILES Cn1c(CNc2ccc(C(=N)N)cc2)nc2cc(C(=O)N(CCC(=O)O)c3cccc[n+]3C3OC(C(=O)O)C(O)C(O)C3O)cc(O)c21.

sverhoeven commented 6 years ago

After adding more logging, the point where it hangs moved to the constructor of TreeNode, at the AllChem.MolToInchi(mol) call for a molecule with SMILES Cn1c(CNc2ccc(C(=N)NC3OC(C(=O)O)C(O)C(O)C3O)cc2)nc2cc(C(=O)NC=CC(=O)O)ccc21

sverhoeven commented 6 years ago

After upgrading rdkit from 2016.03.2-np111py27_1 to 2017.09.2.0-py27h080088d_1, it gets stuck at:

2018-01-19 12:39:31,443 - root - DEBUG - Node Cn1c(CO)nc2cc(C(=O)Nc3ccccn3)ccc21 metabolized
2018-01-19 12:39:31,444 - root - DEBUG - Applying rule'N-glucuronidation_(aromatic_=n-)' on Cn1c(CO)nc2cc(C(=O)Nc3ccccn3)ccc21
2018-01-19 12:39:31,445 - root - DEBUG - Products made
2018-01-19 12:39:31,447 - root - DEBUG - Product Cn1c(CO)nc2cc(C(=O)Nc3cccc[n+]3C3OC(C(=O)O)C(O)C(O)C3O)ccc21

ridderl commented 6 years ago

I did some further testing and it is really something in KNIME that causes the problem. The following script (using the one parent molecule that causes the problem) runs perfectly outside of KNIME, but if you put it in a KNIME Python script node (ignoring the input table) it doesn't finish.

import sygma
from rdkit import Chem

scenario = sygma.Scenario([
    [sygma.ruleset['phase1'], 2],
    [sygma.ruleset['phase2'], 1]])

parent = Chem.MolFromSmiles("Cn1c(CNc2ccc(cc2)C(=N)N)nc3cc(ccc13)C(=O)N(CCC(=O)O)c4ccccn4")

metabolic_tree = scenario.run(parent)
metabolic_tree.calc_scores()

print metabolic_tree.to_smiles()

ridderl commented 6 years ago

I have now also used the logging, writing out the total number of tree nodes after each application of a rule to a molecule (somewhere in tree.py). If I run the above script inside a KNIME node, it gets stuck at 839 metabolites in the tree. With just 2 steps of phase 1 it finishes successfully with only 152 metabolites in the tree. If I run with 3 steps of phase 1 and no phase 2, it gets stuck again at 692 metabolites in the tree. So, to me it seems that the problem is related to the amount of memory available to Python inside a KNIME node.
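One way to check the memory hypothesis would be to log the peak memory of the Python process next to the node count. The sketch below assumes a Unix system (the standard-library `resource` module is not available on Windows); `peak_rss` and `log_tree_size` are hypothetical helper names, and where exactly to call them inside tree.py is left open.

```python
import logging
import resource

logger = logging.getLogger('knime4sygma')


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def log_tree_size(n_nodes):
    """Log the number of tree nodes with the current peak memory.

    Returns the logged message so it can be inspected by the caller.
    """
    message = 'tree nodes: %d, peak RSS: %d' % (n_nodes, peak_rss())
    logger.debug(message)
    return message
```

If the peak RSS reported just before the freeze is small compared to the memory of the machine, a memory limit becomes a less likely explanation.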

ridderl commented 6 years ago

One more test: with 1 x phase1 + 1 x phase2 the script finishes correctly, with 148 metabolites. With 1 x phase1 and 2 x phase2 it gets stuck again, at 519 metabolites.

sverhoeven commented 6 years ago

When you cancel the execution of the node, the log starts printing again. The log looks the same as when a different molecule is used that does not get stuck. To me it looks like the KNIME-Python integration is blocking the execution because some buffer is not being emptied.
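The general mechanism suspected here can be reproduced without KNIME: a child process that writes a lot of output to a pipe blocks as soon as the pipe buffer is full and the parent does not read from it. The following is a minimal illustration of that mechanism only, not of how the KNIME Python integration is actually implemented. It needs Python 3 (for the `timeout` arguments), and `run_child` and `CHILD_CODE` are names made up for this sketch.

```python
import subprocess
import sys

# Child process: writes roughly 1.2 MB of fake warnings to stderr, then exits.
CHILD_CODE = (
    "import sys\n"
    "for _ in range(20000):\n"
    "    sys.stderr.write('warning: ' + 'x' * 50 + '\\n')\n"
)


def run_child(drain_stderr, timeout=5):
    """Run the child and return True if it finished within `timeout` seconds.

    With drain_stderr=False the parent never reads the stderr pipe, so the
    child blocks in write() once the pipe buffer is full.
    """
    proc = subprocess.Popen([sys.executable, '-c', CHILD_CODE],
                            stderr=subprocess.PIPE)
    try:
        if drain_stderr:
            # communicate() keeps reading stderr until the child exits.
            proc.communicate(timeout=timeout)
        else:
            # wait() does not read the pipe; the child cannot make progress.
            proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the process and discard buffered output
        return False


if __name__ == '__main__':
    print('parent drains stderr:', run_child(True))
    print('parent ignores stderr:', run_child(False, timeout=2))
```

This would be consistent with the observation that the log continues once the node is cancelled, but it does not prove that the same thing happens inside KNIME.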

sverhoeven commented 6 years ago

Setup

A fresh installation of KNIME 3.5.1 with the following plugins installed:

A miniconda2 installation using

git clone git@github.com:3D-e-Chem/sygma.git
cd sygma
conda env create -f /dev/null -n sygma
source activate sygma
conda install -c rdkit rdkit
python setup.py develop
pip install protobuf
which python

In the KNIME preferences, set the Python path to the Python executable of the conda sygma environment.

Result

The SyGMa Metabolites node shows 30% progress, but then freezes. I see a Python process in top for a couple of seconds, but then it is gone. The KNIME console remains empty; after cancelling the node, a lot of RDKit warnings are printed.

Using a Python Script (1->1) node and the following Python code:

import sygma
import pandas as pd
import logging

logging.basicConfig(level=logging.DEBUG, 
                    filename='/tmp/sygma.log', 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger('knime4sygma')

scenario = sygma.Scenario([
    [sygma.ruleset['phase1'], 2],
    [sygma.ruleset['phase2'], 1]
])

parents_column_name = 'RDKit molecule'
parents = input_table[parents_column_name]

metabolites = []
for parent in parents:
    logger.warning('run start\n')
    metabolic_tree = scenario.run(parent)
    logger.warning('run done\n')
    metabolic_tree.calc_scores()
    metabolites += metabolic_tree.to_list(parent_column=parents_column_name)

metabolites_df = pd.DataFrame(metabolites)
output_table = pd.merge(input_table,
                        metabolites_df,
                        on=parents_column_name)

The node runs successfully, printing RDKit warnings to the KNIME console the whole time. The output table has 1257 rows.
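Since the RDKit warnings are what fills the console, a possible workaround (not tested in this thread) would be to switch off RDKit logging at the top of the Python script. The sketch below uses `RDLogger.DisableLog` from RDKit; `silence_rdkit_warnings` is a hypothetical wrapper name, and whether this avoids the freeze with the org.knime.python plugin is an assumption.

```python
def silence_rdkit_warnings():
    """Disable all RDKit log channels (warnings, errors, info, debug).

    Returns True if RDKit was found and logging was disabled,
    False if RDKit is not installed in this environment.
    """
    try:
        from rdkit import RDLogger
    except ImportError:
        return False
    RDLogger.DisableLog('rdApp.*')
    return True


# Call once, before scenario.run(parent) is executed.
silenced = silence_rdkit_warnings()
```

This also hides warnings that may be useful during debugging, so it is better suited as a diagnostic step than as a permanent setting.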

sverhoeven commented 6 years ago

Setup

An old KNIME installation version 3.3.1 with the following plugins:

Both the Sygma node and Python node show the same behavior of freezing.

sverhoeven commented 6 years ago

It seems the org.knime.python plugin is misbehaving, while the org.knime.python2 plugin works.

I will upgrade the https://github.com/3D-e-Chem/knime-python-wrapper to use the org.knime.python2 plugin and afterwards make this plugin depend on it.

sverhoeven commented 6 years ago

Sygma node v1.2.0 is now available on https://3d-e-chem.github.io/updates; please test it with KNIME >= 3.5.

ridderl commented 6 years ago

Problem indeed solved. Thanks!

Cipahi commented 5 years ago

I am running t-SNE using a Python script in KNIME, but it gets stuck at 30%.

Cipahi commented 5 years ago

I need urgent help. I think I am using pyout to get the KNIME workflow.

sverhoeven commented 5 years ago

If you are already using the latest KNIME and KNIME Python extension, then I don't know what more to do. I would raise the problem at https://forum.knime.com/c/knime-analytics-platform.