KarchinLab / 2020plus

Classifies genes as an oncogene, tumor suppressor gene, or as a non-driver gene by using Random Forests
http://2020plus.readthedocs.org
Apache License 2.0

Error in rule simFeatures #11

Open MarkKJF opened 5 years ago

MarkKJF commented 5 years ago

Thanks for your kind help with my last question. After restarting the run with --cores 20,

$ snakemake -s Snakefile pretrained_predict -p --cores 20 \
    --config mutations="/home/kjf/2020plus-1.2.2/data/bladder.txt" output_dir="output" trained_classifier="/home/kjf/2020plus-1.2.2/data/2020plus_10k.Rdata"

the run is much faster.

However, once it reaches 25 of 48 steps (52%) done, an error occurs. I have picked out some of the warnings and removed the duplicated parts:

############################
[Tue Apr 2 13:50:01 2019]
rule simFeatures:
    input: output/simulated_summary/chasm_sim_summary10.txt, output/simulated_summary/oncogene_sim10.txt, output/simulated_summary/tsg_sim10.txt
    output: output/simulated_summary/simulated_features10.txt
    jobid: 16
    wildcards: iter=10

python `which 2020plus.py` features -s output/simulated_summary/chasm_sim_summary10.txt --tsg-test output/simulated_summary/tsg_sim10.txt -og-test output/simulated_summary/oncogene_sim10.txt -o output/simulated_summary/simulated_features10.txt

Version: 1.2.2
Command: /home/kjf/2020plus-1.2.2/2020plus.py features -s output/simulated_summary/chasm_sim_summary6.txt --tsg-test output/simulated_summary/tsg_sim6.txt -og-test output/simulated_summary/oncogene_sim6.txt -o output/simulated_summary/simulated_features6.txt

AN ERROR HAS OCCURRED: check the log file

Type: <class 'ModuleNotFoundError'>
Exception: No module named 'sklearn'
Traceback:
  File "/home/kjf/2020plus-1.2.2/2020plus.py", line 263, in <module>
    import src.classify.python.classifier
  File "/home/kjf/2020plus-1.2.2/src/classify/python/classifier.py", line 2, in <module>
    from src.classify.python.dummy_clf import DummyClf
  File "/home/kjf/2020plus-1.2.2/src/classify/python/dummy_clf.py", line 1, in <module>
    from sklearn.dummy import DummyClassifier

[Tue Apr 2 13:50:02 2019]
Error in rule simFeatures:
Error in rule simFeatures:
    jobid: 8
    jobid: 16
    output: output/simulated_summary/simulated_features2.txt
    output: output/simulated_summary/simulated_features10.txt

RuleException:
CalledProcessError in line 282 of /home/kjf/2020plus-1.2.2/Snakefile:
Command 'set -euo pipefail; python `which 2020plus.py` features -s output/summary.txt --tsg-test output/tsg.txt -og-test output/oncogene.txt -o output/features.txt' returned non-zero exit status 1.
  File "/home/kjf/2020plus-1.2.2/Snakefile", line 282, in __rule_features
  File "/home/kjf/anaconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run

Finished working on chromosome: chr13.
[Tue Apr 2 13:52:26 2019]
Finished job 30.
26 of 48 steps (54%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/kjf/2020plus-1.2.2/.snakemake/log/2019-04-02T123210.444145.snakemake.log

Could you please help with the error? Thank you.

ctokheim commented 5 years ago

You need to install scikit-learn, which is listed as one of the dependencies in the conda environment YAML file.
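One quick way to confirm this diagnosis is to ask the interpreter inside the activated environment which modules it can actually find. The sketch below is not part of 2020plus: `missing_modules` is a hypothetical helper, and the module name checked is simply the one from the traceback above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which Python is running, e.g. the one from the conda environment.
    print("interpreter:", sys.executable)
    # 'sklearn' is the import name of the scikit-learn package.
    print("missing:", missing_modules(["sklearn"]))
```

If the interpreter path printed is not the one under the intended conda environment, the package may be installed somewhere other than where snakemake is running Python.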

MarkKJF commented 5 years ago

After installing scikit-learn, I ran the code again, but another error occurs:

AN ERROR HAS OCCURRED: check the log file

Type: <class 'ImportError'>
Exception: cannot import name 'cross_validation'
Traceback:
  File "/home/kjf/2020plus-1.2.2/2020plus.py", line 263, in <module>
    import src.classify.python.classifier
  File "/home/kjf/2020plus-1.2.2/src/classify/python/classifier.py", line 2, in <module>
    from src.classify.python.dummy_clf import DummyClf
  File "/home/kjf/2020plus-1.2.2/src/classify/python/dummy_clf.py", line 2, in <module>
    from src.classify.python.generic_classifier import GenericClassifier
  File "/home/kjf/2020plus-1.2.2/src/classify/python/generic_classifier.py", line 9, in <module>
    from sklearn import cross_validation

[Fri Apr 5 21:37:49 2019]
Error in rule features:
    jobid: 2
    output: output/features.txt

RuleException:
CalledProcessError in line 282 of /home/kjf/2020plus-1.2.2/Snakefile:
Command 'set -euo pipefail; python `which 2020plus.py` features -s output/summary.txt --tsg-test output/tsg.txt -og-test output/oncogene.txt -o output/features.txt' returned non-zero exit status 1.
  File "/home/kjf/2020plus-1.2.2/Snakefile", line 282, in __rule_features
  File "/home/kjf/anaconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run

Could you please help with this problem? Thanks.

ctokheim commented 5 years ago

You need to install the correct version of scikit-learn, which is already specified in the conda environment file (scikit-learn<0.20.0).
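The ImportError follows from the version: scikit-learn deprecated the `sklearn.cross_validation` module in 0.18 and removed it in 0.20, so `from sklearn import cross_validation` only works on releases below 0.20. A minimal sketch of checking a version string against that upper bound is shown here. `version_tuple` and `is_below` are hypothetical helpers, and the comparison is a simplification that ignores pre-release suffixes rather than a full PEP 440 implementation.

```python
import re


def version_tuple(version):
    """Turn '0.19.2' (or '0.19.2rc1') into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def is_below(installed, upper_bound):
    """True if `installed` is strictly lower than `upper_bound`."""
    a, b = version_tuple(installed), version_tuple(upper_bound)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.20' and '0.20.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    try:
        import sklearn
    except ImportError:
        print("scikit-learn is not installed in this interpreter")
    else:
        ok = is_below(sklearn.__version__, "0.20.0")
        print(sklearn.__version__, "satisfies <0.20.0:", ok)
```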

MarkKJF commented 5 years ago

Thanks. I think there are a lot of problems with my environment and the required packages. So I tried the command $ pip install -r requirements.txt, which succeeded,

but $ pip install -r requirements_dev.txt failed.

It warns that building the wheel for probabilistic2020 failed, and when I check the packages with $ conda list, most of the required packages are at the wrong version. How can I switch each package to the correct version?
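To see which installed versions differ from what the project expects, the `conda list` output can be read into a dictionary and compared against the environment file. The sketch below assumes the usual `conda list` layout (comment lines starting with `#`, then whitespace-separated name, version, and build columns). `parse_conda_list` is a hypothetical helper, and the sample text is made up for illustration; it is not output from this thread.

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


# Made-up sample in the `conda list` layout, for illustration only.
SAMPLE = """\
# packages in environment at /path/to/envs/example:
#
# Name                    Version                   Build  Channel
numpy                     1.15.4                   py36_0
scikit-learn              0.20.3                   py36_0
"""

if __name__ == "__main__":
    installed = parse_conda_list(SAMPLE)
    print("scikit-learn:", installed.get("scikit-learn", "not installed"))
```

In practice the text would come from the real command, for example by saving `conda list > packages.txt` (a hypothetical file name) and reading that file.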

Zethson commented 5 years ago

@MarkKJF Create a new conda environment, activate it, and install the dependencies inside that environment.