CancerRxGene / gdsctools

Tools related to the Genomics of Drug Sensitivity in Cancer (GDSC) projects (http://www.cancerrxgene.org/ )

difficulty getting started #165

Open dgcovell opened 7 years ago

dgcovell commented 7 years ago

I installed gdsctools 0.17.0 on a virtual Linux machine with Python 2.7.0 and compatible git and pip. I am trying to run the commands given in the GDSCTools 0.17.0 documentation (http://gdsctools/readthedocs.io/en/master/anova_partone.html), both from a shell and from within Python, but with little success. Your help would be most appreciated. Here is what I can provide:

  1. Installation tests o.k.

    gdsctools_anova --test
    Welcome to GDSCTools standalone

    Testing mode:
    TISSUE FACTOR : included
    MEDIA FACTOR : NOT included
    MSI FACTOR : included
    FEATURE FACTOR : included
                                        1
    ANOVA_FEATURE_pval        1.57507e-58
    ANOVA_MEDIA_pval                  NaN
    ANOVA_MSI_pval              0.0259029
    ANOVA_TISSUE_pval         1.02587e-44
    DRUG_ID                          1047
    DRUG_NAME                         NaN
    DRUG_TARGET                       NaN
    FEATURE                      TP53_mut
    FEATURE_IC50_T_pval       1.27218e-68
    FEATURE_IC50_effect_size      1.39063
    FEATURE_delta_MEAN_IC50       1.57421
    FEATURE_neg_Glass_delta       1.09839
    FEATURE_neg_IC50_sd            1.4332
    FEATURE_neg_logIC50_MEAN      2.49511
    FEATURE_pos_Glass_delta       1.68301
    FEATURE_pos_IC50_sd          0.935351
    FEATURE_pos_logIC50_MEAN      4.06932
    N_FEATURE_neg                     292
    N_FEATURE_pos                     554

    GDSCTools seems to be installed properly

  2. gdsctools_anova --input-ic50 IC50_v17.csv.gz --input-features genomic_features_v17.csv.gz

runs to completion, but the html_gdsc_anova directory is without files; the only new entries are in ./images: EBI_logo.png and sanger-logo.png

  3. gdsctools_regression -I IC50_v17.csv.gz -F genomic_features_v17.csv.gz --method lasso

Welcome to GDSCTools standalone (lasso, ridge, elastic net)

File config.yaml and regression.rules created in ./analysis

First go to the directory where analysis will be performed

cd analysis

You have two choices now. Either you are on a laptop, or you are on a cluster.

  1. LOCAL COMPUTER:

    snakemake -s regression.rules -p

where -p means 'print statements'

  2. CLUSTERS:

On a SLURM cluster, you can make use of the many cores available by typing for instance:

srun --qos normal snakemake -s regression.rules -j 40 --cluster "sbatch --qos normal"

For more information about snakemake commands, type

snakemake --help

I tried to find snakemake, without success.

  4. From Python, I did the following:

    from gdsctools import *
    dir()
    ['ANOVA', 'ANOVAReport', 'ANOVAResults', 'ANOVASettings', 'COSMICInfo', 'Data', 'DrugDecode', 'GDSC', 'GDSCElasticNet', 'GDSCLasso', 'GDSCRidge', 'GenomicFeatures', 'IC50', 'IC50Cluster', 'OmniBEMBuilder', 'Reader', 'TCGA', 'VolcanoANOVA', '__builtins__', '__doc__', '__name__', '__package__', 'anova', 'anova_report', 'anova_results', 'boxplots', 'boxswarm', 'cancer_cell_lines', 'cosmic_builder_test', 'cosmic_info', 'cosmictools', 'datasets', 'errors', 'gdsc', 'gdsctools_data', 'gdsctools_help', 'genomic_features', 'gf_v17', 'gf_v5', 'ic50_test', 'ic50_v17', 'ic50_v5', 'license', 'models', 'omnibem', 'os', 'pkg_resources', 'qvalue', 'readers', 'regression', 'report', 'settings', 'signed_effects', 'stats', 'tissues', 'tools', 'version', 'volcano', 'warnings']

    ic = IC50(ic50_v17)
    print(ic)
    Number of drugs: 265
    Number of cell lines: 988
    Percentage of NA 0.183458100985

    gf = GenomicFeatures(gf_v17)
    print(gf)
    Genomic features distribution
    Number of unique tissues 27
    Here are the first 10 tissues: myeloma, nervous_system, soft_tissue, bone, lung_NSCLC, skin, Bladder, cervix, lung_SCLC, lung
    MSI column: yes
    MEDIA column: no

    There are 677 unique features distributed as

    gdsc = ANOVA('IC50_v17.csv.gz','genomic_features_v17.csv.gz')
    TISSUE FACTOR : included
    MEDIA FACTOR : NOT included
    MSI FACTOR : included
    FEATURE FACTOR : included
    results = gdsc.anova.all()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'ANOVA' object has no attribute 'anova'
    gdsc
    Number of drugs: 265
    Number of cell lines: 988
    Percentage of NA 0.183458100985

    Genomic features distribution
    Number of unique tissues 27
    Here are the first 10 tissues: lung_NSCLC, prostate, stomach, nervous_system, skin, Bladder, leukemia, kidney, thyroid, soft_tissue
    MSI column: yes
    MEDIA column: no

    There are 677 unique features distributed as

    gdsc.anova_one_drug_one_feature(1047, 'TP53', show=TRUE)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'TRUE' is not defined

cokelaer commented 7 years ago

There are a lot of different questions here, and it is not clear to me where the error is.

A few comments:

  1. There are currently two types of analysis: (i) ANOVA association, whose standalone is called gdsctools_anova, and (ii) sparse linear regression using machine learning (Lasso, Ridge, ElasticNet), whose standalone is gdsctools_regression. The ANOVA code works under Python 2 and 3; however, the standalone gdsctools_regression makes use of Snakemake, which works only under Python 3.5 and above.
  2. For the installation, I would recommend using conda.
  3. One of the errors you report is due to a typo in your code:
    results = gdsc.anova.all()

    should be gdsc.anova_all()

Maybe it is a typo in the documentation. If so, please let me know.

Similarly for the last error you mention: TRUE is not valid Python. You should use True instead.
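Put together, the two corrections could look like the minimal sketch below. The wrapper `run_corrected_calls` and its `run_all` flag are hypothetical names used only for illustration; the file paths are whatever was passed to `ANOVA` in the report. The default feature name `TP53_mut` is an assumption taken from the `--test` output (which lists `FEATURE TP53_mut` for drug 1047), since the bare `TP53` used in the report may not match a column in the features file.

```python
def run_corrected_calls(ic50_path, features_path, drug_id=1047,
                        feature="TP53_mut", run_all=False):
    """Repeat the two failing calls from the report with the fixes applied.

    Returns None when gdsctools cannot be imported, otherwise a tuple
    (single_result, all_results); all_results is None unless run_all is True.
    """
    try:
        from gdsctools import ANOVA
    except ImportError:
        print("gdsctools is not installed; nothing to run")
        return None

    # Same constructor call as in the report: IC50 file, then features file.
    gdsc = ANOVA(ic50_path, features_path)

    # Python's boolean literal is True; TRUE (the R spelling) raises NameError.
    single = gdsc.anova_one_drug_one_feature(drug_id, feature, show=True)

    # The method name uses an underscore: anova_all(), not anova.all().
    # It analyses every drug/feature pair, so it is opt-in here.
    all_results = gdsc.anova_all() if run_all else None
    return single, all_results
```

For example, `run_corrected_calls('IC50_v17.csv.gz', 'genomic_features_v17.csv.gz')` would repeat the single-association call from the report, and passing `run_all=True` would additionally run the full analysis.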

Hope this helps. In the future, please open one issue per error. Best
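Regarding the Snakemake step of gdsctools_regression, a quick way to see whether an environment is ready is to check the interpreter version and whether a `snakemake` executable is on the PATH. The helper below is only a sketch: the function name `snakemake_ready` and its parameters are hypothetical, and the 3.5 threshold is simply the version mentioned above.

```python
import sys

try:
    from shutil import which  # available from Python 3.3
except ImportError:
    # Python 2 fallback, so the check itself still runs there
    from distutils.spawn import find_executable as which


def snakemake_ready(version_info=None, which=which):
    """Return (ok, reasons) for running the Snakemake-based regression step."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    reasons = []
    if major_minor < (3, 5):
        reasons.append(
            "Python %d.%d found; Snakemake needs 3.5 or newer" % major_minor
        )
    if which("snakemake") is None:
        reasons.append("snakemake executable not found on PATH")
    return (not reasons, reasons)


if __name__ == "__main__":
    ok, reasons = snakemake_ready()
    print("ready" if ok else "not ready: " + "; ".join(reasons))
```

On the Python 2.7 machine described in the report, this would be expected to list both reasons, which is consistent with snakemake not being found there.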