RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License

ModuleNotFoundError: No module named 'scipy' #131

Open manuelatan opened 1 year ago

manuelatan commented 1 year ago

1. Bug description

echolocatoR cannot find the Python module scipy. The run stops with "ModuleNotFoundError: No module named 'scipy'" during the "Extract Linkage Disequilibrium" step (Step 2), right after the UKB LD files have been downloaded. I am running echolocatoR v2.0.3, the latest version. Any advice would be much appreciated!

2. Reproducible example

Code


#Map column names in summary stats
columnsnames = echodata::construct_colmap(munged= FALSE,
                                          CHR = "CHR", POS = "POS",
                                          SNP = "SNP", P = "pvalue",
                                          Effect = "beta", StdErr = "SE", 
                                          A1 = "A1", A2 = "A2", Freq = "freq",
                                          N = "N")

#Fine mapping
results <- echolocatoR::finemap_loci(
  topSNPs = topSNPs,
  loci = topSNPs$Locus,
  LD_reference = "UKB", #using UK Biobank for LD reference panel
  dataset_name = "mortality_GWAS",
  fullSS_genome_build = "hg19",
  case_control = FALSE,
  finemap_methods = c("ABF","SUSIE","FINEMAP"),
  force_new_subset = TRUE,
  force_new_LD = TRUE,
  force_new_finemap = TRUE,

  # SNP filters
  bp_distance = 1000000, #distance on each side of the lead SNP to include (+/- 1Mb, i.e. a 2Mb window)
  min_MAF = 0.001, 

  # Munge full sumstats first
  munged = FALSE,
  fullSS_path = "/Users/manuela/Documents/Work/survival_GWAS/echolocatoR/summary_stats/mortalityGWAS_summaryStats_forMunge.txt",
  colmap = columnsnames,

  #Plot options
  plot_types = c("fancy"), #in addition to GWAS and fine mapping tracks, plot XGR annotation tracks - XGR, Roadmap, Nott2019
  show_plot = TRUE,
  zoom = c("1x", "4x", "10x"))

Console output

[1] "+ Assigning Gene and Locus independently."
Standardising column headers.
First line of summary statistics file: 
SNP CHR POS P   Effect  StdErr  Freq    A1  A2  N   Locus   Gene    
Returning unmapped column names without making them uppercase.
+ Mapping colnames from MungeSumstats ==> echolocatoR
┌────────────────────────────────────────────┐
│                                            │
│   )))> 🦇 ANKRD55 [locus 1 / 10] 🦇 <(((   │
│                                            │
└────────────────────────────────────────────┘

──────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ───────────────────────────────────────────────────────────

──────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'CHR'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'CHR' .../mortalityGWAS_summaryStats_forMunge.txt; grep
    -v ^'CHR' .../mortalityGWAS_summaryStats_forMunge.txt | sort
    -k1,1n
    -k2,2n ) > .../file16c12683ada49_sorted.tsv
Constructing outputs
Using existing bgzipped file: /Users/manuela/Documents/Work/survival_GWAS/echolocatoR/summary_stats/mortalityGWAS_summaryStats_forMunge.txt.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 5:54533638-56533638
Adding 'query' column to results.
Retrieved data with 6,058 rows
Saving query ==> /var/folders/gs/pbd9rgqs6jg963j_g70phh3h0000gn/T//RtmpsoB6CR/results/GWAS/mortality_GWAS/ANKRD55/ANKRD55_mortality_GWAS_subset.tsv.gz
+ Query: 6,058 SNPs x 15 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Inferring MAF from frequency column.
++ Removing SNPs with MAF== 0 | NULL | NA or >1.
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ proportion_cases not included in data subset.
Preparing sample size column (N).
Using existing 'N' column.
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 6,058 SNPs x 18 columns.
++ Saving standardized query ==> /var/folders/gs/pbd9rgqs6jg963j_g70phh3h0000gn/T//RtmpsoB6CR/results/GWAS/mortality_GWAS/ANKRD55/ANKRD55_mortality_GWAS_subset.tsv.gz

──────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ──────────────────────────────────

──────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: ukb.
Using UK Biobank LD reference panel.
+ UKB LD file name: chr5_54000001_57000001
Downloading full .gz/.npz UKB files and saving to disk.
echoconda:: conda already installed.
Retrieving conda env name from yaml: echoR_mini
echoconda:: Conda environment already exists: echoR_mini
Searching for 1 package(s) across 1 conda environment(s).
Listing all packages in environment: echoR_mini
1 unique package(s) found across 1 conda environment(s).
Downloading with axel [1 thread(s)]: https://data.broadinstitute.org/alkesgroup/UKBB_LD/chr5_54000001_57000001.gz ==> /var/folders/gs/pbd9rgqs6jg963j_g70phh3h0000gn/T//RtmpsoB6CR/results/GWAS/mortality_GWAS/ANKRD55/LD/chr5_54000001_57000001.gz
+ Overwriting pre-existing file.
axel download successful.
Time difference of 4.1 secs
echoconda:: conda already installed.
Retrieving conda env name from yaml: echoR_mini
echoconda:: Conda environment already exists: echoR_mini
Searching for 1 package(s) across 1 conda environment(s).
Listing all packages in environment: echoR_mini
1 unique package(s) found across 1 conda environment(s).
Downloading with axel [1 thread(s)]: https://data.broadinstitute.org/alkesgroup/UKBB_LD/chr5_54000001_57000001.npz ==> /var/folders/gs/pbd9rgqs6jg963j_g70phh3h0000gn/T//RtmpsoB6CR/results/GWAS/mortality_GWAS/ANKRD55/LD/chr5_54000001_57000001.npz
+ Overwriting pre-existing file.
axel download successful.
Time difference of 15.7 secs
ModuleNotFoundError: No module named 'scipy'
Locus ANKRD55 complete in: 1.28 min
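
The error appears immediately after the .npz LD file is downloaded, so the Python interpreter that echolocatoR calls presumably lacks scipy. One way to narrow this down is to ask that interpreter which modules it can see. The following is a minimal sketch, assuming it is run with the Python interpreter of the echoR_mini conda environment named in the log above. The helper name `missing_modules` and the list of modules checked are illustrative and not part of echolocatoR.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: numpy and scipy are what the LD step needs to read the .npz file.
    wanted = ["numpy", "scipy"]
    # Shows which Python installation is actually being used.
    print("Interpreter:", sys.executable)
    print("Missing modules:", missing_modules(wanted))
```

If scipy is reported as missing, installing it into that environment (for example with `conda install -n echoR_mini scipy`) would be the first thing to try. If it is not missing, the R session is probably using a different Python installation from the one checked here.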

3. Session info

```
R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] MungeSumstats_1.7.10
 [2] forcats_0.5.2
 [3] stringr_1.5.0
 [4] dplyr_1.0.10
 [5] purrr_0.3.5
 [6] readr_2.1.3
 [7] tidyr_1.2.1
 [8] tibble_3.1.8
 [9] ggplot2_3.4.0
[10] tidyverse_1.3.2
[11] data.table_1.14.6
[12] BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
[13] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22
[14] BSgenome_1.66.1
[15] rtracklayer_1.58.0
[16] Biostrings_2.66.0
[17] XVector_0.38.0
[18] GenomicRanges_1.50.1
[19] GenomeInfoDb_1.34.4
[20] IRanges_2.32.0
[21] S4Vectors_0.36.1
[22] BiocGenerics_0.44.0
[23] echolocatoR_2.0.3

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              GGally_2.1.2
  [3] R.methodsS3_1.8.2           echoLD_0.99.8
  [5] bit64_4.0.5                 knitr_1.41
  [7] irlba_2.3.5.1               DelayedArray_0.24.0
  [9] R.utils_2.12.2              rpart_4.1.16
 [11] KEGGREST_1.38.0             RCurl_1.98-1.9
 [13] AnnotationFilter_1.22.0     generics_0.1.3
 [15] GenomicFeatures_1.50.2      RSQLite_2.2.19
 [17] proxy_0.4-27                bit_4.0.5
 [19] tzdb_0.3.0                  xml2_1.3.3
 [21] lubridate_1.9.0             SummarizedExperiment_1.28.0
 [23] assertthat_0.2.1            viridis_0.6.2
 [25] gargle_1.2.1                xfun_0.35
 [27] hms_1.1.2                   fansi_1.0.3
 [29] restfulr_0.0.15             progress_1.2.2
 [31] dbplyr_2.2.1                readxl_1.4.1
 [33] Rgraphviz_2.41.1            igraph_1.3.5
 [35] DBI_1.1.3                   htmlwidgets_1.5.4
 [37] reshape_0.8.9               downloadR_0.99.5
 [39] googledrive_2.0.0           ellipsis_0.3.2
 [41] ggnewscale_0.4.8            backports_1.4.1
 [43] biomaRt_2.54.0              deldir_1.0-6
 [45] MatrixGenerics_1.10.0       vctrs_0.5.1
 [47] Biobase_2.58.0              here_1.0.1
 [49] ensembldb_2.22.0            withr_2.5.0
 [51] cachem_1.0.6                checkmate_2.1.0
 [53] GenomicAlignments_1.34.0    prettyunits_1.1.1
 [55] cluster_2.1.4               ape_5.6-2
 [57] dir.expiry_1.6.0            lazyeval_0.2.2
 [59] crayon_1.5.2                basilisk.utils_1.10.0
 [61] crul_1.3                    pkgconfig_2.0.3
 [63] nlme_3.1-160                ProtGenerics_1.30.0
 [65] XGR_1.1.8                   nnet_7.3-18
 [67] pals_1.7                    rlang_1.0.6
 [69] lifecycle_1.0.3             filelock_1.0.2
 [71] httpcode_0.3.0              BiocFileCache_2.6.0
 [73] modelr_0.1.9                echotabix_0.99.8
 [75] dichromat_2.0-0.1           rprojroot_2.0.3
 [77] cellranger_1.1.0            coloc_5.1.0.1
 [79] matrixStats_0.63.0          graph_1.76.0
 [81] Matrix_1.5-1                osfr_0.2.9
 [83] boot_1.3-28                 reprex_2.0.2
 [85] base64enc_0.1-3             googlesheets4_1.0.1
 [87] png_0.1-8                   viridisLite_0.4.1
 [89] rjson_0.2.21                rootSolve_1.8.2.3
 [91] bitops_1.0-7                R.oo_1.25.0
 [93] ggnetwork_0.5.10            blob_1.2.3
 [95] mixsqp_0.3-48               echoplot_0.99.6
 [97] dnet_1.1.7                  jpeg_0.1-10
 [99] echodata_0.99.16            scales_1.2.1
[101] memoise_2.0.1               magrittr_2.0.3
[103] plyr_1.8.8                  hexbin_1.28.2
[105] zlibbioc_1.44.0             compiler_4.2.1
[107] echoconda_0.99.8            BiocIO_1.8.0
[109] RColorBrewer_1.1-3          catalogueR_1.0.0
[111] Rsamtools_2.14.0            cli_3.4.1
[113] echoannot_0.99.10           patchwork_1.1.2
[115] htmlTable_2.4.1             Formula_1.2-4
[117] MASS_7.3-58.1               tidyselect_1.2.0
[119] stringi_1.7.8               yaml_2.3.6
[121] supraHex_1.35.0             latticeExtra_0.6-30
[123] ggrepel_0.9.2               grid_4.2.1
[125] VariantAnnotation_1.44.0    tools_4.2.1
[127] lmom_2.9                    timechange_0.1.1
[129] parallel_4.2.1              rstudioapi_0.14
[131] foreign_0.8-83              piggyback_0.1.4
[133] gridExtra_2.3               gld_2.6.6
[135] digest_0.6.31               snpStats_1.48.0
[137] BiocManager_1.30.19         Rcpp_1.0.9
[139] broom_1.0.1                 OrganismDbi_1.40.0
[141] httr_1.4.4                  AnnotationDbi_1.60.0
[143] RCircos_1.2.2               ggbio_1.46.0
[145] biovizBase_1.46.0           colorspace_2.0-3
[147] rvest_1.0.3                 XML_3.99-0.13
[149] fs_1.5.2                    reticulate_1.26
[151] splines_4.2.1               RBGL_1.74.0
[153] expm_0.999-6                echofinemap_0.99.4
[155] basilisk_1.10.2             Exact_3.2
[157] mapproj_1.2.9               jsonlite_1.8.4
[159] susieR_0.12.27              R6_2.5.1
[161] Hmisc_4.7-2                 pillar_1.8.1
[163] htmltools_0.5.4             glue_1.6.2
[165] fastmap_1.1.0               DT_0.26
[167] BiocParallel_1.32.4         class_7.3-20
[169] codetools_0.2-18            maps_3.4.1
[171] mvtnorm_1.1-3               utf8_1.2.2
[173] lattice_0.20-45             curl_4.3.3
[175] DescTools_0.99.47           zip_2.2.2
[177] openxlsx_4.2.5.1            interp_1.1-3
[179] survival_3.4-0              googleAuthR_2.0.0
[181] munsell_0.5.0               e1071_1.7-12
[183] GenomeInfoDbData_1.2.9      haven_2.5.1
[185] reshape2_1.4.4              gtable_0.3.1
```