RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License

echoverse - Failing to find an unknown package #87

Closed AMCalejandro closed 1 year ago

AMCalejandro commented 2 years ago

1. Bug description

finemap_loci pipeline fails to find echoverse package?

Console output

Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
⠊⠉⠡⣀⣀⠊⠉⠡⣀⣀⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠                                    
⠌⢁⡐⠉⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠉⢂⡈⠑⣀⠉⢄⡈⠡⣀                                    
⠌⡈⡐⢂⢁⠒⡈⡐⢂⢁⠒⡈⡐⢂⢁⠑⡈⡈⢄⢁⠡⠌⡈⠤⢁⠡⠌⡈⠤⢁⠡⠌⡈⡠⢁⢁⠊⡈⡐⢂                                    

── 🦇  🦇  🦇 e c h o l o c a t o R 🦇  🦇  🦇 ─────────────────────────────────

── v2.0.0 ──────────────────────────────────────────────────────────────────────
⠌⡈⡐⢂⢁⠒⡈⡐⢂⢁⠒⡈⡐⢂⢁⠑⡈⡈⢄⢁⠡⠌⡈⠤⢁⠡⠌⡈⠤⢁⠡⠌⡈⡠⢁⢁⠊⡈⡐⢂                                    
⠌⢁⡐⠉⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠊⢂⡐⠑⣀⠉⢂⡈⠑⣀⠉⢄⡈⠡⣀                                    
⠊⠉⠡⣀⣀⠊⠉⠡⣀⣀⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠⠊⠉⠢⣀⡠                                    
ⓞ If you use echolocatoR, please cite:                                          
     ▶ Brian M Schilder, Jack Humphrey, Towfique                                
     Raj (2021) echolocatoR: an automated                                       
     end-to-end statistical and functional                                      
     genomic fine-mapping pipeline,                                             
     Bioinformatics; btab658,                                                   
     https://doi.org/10.1093/bioinformatics/btab658                             
ⓞ Please report any bugs/feature requests on GitHub:
     ▶
     https://github.com/RajLabMSSM/echolocatoR/issues
ⓞ Contributions are welcome!:
     ▶
     https://github.com/RajLabMSSM/echolocatoR/pulls

────────────────────────────────────────────────────────────────────────────────
echoconda:: Conda already installed.
echoconda:: Active conda env: 'echoverse'
echoconda:: Requested conda_env is already active: 'echoverse'
echoconda:: Attempting to activate conda env: 'echoverse'

)   )  ) ))))))}}}}}}}} LINC00511  ( 1  /  2 ) {{{{{{{{{(((((( (  (   (
+ Extracting relevant variants from fullSS...
+ Query Method: tabix
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferring comment_char from header: '#MarkerName'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'#MarkerName' .../for_echolocatoR_axialOutcome_3.tsv; grep
    -v ^'#MarkerName' .../for_echolocatoR_axialOutcome_3.tsv | sort
    -k2,2n
    -k3,3n ) > .../file2dbbdf43076c0a_sorted.tsv
Constructing outputs
Using existing bgzipped file: /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/for_echolocatoR_axialOutcome_3.tsv.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format:
  - data: /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/for_echolocatoR_axialOutcome_3.tsv.bgz 
  - index: /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/for_echolocatoR_axialOutcome_3.tsv.bgz.tbi
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 17:70330179-70330179
Adding 'query' column to results.
Retrieved data with 1 rows
Saving query ==> /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_25.3.2022/mixedmodels_GWAS/earlymotorPD_axial/LINC00511/LINC00511_earlymotorPD_axial_subset.tsv.gz
LD:: Standardizing summary statistics subset.
++ Preparing Gene col
Could not recognize genome build of:
 - target_genome
These will be inferred from the data.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols
++ Inferring MAF from frequency column...
++ Preparing N_cases,N_controls cols
++ Preparing `proportion_cases` col
++ 'proportion_cases' not included in data subset.
++ Preparing N col
--
WARNING: Neff column could not be calculated as the columns N_CAS & N_CON were not found in the datset
--
+ Mapping colnames from MungeSumstats ==> echolocatoR
++ Preparing t-stat col
+ Calculating t-statistic from Effect and StdErr...
++ Assigning lead SNP
++ Ensuring Effect, StdErr, P are numeric
++ Ensuring 1 SNP per row
++ Removing extra whitespace
++ Saving subset ==> /mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_25.3.2022/mixedmodels_GWAS/earlymotorPD_axial/LINC00511/LINC00511_earlymotorPD_axial_subset.tsv.gz
+ Extraction completed in 42.87 seconds
+ 1 SNPs x  12 columns
+ Mapping colnames from MungeSumstats ==> echolocatoR
Standardising column headers.
First line of summary statistics file: 
CHR POS SNP P   Effect  StdErr  A1  A2  Freq    MAF t_stat  leadSNP 
Using UK Biobank LD reference panel.
+ UKB LD file name: chr17_70000001_73000001
Downloading full .gz/.npz UKB files and saving to disk.
Downloading with axel (using 95 cores).
+ Overwriting pre-existing file.
Searching for 1 package(s) across 1 conda environment(s):
 - echoverse
Identified paths for 1 / 1 packages.
1 unique package(s) found across 1 conda environment(s).
sh: 1: echoverse: not found
axel download failed. Trying with download.file.
Downloading with download.file.
Time difference of 9.8 secs
Downloading with axel (using 95 cores).
+ Overwriting pre-existing file.
Searching for 1 package(s) across 1 conda environment(s):
 - echoverse
Identified paths for 1 / 1 packages.
1 unique package(s) found across 1 conda environment(s).
sh: 1: echoverse: not found
axel download failed. Trying with download.file.
Downloading with download.file.
Time difference of 28.3 secs
Error in py_run_file_impl(file, local, convert) : 
  Unable to open file '' (does it exist?)
In addition: Warning messages:
1: 'genome' not found: hg37 
2: In system(cmd) : error in running command
3: In system(cmd) : error in running command
Fine-mapping complete in:
Time difference of 1.5 mins

Code

library(data.table)
library(echolocatoR)  
#library(tidyverse)

fullSS_path <- "/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/for_echolocatoR_axialOutcome_3.tsv"                                                                    
fullRS_path <- "/mnt/rreal/RDS/RDS/acarrasco/ANALYSES_WORKSPACE/EARLY_PD/POST_GWAS/ECHOLOCATOR/RESULTS_25.3.2022"

top_SNPs = fread("../topSNPs_axialMotorSymptom.txt") 
top_SNPs = top_SNPs[c(5,6), ]

res = finemap_loci(top_SNPs = top_SNPs,
                     loci = top_SNPs$Locus, 
                     dataset_name = "earlymotorPD_axial", 
                     dataset_type = "mixedmodels_GWAS",   
                     force_new_subset = T, 
                     force_new_LD = T, 
                     force_new_finemap = T, 
                     remove_tmps = F, 

                     # SUMMARY STATS ARGUMENTS 
                     fullSS_genome_build = "hg19",
                     fullSS_path = fullSS_path,
                     results_dir = fullRS_path,
                     query_by = "tabix", 
                     chrom_col = "CHR", position_col = "POS", snp_col = "#MarkerName", 
                     pval_col = "Pval", effect_col = "Effect", stderr_col = "StdErr", 
                     freq_col = "medianFreq", MAF_col = "calculate", 
                     A1_col = "Allele1", 
                     A2_col = "Allele2", 
                     #N_cases_col = "TotalSampleSize",
                     #N_controls = 0,

                     # FILTERING ARGUMENTS 
                     ## It's often desirable to use a larger window size  
                     ## (e.g. 2Mb which is bp_distance=500000*2),  
                     ## but we use a small window here to speed up the process.  
                     bp_distance = 500000*2, 
                     min_MAF = 0.001,   
                     trim_gene_limits = F, 

                     # FINE-MAPPING ARGUMENTS 
                     ## General 
                     finemap_methods = c("ABF", "SUSIE", "POLYFUN_SUSIE", "FINEMAP"),  
                     n_causal = 5, 
                     PP_threshold = .95,  
                     consensus_threshold = 2,
                     # LD ARGUMENTS  
                     LD_genome_build = "hg19",
                     LD_reference = "UKB", 
                     superpopulation = "EUR", 
                     download_method = "axel", 

                     # Additional arguments - My arguments
                     case_control = F,
                     nThread = 15,
                     sample_size = 3572,

                     # PLOT ARGUMENTS  
                     ## general    
                     plot_types = c("fancy"), 
                     ## Generate multiple plots of different window sizes;  
                     ### all SNPs, 4x zoomed-in, and a 50000bp window 
                     zoom = c("all","4x","10x", "30x"), 
                     ## XGR 
                     # plot.XGR_libnames=c("ENCODE_TFBS_ClusteredV3_CellTypes"),  
                     ## Roadmap 
                     roadmap=FALSE,
                     roadmap_query=NULL,

                     #plot.Roadmap = F, 
                     #plot.Roadmap_query = NULL, 
                     # Nott et al. (2019) 
                     nott_epigenome=TRUE,
                     nott_show_placseq=TRUE,
                     #plot.Nott_epigenome = T,  
                     #plot.Nott_show_placseq = T,  

                     verbose = TRUE,

                     # ENVIRONMENT ARGS
                     conda_env= "echoverse"
                    )

2. Session info

```
R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/acarrasco/.conda/envs/echoverse/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] echolocatoR_2.0.0 data.table_1.14.2

loaded via a namespace (and not attached):
  [1] utf8_1.2.2               reticulate_1.24
  [3] R.utils_2.11.0           tidyselect_1.1.2
  [5] RSQLite_2.2.11           AnnotationDbi_1.56.2
  [7] htmlwidgets_1.5.4        grid_4.1.2
  [9] BiocParallel_1.28.3      XGR_1.1.7
 [11] munsell_0.5.0            DT_0.22
 [13] colorspace_2.0-3         Biobase_2.54.0
 [15] filelock_1.0.2           OrganismDbi_1.36.0
 [17] knitr_1.38               supraHex_1.32.0
 [19] rstudioapi_0.13          stats4_4.1.2
 [21] DescTools_0.99.44        MatrixGenerics_1.6.0
 [23] GenomeInfoDbData_1.2.7   mixsqp_0.3-43
 [25] bit64_4.0.5              echoconda_0.99.5
 [27] rprojroot_2.0.2          vctrs_0.3.8
 [29] generics_0.1.2           xfun_0.30
 [31] biovizBase_1.42.0        BiocFileCache_2.2.1
 [33] R6_2.5.1                 GenomeInfoDb_1.30.1
 [35] AnnotationFilter_1.18.0  bitops_1.0-7
 [37] cachem_1.0.6             reshape_0.8.8
 [39] DelayedArray_0.20.0      assertthat_0.2.1
 [41] BiocIO_1.4.0             scales_1.1.1
 [43] nnet_7.3-17              rootSolve_1.8.2.3
 [45] gtable_0.3.0             lmom_2.8
 [47] ggbio_1.42.0             ensembldb_2.18.4
 [49] rlang_1.0.2              clisymbols_1.2.0
 [51] MungeSumstats_1.3.16     echodata_0.99.7
 [53] splines_4.1.2            rtracklayer_1.54.0
 [55] lazyeval_0.2.2           gargle_1.2.0
 [57] dichromat_2.0-0          hexbin_1.28.2
 [59] checkmate_2.0.0          BiocManager_1.30.16
 [61] yaml_2.3.5               reshape2_1.4.4
 [63] snpStats_1.44.0          GenomicFeatures_1.46.5
 [65] ggnetwork_0.5.10         backports_1.4.1
 [67] Hmisc_4.6-0              RBGL_1.70.0
 [69] tools_4.1.2              echoplot_0.99.2
 [71] ggplot2_3.3.5            ellipsis_0.3.2
 [73] RColorBrewer_1.1-2       proxy_0.4-26
 [75] BiocGenerics_0.40.0      coloc_5.1.2
 [77] Rcpp_1.0.8.3             plyr_1.8.7
 [79] base64enc_0.1-3          progress_1.2.2
 [81] zlibbioc_1.40.0          purrr_0.3.4
 [83] RCurl_1.98-1.6           prettyunits_1.1.1
 [85] rpart_4.1.16             viridis_0.6.2
 [87] S4Vectors_0.32.4         SummarizedExperiment_1.24.0
 [89] ggrepel_0.9.1            cluster_2.1.2
 [91] here_1.0.1               fs_1.5.2
 [93] magrittr_2.0.2           echotabix_0.99.5
 [95] dnet_1.1.7               openxlsx_4.2.5
 [97] gh_1.3.0                 mvtnorm_1.1-3
 [99] ProtGenerics_1.26.0      matrixStats_0.61.0
[101] patchwork_1.1.1          hms_1.1.1
[103] XML_3.99-0.9             jpeg_0.1-9
[105] IRanges_2.28.0           gridExtra_2.3
[107] compiler_4.1.2           biomaRt_2.50.3
[109] tibble_3.1.6             crayon_1.5.1
[111] R.oo_1.24.0              htmltools_0.5.2
[113] echoannot_0.99.4         tzdb_0.3.0
[115] Formula_1.2-4            tidyr_1.2.0
[117] expm_0.999-6             Exact_3.1
[119] lubridate_1.8.0          DBI_1.1.2
[121] dbplyr_2.1.1             MASS_7.3-56
[123] rappdirs_0.3.3           boot_1.3-28
[125] Matrix_1.4-1             readr_2.1.2
[127] piggyback_0.1.1          cli_3.2.0
[129] R.methodsS3_1.8.1        echofinemap_0.99.0
[131] parallel_4.1.2           igraph_1.2.11
[133] GenomicRanges_1.46.1     pkgconfig_2.0.3
[135] GenomicAlignments_1.30.0 RCircos_1.2.2
[137] foreign_0.8-82           xml2_1.3.3
[139] XVector_0.34.0           echoLD_0.99.1
[141] stringr_1.4.0            VariantAnnotation_1.40.0
[143] digest_0.6.29            graph_1.72.0
[145] Biostrings_2.62.0        htmlTable_2.4.0
[147] gld_2.6.4                restfulr_0.0.13
[149] curl_4.3.2               Rsamtools_2.10.0
[151] rjson_0.2.21             lifecycle_1.0.1
[153] nlme_3.1-157             jsonlite_1.8.0
[155] viridisLite_0.4.0        BSgenome_1.62.0
[157] fansi_1.0.3              downloadR_0.99.1
[159] susieR_0.11.92           pillar_1.7.0
[161] lattice_0.20-45          GGally_2.1.2
[163] KEGGREST_1.34.0          fastmap_1.1.0
[165] httr_1.4.2               survival_3.3-1
[167] googleAuthR_2.0.0        glue_1.6.2
[169] zip_2.2.0                png_0.1-7
[171] bit_4.0.4                Rgraphviz_2.38.0
[173] class_7.3-20             stringi_1.7.6
[175] blob_1.2.2               latticeExtra_0.6-29
[177] memoise_2.0.1            dplyr_1.0.8
[179] irlba_2.3.5              e1071_1.7-9
[181] ape_5.6-2
```

bschilder commented 2 years ago

Could you clarify how you created the conda environment "echoverse"?

Either way, it's odd that echoconda would be looking for a software package of the same name. Looking into this now.
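
For anyone hitting the same message: `sh: 1: echoverse: not found` is the shell reporting that it was asked to run a command literally named `echoverse`. A quick way to see what the R session can resolve is sketched below in base R. This is only a minimal check, not how echoconda locates software; `tools_on_path` is a hypothetical helper introduced here for illustration.

```r
# Hypothetical helper (not part of echoconda): for each tool name, report
# whether the shell used by R can resolve it on the PATH.
# Sys.which() returns "" for names it cannot find.
tools_on_path <- function(tools) {
  paths <- Sys.which(tools)
  found <- nzchar(paths)
  names(found) <- tools
  found
}

# "axel" is the requested download tool; "echoverse" is the conda env name,
# which is not expected to exist as an executable.
print(tools_on_path(c("axel", "echoverse")))
```

If `axel` comes back `FALSE` while the conda env is active, the download step would be expected to fall back to `download.file`, as it does in the log above.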

bschilder commented 2 years ago

Ok, based on the date posted I think this was a bug in an older version of echoverse. Could you try reinstalling on that branch (now the master branch) and try again?

Apologies for the long delay!

bschilder commented 2 years ago

@AMCalejandro has this since been resolved for you with the updates?

AMCalejandro commented 2 years ago

Just to be clear, that conda environment was created from a yml file.

Please, find it attached here


name: echoverse
channels:
  - conda-forge
  - bioconda
  - nodefaults
dependencies:
  # Python
  - python>=3.6.1
  - pandas>=0.25.0
  - pandas-plink
  - fastparquet
  - pyarrow
  - scipy
  - scikit-learn
  - tqdm
  - bitarray
  - networkx
  - rpy2
  - requests
  # Command line
  - htslib
  - plink
  - bcftools
  - wget
  - axel
  # R
  - r>=4.1.0
  - r-biocmanager
  - bioconductor-snpstats
  - bioconductor-ggbio
  - bioconductor-ensdb.hsapiens.v75
  - bioconductor-biomart
  - radian
  - pip

I am happy to reinstall echolocatoR and try running the workflow again using the default echoR_mini. I will do so when you push the fix for the sample size input to master.

bschilder commented 2 years ago

> Please, find it attached here

Perfect, thanks

> I will do so when you push the fix for the sample size input to master

Cool, this has already been pushed.

AMCalejandro commented 2 years ago

Cannot see the commit to master tho

(Sent by email in reply to Brian M. Schilder's message of 20 September 2022, 15:11.)

bschilder commented 2 years ago

The necessary edits were all done in echodata. So you just need to update that subpackage.
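
A minimal sketch for checking whether the local copy of echodata is still the one from the original report (the session info lists `echodata_0.99.7`). The helper `needs_update` is hypothetical, and the install line is an assumption about where the subpackage is hosted, so it is left commented out.

```r
# Version of echodata recorded in the session info of the original report.
reported_version <- "0.99.7"

# Hypothetical helper: TRUE when the installed version is no newer than
# the version that showed the bug.
needs_update <- function(installed, reported = reported_version) {
  utils::compareVersion(as.character(installed), reported) <= 0
}

if (requireNamespace("echodata", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("echodata"))
  if (needs_update(installed)) {
    message("echodata ", installed, " is not newer than the reported version.")
  } else {
    message("echodata ", installed, " is newer than the reported version.")
  }
} else {
  message("echodata is not installed in this R library.")
}

# Assumed install route (requires the 'remotes' package and network access):
# remotes::install_github("RajLabMSSM/echodata")
```

Restart the R session after updating so that the new version of the subpackage is the one loaded by echolocatoR.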