RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License
34 stars 11 forks source link

Problems with echolocatoR::finemap_loci #125

Closed rootze closed 1 year ago

rootze commented 1 year ago

Hello @bschilder Thank you for instructing me on the installation of the echolocatoR in my last GitHub issue. Great work again. Love the image after loading the echolocatoR library. πŸ¦‡ πŸ¦‡ πŸ¦‡ e c h o l o c a t o R πŸ¦‡ πŸ¦‡ πŸ¦‡

I was trying to find some tutorials to follow for this newest version of echolocatoR, so I found this r-markdown. https://github.com/RajLabMSSM/echolocatoR/blob/master/vignettes/echolocatoR.Rmd

However, I have some problems with getting results.

> results <- echolocatoR::finemap_loci(
+  fullSS_path = fullSS_path,
+  topSNPs = topSNPs,
+  loci = c("BST1","MEX3C"),
+  LD_reference = "1KGphase3",
+  dataset_name = "Nalls23andMe_2019",
+  fullSS_genome_build = "hg19",
+  bp_distance = 1000,
+  finemap_methods = c("ABF","SUSIE","FINEMAP"),
+  munged = TRUE)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                        β”‚
β”‚   )))> πŸ¦‡ BST1 [locus 1 / 2] πŸ¦‡ <(((   β”‚
β”‚                                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 β–Άβ–Άβ–Ά Query πŸ”Ž ──────────────────────────────────────────────────────────────────────────────────────────────────────────

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Assuming fullSS_path summary stats have already been processed with MungeSumstats.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../nalls2019.fullSS_subset.tsv; grep
    -v ^'SNP' .../nalls2019.fullSS_subset.tsv | sort
    -k2,2n
    -k3,3n ) > .../file3217102c2dc7ac_sorted.tsv
Constructing outputs
Selecting valid alternative method: conda
bgzipping file with conda.
echoconda:: conda already installed.
Retrieving conda env name from yaml: echoR_mini
Warning: Could not identify any conda_env matching existing conda environments.
Identified yaml file stored in echoconda.
Retrieving conda env name from yaml: /data/homezvol0/zechuas/.conda/envs/echolocator/lib/R/library/echoconda/conda/echoR_mini.yml
Warning: Could not identify any conda_env matching existing conda environments.
echoconda:: Creating conda environment: echoR_mini
sh: /data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/etc/profile.d/conda.sh: No such file or directory
Specified conda binary '/data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/bin/conda' does not exist.Locus BST1 complete in: 1.06 min
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                                         β”‚
β”‚   )))> πŸ¦‡ MEX3C [locus 2 / 2] πŸ¦‡ <(((   β”‚
β”‚                                         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 β–Άβ–Άβ–Ά Query πŸ”Ž ──────────────────────────────────────────────────────────────────────────────────────────────────────────

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../nalls2019.fullSS_subset.tsv; grep
    -v ^'SNP' .../nalls2019.fullSS_subset.tsv | sort
    -k2,2n
    -k3,3n ) > .../file32171020b60a3a_sorted.tsv
Constructing outputs
Selecting valid alternative method: conda
bgzipping file with conda.
echoconda:: conda already installed.
Retrieving conda env name from yaml: echoR_mini
Warning: Could not identify any conda_env matching existing conda environments.
Identified yaml file stored in echoconda.
Retrieving conda env name from yaml: /data/homezvol0/zechuas/.conda/envs/echolocator/lib/R/library/echoconda/conda/echoR_mini.yml
Warning: Could not identify any conda_env matching existing conda environments.
echoconda:: Creating conda environment: echoR_mini
sh: /data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/etc/profile.d/conda.sh: No such file or directory
Specified conda binary '/data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/bin/conda' does not exist.Locus MEX3C complete in: 1.06 min

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 6 β–Άβ–Άβ–Ά Postprocess data 🎁 ───────────────────────────────────────────────────────────────────────────────────────────────

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Returning results as nested list.
All loci done in: 2.12 min
Warning messages:
1: In rhtslib_warning(method = method, rhtslib_pkgs = rhtslib_pkgs,  :
  The selected method (rsamtools) depends on Rhtslib, and the version you have installed (1.28.0) contains known bugs. Please install Rhtslib >=1.99.2 via Bioconductor >=3.16:
    BiocManager::install(version='devel')
2: In system(paste(act.cmd, collapse = " "), intern = TRUE) :
  running command '. '/data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/etc/profile.d/conda.sh' && conda activate && /data/homezvol0/zechuas/.conda/envs/echolocator/lib/R/bin/Rscript --no-save --no-restore --no-site-file --no-init-file --default-packages=NULL -e "con <- socketConnection(port=11923, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)"' had status 1
3: In socketAccept(soc, blocking = TRUE, open = "a+b") :
  problem in accepting connections on this socket
4: In .activate_condaenv(output, envpath, loc) :
  failed to activate the environment at ''
5: In rhtslib_warning(method = method, rhtslib_pkgs = rhtslib_pkgs,  :
  The selected method (rsamtools) depends on Rhtslib, and the version you have installed (1.28.0) contains known bugs. Please install Rhtslib >=1.99.2 via Bioconductor >=3.16:
    BiocManager::install(version='devel')
6: In system(paste(act.cmd, collapse = " "), intern = TRUE) :
  running command '. '/data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/etc/profile.d/conda.sh' && conda activate && /data/homezvol0/zechuas/.conda/envs/echolocator/lib/R/bin/Rscript --no-save --no-restore --no-site-file --no-init-file --default-packages=NULL -e "con <- socketConnection(port=11923, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)"' had status 1
7: In socketAccept(soc, blocking = TRUE, open = "a+b") :
  problem in accepting connections on this socket
8: In .activate_condaenv(output, envpath, loc) :
  failed to activate the environment at ''
>

I am not so sure if there is anything wrong with the above message when running echolocatoR::finemap_loci But here are the things in my R env, and the returned result following the code you provided in that link is a list of BST1, MEX3C, and merged_dat, but all are empty. Is that what we expected, or is something wrong here?

> ls()
[1] "colmap"      "fullSS_path" "results"     "topSNPs"     "topSS"
> results
$BST1
NULL

$MEX3C
NULL

$merged_dat
Null data.table (0 rows and 0 cols)

Thank you so much for your time and help here.

rootze

bschilder commented 1 year ago

Hi @rootze , thanks for reporting this. It looks like echolocatoR (via the subpackage echoconda) is having trouble finding the conda executable. You can see the error message at the end of each locus report:

Specified conda binary '/data/homezvol0/zechuas/.cache/R/basilisk/1.8.1/0/bin/conda' does not exist.Locus MEX3C complete in: 1.06 min

conda should automatically be installed for you by echoconda if it hasn't been already, so I'm not sure why that's not happening. Could you please share your session info?

bschilder commented 1 year ago

I was trying to find some tutorials to follow for this newest version of echolocatoR, so I found this r-markdown. https://github.com/RajLabMSSM/echolocatoR/blob/master/vignettes/echolocatoR.Rmd

This is indeed the latest tutorial. The documentation website is rather behind as it's tied to the GitHub Actions workflow which is not completing atm, but I will try to get this updated soon.

rootze commented 1 year ago

@bschilder Thanks for helping out. Here is the sessionInfo(). It seems that my conda env echolocator has [29] echoconda_0.99.7.

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 8.6 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /data/homezvol0/zechuas/.conda/envs/echolocator/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] echolocatoR_2.0.1

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.26
  [3] R.utils_2.12.0              tidyselect_1.2.0
  [5] RSQLite_2.2.18              AnnotationDbi_1.58.0
  [7] htmlwidgets_1.5.4           grid_4.2.1
  [9] BiocParallel_1.30.4         XGR_1.1.8
 [11] munsell_0.5.0               codetools_0.2-18
 [13] interp_1.1-3                DT_0.25
 [15] withr_2.5.0                 colorspace_2.0-3
 [17] OrganismDbi_1.38.1          Biobase_2.56.0
 [19] filelock_1.0.2              knitr_1.40
 [21] supraHex_1.34.0             rstudioapi_0.14
 [23] stats4_4.2.1                DescTools_0.99.46
 [25] MatrixGenerics_1.8.1        GenomeInfoDbData_1.2.8
 [27] mixsqp_0.3-43               bit64_4.0.5
 [29] echoconda_0.99.7            basilisk_1.8.1
 [31] vctrs_0.5.0                 generics_0.1.3
 [33] xfun_0.33                   biovizBase_1.44.0
 [35] BiocFileCache_2.4.0         R6_2.5.1
 [37] GenomeInfoDb_1.32.4         AnnotationFilter_1.20.0
 [39] bitops_1.0-7                cachem_1.0.6
 [41] reshape_0.8.9               DelayedArray_0.22.0
 [43] assertthat_0.2.1            BiocIO_1.6.0
 [45] scales_1.2.1                nnet_7.3-18
 [47] rootSolve_1.8.2.3           gtable_0.3.1
 [49] lmom_2.9                    ggbio_1.44.1
 [51] ensembldb_2.20.2            rlang_1.0.6
 [53] MungeSumstats_1.4.5         echodata_0.99.14
 [55] splines_4.2.1               lazyeval_0.2.2
 [57] rtracklayer_1.56.1          gargle_1.2.1
 [59] dichromat_2.0-0.1           hexbin_1.28.2
 [61] checkmate_2.1.0             BiocManager_1.30.18
 [63] yaml_2.3.6                  reshape2_1.4.4
 [65] snpStats_1.46.0             backports_1.4.1
 [67] GenomicFeatures_1.48.4      ggnetwork_0.5.10
 [69] Hmisc_4.7-1                 RBGL_1.72.0
 [71] tools_4.2.1                 echoplot_0.99.5
 [73] ggplot2_3.3.6               ellipsis_0.3.2
 [75] catalogueR_1.0.0            RColorBrewer_1.1-3
 [77] proxy_0.4-27                BiocGenerics_0.42.0
 [79] coloc_5.1.0.1               Rcpp_1.0.9
 [81] plyr_1.8.7                  base64enc_0.1-3
 [83] progress_1.2.2              zlibbioc_1.42.0
 [85] purrr_0.3.5                 RCurl_1.98-1.9
 [87] basilisk.utils_1.8.0        prettyunits_1.1.1
 [89] rpart_4.1.16                deldir_1.0-6
 [91] viridis_0.6.2               S4Vectors_0.34.0
 [93] SummarizedExperiment_1.26.1 ggrepel_0.9.1
 [95] cluster_2.1.4               fs_1.5.2
 [97] crul_1.3                    magrittr_2.0.3
 [99] data.table_1.14.4           echotabix_0.99.8
[101] dnet_1.1.7                  openxlsx_4.2.5
[103] mvtnorm_1.1-3               ProtGenerics_1.28.0
[105] matrixStats_0.62.0          hms_1.1.2
[107] patchwork_1.1.2             XML_3.99-0.11
[109] jpeg_0.1-9                  readxl_1.4.1
[111] IRanges_2.30.1              gridExtra_2.3
[113] compiler_4.2.1              biomaRt_2.52.0
[115] tibble_3.1.8                crayon_1.5.2
[117] R.oo_1.25.0                 htmltools_0.5.3
[119] echoannot_0.99.7            tzdb_0.3.0
[121] Formula_1.2-4               tidyr_1.2.1
[123] expm_0.999-6                Exact_3.2
[125] DBI_1.1.3                   dbplyr_2.2.1
[127] MASS_7.3-58.1               rappdirs_0.3.3
[129] boot_1.3-28                 Matrix_1.5-1
[131] readr_2.1.3                 piggyback_0.1.4
[133] cli_3.4.1                   R.methodsS3_1.8.2
[135] echofinemap_0.99.3          parallel_4.2.1
[137] igraph_1.3.5                GenomicRanges_1.48.0
[139] pkgconfig_2.0.3             GenomicAlignments_1.32.1
[141] dir.expiry_1.4.0            RCircos_1.2.2
[143] foreign_0.8-83              osfr_0.2.9
[145] xml2_1.3.3                  XVector_0.36.0
[147] echoLD_0.99.7               stringr_1.4.1
[149] VariantAnnotation_1.42.1    digest_0.6.30
[151] graph_1.74.0                httpcode_0.3.0
[153] Biostrings_2.64.1           cellranger_1.1.0
[155] htmlTable_2.4.1             gld_2.6.5
[157] restfulr_0.0.15             curl_4.3.3
[159] Rsamtools_2.12.0            rjson_0.2.21
[161] lifecycle_1.0.3             nlme_3.1-160
[163] jsonlite_1.8.2              viridisLite_0.4.1
[165] BSgenome_1.64.0             fansi_1.0.3
[167] downloadR_0.99.4            pillar_1.8.1
[169] susieR_0.12.27              lattice_0.20-45
[171] GGally_2.1.2                googleAuthR_2.0.0
[173] KEGGREST_1.36.3             fastmap_1.1.0
[175] httr_1.4.4                  survival_3.4-0
[177] glue_1.6.2                  zip_2.2.1
[179] png_0.1-7                   bit_4.0.4
[181] Rgraphviz_2.40.0            class_7.3-20
[183] stringi_1.7.8               blob_1.2.3
[185] latticeExtra_0.6-30         memoise_2.0.1
[187] dplyr_1.0.10                irlba_2.3.5.1
[189] e1071_1.7-11                ape_5.6-2
bschilder commented 1 year ago

Ok, so I'm noticing the basilisk version is a bit older (1.8.1 vs. 1.9.12): https://github.com/LTLA/basilisk/blob/master/DESCRIPTION

I've encountered a couple of issues with the earlier versions of basilisk, but not sure if it's related to your issue as well.

I also just noticed that your Bioc version is a bit outdated (latest release is 3.16):

 The selected method (rsamtools) depends on Rhtslib, and the version you have installed (1.28.0) contains known bugs. Please install Rhtslib >=1.99.2 via Bioconductor >=3.16:
    BiocManager::install(version='devel')
2: In system(paste(act.cmd, collapse = " "), intern = TRUE) :

So I think the best thing to do would be upgrade to Bioc 3.16, which should automatically upgrade your basilisk version as well.

BiocManager::install(version = "3.16")
bschilder commented 1 year ago

In fact, I think upgrading to Bioc 3.16 will circumvent your need for conda at all, since echotabix adaptively chooses the best available method given your R package versions. The "ramstools" method (which is R native) tends to be much more robust than "conda" method (which is partly a command line wrapper for conda). Not that you need to worry about those details, this is all handled automatically under the hood.

bschilder commented 1 year ago

Just noting that this might also relate to this convo: https://github.com/LTLA/basilisk/issues/16

rootze commented 1 year ago

@bschilder Updating BiocManager and basilisk solved the empty result problem of echolocatoR::finemap_loci. Thank you so much!

@bschilder Just a minor request for the tutorial when you have time to update it. I also saw your closed issue posted under PAINTOR. It would be great to know how we can perform the fine mapping with annotation in echolocatoR. I am currently reading your function reference page and source code to figure it out. But there are quite some parameters that we can explore. For example, I would like to fine-mapping with annotation .bed files, as shown in https://github.com/gkichaev/PAINTOR_V3.0/wiki/2b.-Overlapping-annotations So it would be great to have an example from you.

bschilder commented 1 year ago

@rootze no problem, glad it worked!

Regarding PAINTOR, I'm actually in the midst of trying to use those exact annotations provided by the PAINTOR team. The problem, it's a massive set of files that would ideally by queryable for relevant subsets. Once I get that down, I'll work on getting custom user-supplied annotations as an option as well (a feature I'm also working on for PolyFun atm). You can follow the progress on this front here: https://github.com/RajLabMSSM/echofinemap/issues/14

bschilder commented 1 year ago

Also, just a side note, the empty results can occur any time there is an error inside the multi-locus iterator bc I have a tryCatch built-in. This allows users to keep fine-mapping lots of loci even when a couple of them fail for whatever unforeseen reason. But to turn this off and get a better look at the trackback messages, set:

finemap_loci(use_tryCatch = FALSE, ...)