RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License

Problem with "N" using munged data #114

Closed AMCalejandro closed 1 year ago

AMCalejandro commented 1 year ago

1. Bug description

This is a continuation of the bug described in #113. I tried passing the munged data to finemap_loci, and the sample size issue keeps arising.

When I pass an integer to compute_n, it sort of works, assigning the same N to all SNPs. However, as you will see below, an NSTUDY column exists in the data.

2. Reproducible example

Note: After reading about compute_n in the MungeSumstats docs, I tried:

finemap_loci(# GENERAL ARGUMENTS 
                                          topSNPs = topSNPs,
                                          results_dir = fullRS_path,
                                          loci = topSNPs$Locus,
                                          dataset_name = "LID_COX",
                                          dataset_type = "GWAS",  
                                          force_new_subset = TRUE,
                                          force_new_LD = FALSE,
                                          force_new_finemap = TRUE,
                                          remove_tmps = FALSE,

                                          finemap_methods = c("ABF","FINEMAP","SUSIE", "POLYFUN_SUSIE"),

                                          # Munge full sumstats first
                                          munged = TRUE,

                                          # SUMMARY STATS ARGUMENTS
                                          fullSS_path = newSS_name_colmap,
                                          fullSS_genome_build = "hg19",
                                          query_by ="tabix",

                                          compute_n = 3500,

                                          bp_distance = 10000,#500000*2,
                                          min_MAF = 0.001, 
                                          trim_gene_limits = FALSE,

                                          case_control = FALSE,

                                          # FINE-MAPPING ARGUMENTS
                                          ## General
                                          n_causal = 5,
                                          credset_thresh = .95,
                                          consensus_thresh = 2,

                                          # LD ARGUMENTS 
                                          LD_reference = "1KGphase3",#"UKB",
                                          superpopulation = "EUR",
                                          download_method = "axel",
                                          LD_genome_build = "hg19",
                                          leadSNP_LD_block = FALSE,

                                          #### PLotting args ####
                                          plot_types = c("simple"),
                                          show_plot = TRUE,
                                          zoom = "1x",
                                          tx_biotypes = NULL,
                                          nott_epigenome = FALSE,
                                          nott_show_placseq = FALSE,
                                          nott_binwidth = 200,
                                          nott_bigwig_dir = NULL,
                                          xgr_libnames = NULL,
                                          roadmap = FALSE,
                                          roadmap_query = NULL,

                                          #### General args ####
                                          seed = 2022,
                                          nThread = 20,
                                          verbose = TRUE
                                          )

Console output

PolyFun submodule already installed.
┌─────────────────────────────────────────────────┐
│                                                 │
│   )))> 🦇 RP11-240A16.1 [locus 1 / 3] 🦇 <(((   │
│                                                 │
└─────────────────────────────────────────────────┘

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Assuming fullSS_path summary stats have already been processed with MungeSumstats.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with zgrep.
( zgrep ^'SNP' .../QC_MunGeSumStats.tsv.gz; zgrep
    -v ^'SNP' .../QC_MunGeSumStats.tsv.gz | sort
    -k2,2n
    -k3,3n ) > .../file3b8c2ab47a52_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_LID/QC_MunGeSumStats.tsv.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 4:32425284-32445284
Adding 'query' column to results.
Retrieved data with 76 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/RP11-240A16.1_LID_COX_subset.tsv.gz
+ Query: 76 SNPs x 20 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Could not infer MAF.
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ proportion_cases not included in data subset.
Preparing sample size column (N).
Warning: When method is an integer, must be >0.
+ Mapping colnames from MungeSumstats ==> echolocatoR
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 76 SNPs x 22 columns.
++ Saving standardized query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/RP11-240A16.1_LID_COX_subset.tsv.gz

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Previously computed LD_matrix detected. Importing: /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/LD/RP11-240A16.1.1KGphase3_LD.RDS
LD_reference identified as: r.
Converting obj to sparseMatrix.
+ FILTER:: Filtering by LD features.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 3 ▶▶▶ Filter SNPs 🚰 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
FILTER:: Filtering by SNP features.
+ FILTER:: Post-filtered data: 76 x 22
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 76 SNPs.
+ dat = 76 SNPs.
+ 76 SNPs in common.
Converting obj to sparseMatrix.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 4 ▶▶▶ Fine-map 🔊 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Gathering method sources.
Gathering method citations.
Preparing sample size column (N).
Warning: When method is an integer, must be >0.
+ Mapping colnames from MungeSumstats ==> echolocatoR
Gathering method sources.
Gathering method citations.
Gathering method sources.
Gathering method citations.
ABF
🚫 Missing required column(s) for ABF [skipping]: N, MAF, proportion_cases
FINEMAP
✅ All required columns present.
⚠ Missing optional column(s) for FINEMAP: MAF, N
SUSIE
✅ All required columns present.
⚠ Missing optional column(s) for SUSIE: N
POLYFUN_SUSIE
✅ All required columns present.
⚠ Missing optional column(s) for POLYFUN_SUSIE: MAF, N
++ Fine-mapping using 3 tool(s): FINEMAP, SUSIE, POLYFUN_SUSIE

+++ Multi-finemap:: FINEMAP +++
Preparing sample size column (N).
Warning: When method is an integer, must be >0.
+ Mapping colnames from MungeSumstats ==> echolocatoR
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 76 SNPs.
+ dat = 76 SNPs.
+ 76 SNPs in common.
Converting obj to sparseMatrix.
Constructing master file.
Optional MAF col missing. Replacing with all '.1's
Constructing data.z file.
Constructing data.ld file.
FINEMAP path: /home/rstudio/.cache/R/echofinemap/FINEMAP/finemap_v1.4.1_x86_64/finemap_v1.4.1_x86_64
Inferred FINEMAP version: 1.4.1
Running FINEMAP.
cd .../RP11-240A16.1 &&
    .../finemap_v1.4.1_x86_64

    --sss

    --in-files .../master

    --log

    --n-threads 20

    --n-causal-snps 5
Error : Master file '/home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/FINEMAP/master' is missing an entry in line 2 column 'n_samples'!

|--------------------------------------|
| Welcome to FINEMAP v1.4.1            |
|                                      |
| (c) 2015-2022 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - finemap@christianbenner.com        |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset            : all
- corr-config        : 0.95
- n-causal-snps      : 5
- n-configs-top      : 50000
- n-conv-sss         : 100
- n-iter             : 100000
- n-threads          : 20
- prior-k0           : 0
- prior-std          : 0.05 
- prob-conv-sss-tol  : 0.001
- prob-cred-set      : 0.95

+++ Multi-finemap:: SUSIE +++
Loading required namespace: Rfast
Failed with error:  'there is no package called 'Rfast''
Warning in SUSIE(dat = dat, dataset_type = dataset_type, LD_matrix = LD_matrix,  :
  Install Rfast to speed up susieR even further:
   install.packages('Rfast')
Preparing sample size column (N).
Warning: When method is an integer, must be >0.
+ Mapping colnames from MungeSumstats ==> echolocatoR
sample_size=NULL: must be valid integer.Locus RP11-240A16.1 complete in: 0.71 min

...
...

 Step 6 ▶▶▶ Postprocess data 🎁 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Returning results as nested list.
All loci done in: 2.14 min
$`RP11-240A16.1`
NULL

$XYLT1
NULL

$LRP8
NULL

$merged_dat

Data

Input data stored on tsv.gz file

> head(data_munged)
           SNP CHR     BP A1 A2       ID    FRQ  FRQSE FRQMIN FRQMAX    BETA     SE      P DIRECTION HETISQT HETCHISQ HETDF HETPVAL NSTUDY MAF_VARIABILITY
1:  rs58276399   1 731718  T  C 1:731718 0.8837 0.0028 0.8800 0.8950 -0.1775 0.1583 0.2621     ?---+       0    0.040     3  0.9979   1297          0.0150
2: rs141242758   1 734349  T  C 1:734349 0.8843 0.0025 0.8800 0.8950 -0.1577 0.1593 0.3223     ?---+       0    0.143     3  0.9862   1297          0.0150
3:   rs2073813   1 753541  G  A 1:753541 0.1257 0.0013 0.1050 0.1283 -0.0721 0.1177 0.5399     ++++-       0    0.220     4  0.9944   2687          0.0233
4:  rs61768174   1 766007  A  C 1:766007 0.9005 0.0024 0.8967 0.9076 -0.2559 0.1642 0.1190     ?---+       0    0.065     3  0.9957   1297          0.0109
5:  rs60320384   1 769223  C  G 1:769223 0.8749 0.0017 0.8728 0.8950 -0.0772 0.1178 0.5124     ----+       0    0.333     4  0.9876   2687          0.0222
6:  rs59066358   1 771967  G  A 1:771967 0.1255 0.0014 0.1050 0.1272 -0.0776 0.1177 0.5095     ++++-       0    0.357     4  0.9859   2687          0.0222

3. Session info

(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)

```
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22 SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20 BSgenome_1.65.2 rtracklayer_1.57.0
 [5] Biostrings_2.65.3 XVector_0.37.1 GenomicRanges_1.49.1 GenomeInfoDb_1.33.5
 [9] IRanges_2.31.2 S4Vectors_0.35.3 BiocGenerics_0.43.1 forcats_0.5.2
[13] stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4 readr_2.1.2
[17] tidyr_1.2.0 tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2
[21] data.table_1.14.2 echolocatoR_2.0.1

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3 GGally_2.1.2 R.methodsS3_1.8.2
  [4] echoLD_0.99.7 bit64_4.0.5 knitr_1.40
  [7] irlba_2.3.5 DelayedArray_0.23.1 R.utils_2.12.0
 [10] rpart_4.1.16 KEGGREST_1.37.3 RCurl_1.98-1.8
 [13] AnnotationFilter_1.21.0 generics_0.1.3 GenomicFeatures_1.49.6
 [16] RSQLite_2.2.16 proxy_0.4-27 bit_4.0.4
 [19] tzdb_0.3.0 xml2_1.3.3 lubridate_1.8.0
 [22] SummarizedExperiment_1.27.2 assertthat_0.2.1 viridis_0.6.2
 [25] gargle_1.2.0 xfun_0.32 hms_1.1.2
 [28] fansi_1.0.3 restfulr_0.0.15 progress_1.2.2
 [31] dbplyr_2.2.1 readxl_1.4.1 Rgraphviz_2.41.1
 [34] igraph_1.3.4 DBI_1.1.3 htmlwidgets_1.5.4
 [37] reshape_0.8.9 downloadR_0.99.4 googledrive_2.0.0
 [40] ellipsis_0.3.2 backports_1.4.1 biomaRt_2.53.2
 [43] deldir_1.0-6 MatrixGenerics_1.9.1 MungeSumstats_1.5.13
 [46] vctrs_0.4.1 Biobase_2.57.1 ensembldb_2.21.4
 [49] cachem_1.0.6 withr_2.5.0 checkmate_2.1.0
 [52] GenomicAlignments_1.33.1 prettyunits_1.1.1 cluster_2.1.3
 [55] ape_5.6-2 dir.expiry_1.5.0 lazyeval_0.2.2
 [58] crayon_1.5.1 basilisk.utils_1.9.2 crul_1.2.0
 [61] pkgconfig_2.0.3 nlme_3.1-159 ProtGenerics_1.29.0
 [64] XGR_1.1.8 nnet_7.3-17 rlang_1.0.5
 [67] lifecycle_1.0.1 filelock_1.0.2 httpcode_0.3.0
 [70] BiocFileCache_2.5.0 modelr_0.1.9 echotabix_0.99.8
 [73] dichromat_2.0-0.1 cellranger_1.1.0 coloc_5.1.0
 [76] matrixStats_0.62.0 graph_1.75.0 Matrix_1.4-1
 [79] osfr_0.2.8 boot_1.3-28 reprex_2.0.2
 [82] base64enc_0.1-3 googlesheets4_1.0.1 png_0.1-7
 [85] viridisLite_0.4.1 rjson_0.2.21 rootSolve_1.8.2.3
 [88] bitops_1.0-7 R.oo_1.25.0 ggnetwork_0.5.10
 [91] blob_1.2.3 mixsqp_0.3-43 echoplot_0.99.5
 [94] dnet_1.1.7 jpeg_0.1-9 BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
 [97] echodata_0.99.12 scales_1.2.1 memoise_2.0.1
[100] magrittr_2.0.3 plyr_1.8.7 hexbin_1.28.2
[103] zlibbioc_1.43.0 compiler_4.2.0 echoconda_0.99.7
[106] BiocIO_1.7.1 RColorBrewer_1.1-3 catalogueR_1.0.0
[109] Rsamtools_2.13.4 cli_3.3.0 echoannot_0.99.7
[112] patchwork_1.1.2 htmlTable_2.4.1 Formula_1.2-4
[115] MASS_7.3-58.1 tidyselect_1.1.2 stringi_1.7.8
[118] yaml_2.3.5 supraHex_1.35.0 latticeExtra_0.6-30
[121] ggrepel_0.9.1 grid_4.2.0 VariantAnnotation_1.43.3
[124] tools_4.2.0 lmom_2.9 parallel_4.2.0
[127] rstudioapi_0.14 foreign_0.8-82 piggyback_0.1.3
[130] gridExtra_2.3 gld_2.6.5 digest_0.6.29
[133] snpStats_1.47.1 BiocManager_1.30.18 Rcpp_1.0.9
[136] broom_1.0.1 OrganismDbi_1.39.1 httr_1.4.4
[139] AnnotationDbi_1.59.1 RCircos_1.2.2 ggbio_1.45.0
[142] biovizBase_1.45.0 colorspace_2.0-3 rvest_1.0.3
[145] XML_3.99-0.10 fs_1.5.2 reticulate_1.26
[148] splines_4.2.0 RBGL_1.73.0 expm_0.999-6
[151] echofinemap_0.99.3 basilisk_1.9.2 Exact_3.1
[154] jsonlite_1.8.0 susieR_0.12.27 R6_2.5.1
[157] Hmisc_4.7-1 pillar_1.8.1 htmltools_0.5.3
[160] glue_1.6.2 fastmap_1.1.0 DT_0.24
[163] BiocParallel_1.31.12 class_7.3-20 codetools_0.2-18
[166] mvtnorm_1.1-3 utf8_1.2.2 lattice_0.20-45
[169] curl_4.3.2 DescTools_0.99.46 zip_2.2.0
[172] openxlsx_4.2.5 interp_1.1-3 survival_3.3-1
[175] googleAuthR_2.0.0 munsell_0.5.0 e1071_1.7-11
[178] GenomeInfoDbData_1.2.8 haven_2.5.1 reshape2_1.4.4
[181] gtable_0.3.1
```

bschilder commented 1 year ago

1. "N"

1.a. compute_n = "ldsc" (I get sample_size=NULL: must be valid integer)

This was a bug caused by my missing some places where sample_size was still being used. I just pushed a change to construct_colmap so that it takes the argument N= instead of sample_size. N acts just like the other mapping arguments: it renames a column based on the value supplied. For example, construct_colmap(N="N_cases") means that the "N_cases" column is renamed to "N" and used in any subsequent steps that require the total (or effective) sample size.

Note that whenever the "N" col is present in the post-standardized sumstats data, this will be used instead of whatever you supply to compute_n. I've updated the docs to better explain this.
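The precedence described above can be sketched in base R. This is only an illustration under stated assumptions, not the echodata implementation: `map_n_column()` is a hypothetical helper, and the toy table simply reuses two NSTUDY values from the report.

```r
# Hypothetical helper (not part of echodata): rename the column named by `N`
# to "N", and only fall back to `compute_n` when no "N" column exists.
map_n_column <- function(dat, N = "N", compute_n = NULL) {
  if (!is.null(N) && N %in% names(dat) && N != "N") {
    names(dat)[names(dat) == N] <- "N"
  }
  if ("N" %in% names(dat)) {
    return(dat)  # an existing N column takes precedence over compute_n
  }
  if (is.numeric(compute_n) && length(compute_n) == 1 && compute_n > 0) {
    dat$N <- compute_n  # a single number is applied to every row
  }
  dat
}

toy <- data.frame(SNP = c("rs58276399", "rs2073813"),
                  NSTUDY = c(1297, 2687))

# NSTUDY is mapped to N, so compute_n = 3500 is ignored.
mapped <- map_n_column(toy, N = "NSTUDY", compute_n = 3500)

# Without any sample size column, compute_n fills N for all rows.
fallback <- map_n_column(toy["SNP"], compute_n = 3500)
```

In this sketch, `mapped$N` keeps the per-SNP values, while `fallback$N` is 3500 for every row.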


1.b. compute_n = data_munged$NSTUDY (This does not work at all)

This is my bad; I told you yesterday that MungeSumstats::compute_nsize(compute_n=) can take a vector of sample sizes. But apparently I misremembered, and it can only take a single number to be applied to all rows. Instead, I'll add a block to echodata::get_sample_size that handles this scenario:

### Numeric vector 
    if(is.numeric(compute_n) &&
       length(compute_n)>1){
        messager("Numeric vector supplied to compute_n.",v=verbose)
        if(length(compute_n)!=nrow(dat2)){
            stp <- paste(
                "When compute_n is a numeric vector,",
                "its length must be exactly equal to the number of rows",
                "in your summary statistics data (dat)."
            )
            stop(stp)
        } else {
            dat2$N <- compute_n
            return(dat2)
        }
    }
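The snippet above relies on objects from its enclosing function (`dat2`, `messager`, `verbose`). A standalone version of the same check might look like the following; `assign_n_vector()` is a hypothetical name, and base `message()` stands in for the package's internal `messager()`.

```r
# Standalone sketch; `assign_n_vector()` is a hypothetical name and
# message() stands in for the package's internal messager().
assign_n_vector <- function(dat, compute_n, verbose = TRUE) {
  if (is.numeric(compute_n) && length(compute_n) > 1) {
    if (verbose) message("Numeric vector supplied to compute_n.")
    if (length(compute_n) != nrow(dat)) {
      stop("When compute_n is a numeric vector, its length must be ",
           "exactly equal to the number of rows in dat.")
    }
    dat$N <- compute_n  # one sample size per SNP
  }
  dat
}

toy <- data.frame(SNP = c("rs58276399", "rs2073813", "rs61768174"))

# Matching length: N is assigned row by row.
ok <- assign_n_vector(toy, compute_n = c(1297, 2687, 1297), verbose = FALSE)

# Mismatched length: the error message is captured instead of aborting.
bad <- tryCatch(assign_n_vector(toy, compute_n = c(1297, 2687)),
                error = function(e) conditionMessage(e))
```

Failing loudly on a length mismatch avoids R's silent recycling, which would otherwise assign the wrong sample size to some SNPs.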

2. Rfast

Rfast also wasn't installed. I've now ensured that it gets installed automatically by making it an Import of echofinemap.

AMCalejandro commented 1 year ago

1.b. compute_n = data_munged$NSTUDY (This does not work at all)

Just so you know, I read in Alan's documentation that the argument could take a vector.


I am assuming compute_n is used within MungeSumStats and not within echolocatoR workflow?

bschilder commented 1 year ago

The docs there say that you can supply a single integer (not a vector), which is applied to all rows. The only part where it mentions a vector is with the character arguments, in which case it just takes the first element of the character vector, which indicates the strategy you'd like to use to compute the (effective) sample size from other columns.
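As a rough illustration of that argument handling (not the MungeSumstats source code), the hypothetical function below classifies a `compute_n` value. Only "ldsc" is taken from this thread; "other" is a placeholder string.

```r
# Hypothetical illustration of the behaviour described above; this is not
# the MungeSumstats source code.
resolve_compute_n <- function(compute_n) {
  if (is.character(compute_n)) {
    # Only the first element is used; it names the strategy.
    return(list(type = "method", value = compute_n[1]))
  }
  if (is.numeric(compute_n) && length(compute_n) == 1) {
    # A single number is applied to all rows.
    return(list(type = "constant", value = compute_n))
  }
  stop("compute_n must be a single number or a character strategy name.")
}

from_chr <- resolve_compute_n(c("ldsc", "other"))  # value is "ldsc"
from_num <- resolve_compute_n(3500)                # type is "constant"
```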

bschilder commented 1 year ago

I am assuming compute_n is used within MungeSumStats and not within echolocatoR workflow?

It's used in both. The difference is that echodata::get_sample_size wraps MungeSumstats::compute_nsize and has some additional input handling strategies including (now) handling integer vectors supplied to compute_n. https://github.com/RajLabMSSM/echodata/blob/main/R/get_sample_size.R

AMCalejandro commented 1 year ago

I just got this running

Note that:

Code

columnsnames = echodata::construct_colmap(munged= FALSE,
                                          CHR = "CHR", POS = "POS",
                                          SNP = "SNP", P = "P",
                                          Effect = "BETA", StdErr = "SE", 
                                          A1 = "A1", A2 = "A2", Freq = "FREQ",
                                          N = "N")
                                          #N_cases = NULL, N_controls = NULL,
                                          #proportion_cases = NULL,
                                          #MAF = "calculate",
                                          #tstat = NULL)

# Pass the sample size as the "N" column.
# compute_n only does what the docs describe if N does not exist.

finemap_loci(# GENERAL ARGUMENTS 
                                          topSNPs = topSNPs,
                                          results_dir = fullRS_path,
                                          loci = topSNPs$Locus,
                                          dataset_name = "LID_COX",
                                          dataset_type = "GWAS",  
                                          force_new_subset = TRUE,
                                          force_new_LD = FALSE,
                                          force_new_finemap = TRUE,
                                          remove_tmps = FALSE,

                                          finemap_methods = c("ABF","FINEMAP","SUSIE", "POLYFUN_SUSIE"),

                                          # Munge full sumstats first
                                          munged = FALSE,
                                          colmap = columnsnames,
                                          # SUMMARY STATS ARGUMENTS
                                          fullSS_path = newSS_name_colmap,
                                          fullSS_genome_build = "hg19",
                                          query_by ="tabix",

                                          #compute_n = 3500,

                                          bp_distance = 10000,#500000*2,
                                          min_MAF = 0.001, 
                                          trim_gene_limits = FALSE,

                                          case_control = FALSE,

                                          # FINE-MAPPING ARGUMENTS
                                          ## General
                                          n_causal = 5,
                                          credset_thresh = .95,
                                          consensus_thresh = 2,

                                          # LD ARGUMENTS 
                                          LD_reference = "1KGphase3",#"UKB",
                                          superpopulation = "EUR",
                                          download_method = "axel",
                                          LD_genome_build = "hg19",
                                          leadSNP_LD_block = FALSE,

                                          #### PLotting args ####
                                          plot_types = c("simple"),
                                          show_plot = TRUE,
                                          zoom = "1x",
                                          tx_biotypes = NULL,
                                          nott_epigenome = FALSE,
                                          nott_show_placseq = FALSE,
                                          nott_binwidth = 200,
                                          nott_bigwig_dir = NULL,
                                          xgr_libnames = NULL,
                                          roadmap = FALSE,
                                          roadmap_query = NULL,

                                          #### General args ####
                                          seed = 2022,
                                          nThread = 20,
                                          verbose = TRUE
                                          )

Output

PolyFun submodule already installed.
┌─────────────────────────────────────────────────┐
│                                                 │
│   )))> 🦇 RP11-240A16.1 [locus 1 / 3] 🦇 <(((   │
│                                                 │
└─────────────────────────────────────────────────┘

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../QC_SNPs_COLMAP.txt; grep
    -v ^'SNP' .../QC_SNPs_COLMAP.txt | sort
    -k2,2n
    -k3,3n ) > .../file2fb2fcecd3b_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_LID/QC_SNPs_COLMAP.txt.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 4:32425284-32445284
Adding 'query' column to results.
Retrieved data with 76 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/RP11-240A16.1_LID_COX_subset.tsv.gz
+ Query: 76 SNPs x 10 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Could not infer MAF.
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ proportion_cases not included in data subset.
Preparing sample size column (N).
Using existing 'N' column.
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 76 SNPs x 12 columns.
++ Saving standardized query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/RP11-240A16.1_LID_COX_subset.tsv.gz

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Previously computed LD_matrix detected. Importing: /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/RP11-240A16.1/LD/RP11-240A16.1.1KGphase3_LD.RDS
LD_reference identified as: r.
Converting obj to sparseMatrix.
+ FILTER:: Filtering by LD features.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 3 ▶▶▶ Filter SNPs 🚰 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
FILTER:: Filtering by SNP features.
+ FILTER:: Post-filtered data: 76 x 12
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 76 SNPs.
+ dat = 76 SNPs.
+ 76 SNPs in common.
Converting obj to sparseMatrix.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 4 ▶▶▶ Fine-map 🔊 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Gathering method sources.
Gathering method citations.
Preparing sample size column (N).
Using existing 'N' column.
Gathering method sources.
Gathering method citations.
Gathering method sources.
Gathering method citations.
ABF
🚫 Missing required column(s) for ABF [skipping]: MAF, proportion_cases
FINEMAP
✅ All required columns present.
⚠ Missing optional column(s) for FINEMAP: MAF
SUSIE
✅ All required columns present.
✅ All optional columns present.
POLYFUN_SUSIE
✅ All required columns present.
⚠ Missing optional column(s) for POLYFUN_SUSIE: MAF
++ Fine-mapping using 3 tool(s): FINEMAP, SUSIE, POLYFUN_SUSIE

+++ Multi-finemap:: FINEMAP +++
Preparing sample size column (N).
Using existing 'N' column.
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 76 SNPs.
+ dat = 76 SNPs.
+ 76 SNPs in common.
Converting obj to sparseMatrix.
Constructing master file.
Optional MAF col missing. Replacing with all '.1's
Constructing data.z file.
Constructing data.ld file.
FINEMAP path: /home/rstudio/.cache/R/echofinemap/FINEMAP/finemap_v1.4.1_x86_64/finemap_v1.4.1_x86_64
Inferred FINEMAP version: 1.4.1
Running FINEMAP.
cd .../RP11-240A16.1 &&
    .../finemap_v1.4.1_x86_64

    --sss

    --in-files .../master

    --log

    --n-threads 20

    --n-causal-snps 5

|--------------------------------------|
| Welcome to FINEMAP v1.4.1            |
|                                      |
| (c) 2015-2022 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - finemap@christianbenner.com        |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset            : all
- corr-config        : 0.95
- n-causal-snps      : 5
- n-configs-top      : 50000
- n-conv-sss         : 100
- n-iter             : 100000
- n-threads          : 20
- prior-k0           : 0
- prior-std          : 0.05 
- prob-conv-sss-tol  : 0.001
- prob-cred-set      : 0.95

------------
FINE-MAPPING (1/1)
------------
- GWAS summary stats               : FINEMAP/data.z
- SNP correlations                 : FINEMAP/data.ld
- Causal SNP stats                 : FINEMAP/data.snp
- Causal configurations            : FINEMAP/data.config
- Credible sets                    : FINEMAP/data.cred
- Log file                         : FINEMAP/data.log_sss
- Reading input                    : done!   

- Updated prior SD of effect sizes : 0.05 0.0528 0.0558 0.0589 

- Number of GWAS samples           : 2687
- Number of SNPs                   : 76
- Prior-Pr(# of causal SNPs is k)  : 
  (0 -> 0)
   1 -> 0.584
   2 -> 0.292
   3 -> 0.096
   4 -> 0.0234
   5 -> 0.00449
- 1800 configurations evaluated (0.122/100%) : converged after 122 iterations
- Computing causal SNP statistics  : done!   
- Regional SNP heritability        : 0.0276 (SD: 0.00441 ; 95% CI: [0.0196,0.0371])
- Log10-BF of >= one causal SNP    : 24.4
- Post-expected # of causal SNPs   : 4.74
- Post-Pr(# of causal SNPs is k)   : 
  (0 -> 0)
   1 -> 9.4e-21
   2 -> 2.73e-11
   3 -> 1.41e-07
   4 -> 0.265
   5 -> 0.735
- Writing output                   : done!   
- Run time                         : 0 hours, 0 minutes, 0 seconds
2 data.cred* file(s) found in the same subfolder.
Selected file based on postPr_k: data.cred5
Importing conditional probabilities (.cred file).
No configurations were causal at PP>=0.95.
Importing marginal probabilities (.snp file).
Importing configuration probabilities (.config file).
FINEMAP was unable to identify any credible sets at PP>=0.95.
++ Credible Set SNPs identified = 0
++ Merging FINEMAP results with multi-finemap data.

+++ Multi-finemap:: SUSIE +++
Loading required namespace: Rfast
Failed with error:  'there is no package called 'Rfast''
Preparing sample size column (N).
Using existing 'N' column.
+ SUSIE:: sample_size=2,687
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 76 SNPs.
+ dat = 76 SNPs.
+ 76 SNPs in common.
Converting obj to sparseMatrix.
+ SUSIE:: Using `susie_rss()` from susieR v0.12.27
+ SUSIE:: Extracting Credible Sets.
++ Credible Set SNPs identified = 2
++ Merging SUSIE results with multi-finemap data.

+++ Multi-finemap:: POLYFUN_SUSIE +++
PolyFun submodule already installed.
PolyFun:: Fine-mapping with method=SUSIE
PolyFun:: Using priors from mode=precomputed
Unable to find conda binary. Is Anaconda installed?
Locus RP11-240A16.1 complete in: 0.33 min
┌─────────────────────────────────────────┐
│                                         │
│   )))> 🦇 XYLT1 [locus 2 / 3] 🦇 <(((   │
│                                         │
└─────────────────────────────────────────┘

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../QC_SNPs_COLMAP.txt; grep
    -v ^'SNP' .../QC_SNPs_COLMAP.txt | sort
    -k2,2n
    -k3,3n ) > .../file2fb33669f7f_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_LID/QC_SNPs_COLMAP.txt.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 16:17034975-17054975
Adding 'query' column to results.
Retrieved data with 80 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/XYLT1/XYLT1_LID_COX_subset.tsv.gz
+ Query: 80 SNPs x 10 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Could not infer MAF.
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ proportion_cases not included in data subset.
Preparing sample size column (N).
Using existing 'N' column.
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 80 SNPs x 12 columns.
++ Saving standardized query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/XYLT1/XYLT1_LID_COX_subset.tsv.gz

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Previously computed LD_matrix detected. Importing: /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/XYLT1/LD/XYLT1.1KGphase3_LD.RDS
LD_reference identified as: r.
Converting obj to sparseMatrix.
+ FILTER:: Filtering by LD features.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 3 ▶▶▶ Filter SNPs 🚰 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
FILTER:: Filtering by SNP features.
+ FILTER:: Post-filtered data: 78 x 12
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 78 SNPs.
+ dat = 78 SNPs.
+ 78 SNPs in common.
Converting obj to sparseMatrix.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 4 ▶▶▶ Fine-map 🔊 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Gathering method sources.
Gathering method citations.
Preparing sample size column (N).
Using existing 'N' column.
Gathering method sources.
Gathering method citations.
Gathering method sources.
Gathering method citations.
ABF
🚫 Missing required column(s) for ABF [skipping]: MAF, proportion_cases
FINEMAP
✅ All required columns present.
⚠ Missing optional column(s) for FINEMAP: MAF
SUSIE
✅ All required columns present.
✅ All optional columns present.
POLYFUN_SUSIE
✅ All required columns present.
⚠ Missing optional column(s) for POLYFUN_SUSIE: MAF
++ Fine-mapping using 3 tool(s): FINEMAP, SUSIE, POLYFUN_SUSIE

+++ Multi-finemap:: FINEMAP +++
Preparing sample size column (N).
Using existing 'N' column.
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 78 SNPs.
+ dat = 78 SNPs.
+ 78 SNPs in common.
Converting obj to sparseMatrix.
Constructing master file.
Optional MAF col missing. Replacing with all '.1's
Constructing data.z file.
Constructing data.ld file.
FINEMAP path: /home/rstudio/.cache/R/echofinemap/FINEMAP/finemap_v1.4.1_x86_64/finemap_v1.4.1_x86_64
Inferred FINEMAP version: 1.4.1
Running FINEMAP.
cd .../XYLT1 &&
    .../finemap_v1.4.1_x86_64

    --sss

    --in-files .../master

    --log

    --n-threads 20

    --n-causal-snps 5

|--------------------------------------|
| Welcome to FINEMAP v1.4.1            |
|                                      |
| (c) 2015-2022 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - finemap@christianbenner.com        |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset            : all
- corr-config        : 0.95
- n-causal-snps      : 5
- n-configs-top      : 50000
- n-conv-sss         : 100
- n-iter             : 100000
- n-threads          : 20
- prior-k0           : 0
- prior-std          : 0.05 
- prob-conv-sss-tol  : 0.001
- prob-cred-set      : 0.95

------------
FINE-MAPPING (1/1)
------------
- GWAS summary stats               : FINEMAP/data.z
- SNP correlations                 : FINEMAP/data.ld
- Causal SNP stats                 : FINEMAP/data.snp
- Causal configurations            : FINEMAP/data.config
- Credible sets                    : FINEMAP/data.cred
- Log file                         : FINEMAP/data.log_sss
- Reading input                    : done!   

- Updated prior SD of effect sizes : 0.05 0.0522 0.0545 0.0568 

- Number of GWAS samples           : 2687
- Number of SNPs                   : 78
- Prior-Pr(# of causal SNPs is k)  : 
  (0 -> 0)
   1 -> 0.584
   2 -> 0.292
   3 -> 0.0961
   4 -> 0.0234
   5 -> 0.0045
- 1077 configurations evaluated (0.198/100%) : converged after 198 iterations
- Computing causal SNP statistics  : done!   
- Regional SNP heritability        : 0.0119 (SD: 0.00385 ; 95% CI: [0.00536,0.0204])
- Log10-BF of >= one causal SNP    : 4.46
- Post-expected # of causal SNPs   : 1.96
- Post-Pr(# of causal SNPs is k)   : 
  (0 -> 0)
   1 -> 0.245
   2 -> 0.548
   3 -> 0.204
   4 -> 0.00238
   5 -> 0
- Writing output                   : done!   
- Run time                         : 0 hours, 0 minutes, 0 seconds
3 data.cred* file(s) found in the same subfolder.
Selected file based on postPr_k: data.cred2
Importing conditional probabilities (.cred file).
No configurations were causal at PP>=0.95.
Importing marginal probabilities (.snp file).
Importing configuration probabilities (.config file).
FINEMAP was unable to identify any credible sets at PP>=0.95.
++ Credible Set SNPs identified = 0
++ Merging FINEMAP results with multi-finemap data.

+++ Multi-finemap:: SUSIE +++
Loading required namespace: Rfast
Failed with error:  'there is no package called 'Rfast''
In addition: Warning messages:
1: In SUSIE(dat = dat, dataset_type = dataset_type, LD_matrix = LD_matrix,  :
  Install Rfast to speed up susieR even further:
   install.packages('Rfast')
2: In susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) *  :
  IBSS algorithm did not converge in 100 iterations!
                  Please check consistency between summary statistics and LD matrix.
                  See https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html
Preparing sample size column (N).
Using existing 'N' column.
+ SUSIE:: sample_size=2,687
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 78 SNPs.
+ dat = 78 SNPs.
+ 78 SNPs in common.
Converting obj to sparseMatrix.
+ SUSIE:: Using `susie_rss()` from susieR v0.12.27
+ SUSIE:: Extracting Credible Sets.
++ Credible Set SNPs identified = 1
++ Merging SUSIE results with multi-finemap data.

+++ Multi-finemap:: POLYFUN_SUSIE +++
PolyFun submodule already installed.
PolyFun:: Fine-mapping with method=SUSIE
PolyFun:: Using priors from mode=precomputed
Unable to find conda binary. Is Anaconda installed?
Locus XYLT1 complete in: 0.32 min
┌────────────────────────────────────────┐
│                                        │
│   )))> 🦇 LRP8 [locus 3 / 3] 🦇 <(((   │
│                                        │
└────────────────────────────────────────┘

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../QC_SNPs_COLMAP.txt; grep
    -v ^'SNP' .../QC_SNPs_COLMAP.txt | sort
    -k2,2n
    -k3,3n ) > .../file2fb4113b218_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_LID/QC_SNPs_COLMAP.txt.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 1:53768300-53788300
Adding 'query' column to results.
Retrieved data with 52 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/LRP8/LRP8_LID_COX_subset.tsv.gz
+ Query: 52 SNPs x 10 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Could not infer MAF.
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ proportion_cases not included in data subset.
Preparing sample size column (N).
Using existing 'N' column.
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 52 SNPs x 12 columns.
++ Saving standardized query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/LRP8/LRP8_LID_COX_subset.tsv.gz

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Previously computed LD_matrix detected. Importing: /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/LRP8/LD/LRP8.1KGphase3_LD.RDS
LD_reference identified as: r.
Converting obj to sparseMatrix.
+ FILTER:: Filtering by LD features.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 3 ▶▶▶ Filter SNPs 🚰 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
FILTER:: Filtering by SNP features.
+ FILTER:: Post-filtered data: 51 x 12
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 51 SNPs.
+ dat = 51 SNPs.
+ 51 SNPs in common.
Converting obj to sparseMatrix.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 4 ▶▶▶ Fine-map 🔊 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Gathering method sources.
Gathering method citations.
Preparing sample size column (N).
Using existing 'N' column.
Gathering method sources.
Gathering method citations.
Gathering method sources.
Gathering method citations.
ABF
🚫 Missing required column(s) for ABF [skipping]: MAF, proportion_cases
FINEMAP
✅ All required columns present.
⚠ Missing optional column(s) for FINEMAP: MAF
SUSIE
✅ All required columns present.
✅ All optional columns present.
POLYFUN_SUSIE
✅ All required columns present.
⚠ Missing optional column(s) for POLYFUN_SUSIE: MAF
++ Fine-mapping using 3 tool(s): FINEMAP, SUSIE, POLYFUN_SUSIE

+++ Multi-finemap:: FINEMAP +++
Preparing sample size column (N).
Using existing 'N' column.
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 51 SNPs.
+ dat = 51 SNPs.
+ 51 SNPs in common.
Converting obj to sparseMatrix.
Constructing master file.
Optional MAF col missing. Replacing with all '.1's
Constructing data.z file.
Constructing data.ld file.
FINEMAP path: /home/rstudio/.cache/R/echofinemap/FINEMAP/finemap_v1.4.1_x86_64/finemap_v1.4.1_x86_64
Inferred FINEMAP version: 1.4.1
Running FINEMAP.
cd .../LRP8 &&
    .../finemap_v1.4.1_x86_64

    --sss

    --in-files .../master

    --log

    --n-threads 20

    --n-causal-snps 5

|--------------------------------------|
| Welcome to FINEMAP v1.4.1            |
|                                      |
| (c) 2015-2022 University of Helsinki |
|                                      |
| Help :                               |
| - ./finemap --help                   |
| - www.finemap.me                     |
| - www.christianbenner.com            |
|                                      |
| Contact :                            |
| - finemap@christianbenner.com        |
| - matti.pirinen@helsinki.fi          |
|--------------------------------------|

--------
SETTINGS
--------
- dataset            : all
- corr-config        : 0.95
- n-causal-snps      : 5
- n-configs-top      : 50000
- n-conv-sss         : 100
- n-iter             : 100000
- n-threads          : 20
- prior-k0           : 0
- prior-std          : 0.05 
- prob-conv-sss-tol  : 0.001
- prob-cred-set      : 0.95

------------
FINE-MAPPING (1/1)
------------
- GWAS summary stats               : FINEMAP/data.z
- SNP correlations                 : FINEMAP/data.ld
- Causal SNP stats                 : FINEMAP/data.snp
- Causal configurations            : FINEMAP/data.config
- Credible sets                    : FINEMAP/data.cred
- Log file                         : FINEMAP/data.log_sss
- Reading input                    : done!   

- Updated prior SD of effect sizes : 0.05 0.0517 0.0535 0.0554 

- Number of GWAS samples           : 2687
- Number of SNPs                   : 51
- Prior-Pr(# of causal SNPs is k)  : 
  (0 -> 0)
   1 -> 0.585
   2 -> 0.292
   3 -> 0.0955
   4 -> 0.0229
   5 -> 0.00431
- 1081 configurations evaluated (0.123/100%) : converged after 123 iterations
- Computing causal SNP statistics  : done!   
- Regional SNP heritability        : 0.0259 (SD: 0.00368 ; 95% CI: [0.0188,0.0334])
- Log10-BF of >= one causal SNP    : 24.9
- Post-expected # of causal SNPs   : 5
- Post-Pr(# of causal SNPs is k)   : 
  (0 -> 0)
   1 -> 5.84e-22
   2 -> 1.71e-17
   3 -> 1.74e-11
   4 -> 4.56e-06
   5 -> 1
- Writing output                   : done!   
- Run time                         : 0 hours, 0 minutes, 0 seconds
1 data.cred* file(s) found in the same subfolder.
Selected file based on postPr_k: data.cred5
Importing conditional probabilities (.cred file).
No configurations were causal at PP>=0.95.
Importing marginal probabilities (.snp file).
Importing configuration probabilities (.config file).
FINEMAP was unable to identify any credible sets at PP>=0.95.
++ Credible Set SNPs identified = 0
++ Merging FINEMAP results with multi-finemap data.

+++ Multi-finemap:: SUSIE +++
Loading required namespace: Rfast
Failed with error:  'there is no package called 'Rfast''
In addition: Warning message:
In SUSIE(dat = dat, dataset_type = dataset_type, LD_matrix = LD_matrix,  :
  Install Rfast to speed up susieR even further:
   install.packages('Rfast')
Preparing sample size column (N).
Using existing 'N' column.
+ SUSIE:: sample_size=2,687
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 51 SNPs.
+ dat = 51 SNPs.
+ 51 SNPs in common.
Converting obj to sparseMatrix.
+ SUSIE:: Using `susie_rss()` from susieR v0.12.27
+ SUSIE:: Extracting Credible Sets.
++ Credible Set SNPs identified = 3
++ Merging SUSIE results with multi-finemap data.

+++ Multi-finemap:: POLYFUN_SUSIE +++
PolyFun submodule already installed.
PolyFun:: Fine-mapping with method=SUSIE
PolyFun:: Using priors from mode=precomputed
Unable to find conda binary. Is Anaconda installed?
Locus LRP8 complete in: 0.33 min

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 6 ▶▶▶ Postprocess data 🎁 ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Returning results as nested list.
All loci done in: 0.97 min
$`RP11-240A16.1`
NULL

$XYLT1
NULL

$LRP8
NULL

$merged_dat
Null data.table (0 rows and 0 cols)

Warning message:
In SUSIE(dat = dat, dataset_type = dataset_type, LD_matrix = LD_matrix,  :
  Install Rfast to speed up susieR even further:
   install.packages('Rfast')

Session Info

``` > sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.4 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22 SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20 BSgenome_1.65.2 [4] rtracklayer_1.57.0 Biostrings_2.65.3 XVector_0.37.1 [7] GenomicRanges_1.49.1 GenomeInfoDb_1.33.5 IRanges_2.31.2 [10] S4Vectors_0.35.3 BiocGenerics_0.43.1 forcats_0.5.2 [13] stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4 [16] readr_2.1.2 tidyr_1.2.0 tibble_3.1.8 [19] ggplot2_3.3.6 tidyverse_1.3.2 data.table_1.14.2 [22] echolocatoR_2.0.1 loaded via a namespace (and not attached): [1] utf8_1.2.2 reticulate_1.26 R.utils_2.12.0 tidyselect_1.1.2 RSQLite_2.2.16 [6] AnnotationDbi_1.59.1 htmlwidgets_1.5.4 grid_4.2.0 BiocParallel_1.31.12 XGR_1.1.8 [11] munsell_0.5.0 codetools_0.2-18 interp_1.1-3 DT_0.24 withr_2.5.0 [16] colorspace_2.0-3 OrganismDbi_1.39.1 Biobase_2.57.1 filelock_1.0.2 knitr_1.40 [21] supraHex_1.35.0 rstudioapi_0.14 DescTools_0.99.46 MatrixGenerics_1.9.1 GenomeInfoDbData_1.2.8 [26] mixsqp_0.3-43 bit64_4.0.5 echoconda_0.99.7 basilisk_1.9.2 vctrs_0.4.1 [31] generics_0.1.3 xfun_0.32 biovizBase_1.45.0 BiocFileCache_2.5.0 R6_2.5.1 [36] AnnotationFilter_1.21.0 bitops_1.0-7 cachem_1.0.6 reshape_0.8.9 DelayedArray_0.23.1 [41] assertthat_0.2.1 BiocIO_1.7.1 scales_1.2.1 googlesheets4_1.0.1 nnet_7.3-17 [46] rootSolve_1.8.2.3 gtable_0.3.1 lmom_2.9 ggbio_1.45.0 ensembldb_2.21.4 [51] rlang_1.0.5 MungeSumstats_1.5.13 echodata_0.99.14 splines_4.2.0 
lazyeval_0.2.2 [56] gargle_1.2.0 dichromat_2.0-0.1 hexbin_1.28.2 broom_1.0.1 checkmate_2.1.0 [61] modelr_0.1.9 BiocManager_1.30.18 yaml_2.3.5 reshape2_1.4.4 snpStats_1.47.1 [66] backports_1.4.1 GenomicFeatures_1.49.6 ggnetwork_0.5.10 Hmisc_4.7-1 RBGL_1.73.0 [71] tools_4.2.0 echoplot_0.99.5 ellipsis_0.3.2 catalogueR_1.0.0 RColorBrewer_1.1-3 [76] proxy_0.4-27 coloc_5.1.0 Rcpp_1.0.9 plyr_1.8.7 base64enc_0.1-3 [81] progress_1.2.2 zlibbioc_1.43.0 RCurl_1.98-1.8 basilisk.utils_1.9.2 prettyunits_1.1.1 [86] rpart_4.1.16 deldir_1.0-6 viridis_0.6.2 haven_2.5.1 cluster_2.1.3 [91] SummarizedExperiment_1.27.2 ggrepel_0.9.1 fs_1.5.2 crul_1.2.0 magrittr_2.0.3 [96] echotabix_0.99.8 dnet_1.1.7 openxlsx_4.2.5 reprex_2.0.2 googledrive_2.0.0 [101] mvtnorm_1.1-3 ProtGenerics_1.29.0 matrixStats_0.62.0 hms_1.1.2 patchwork_1.1.2 [106] XML_3.99-0.10 jpeg_0.1-9 readxl_1.4.1 gridExtra_2.3 compiler_4.2.0 [111] biomaRt_2.53.2 crayon_1.5.1 R.oo_1.25.0 htmltools_0.5.3 echoannot_0.99.7 [116] tzdb_0.3.0 Formula_1.2-4 expm_0.999-6 Exact_3.1 lubridate_1.8.0 [121] DBI_1.1.3 dbplyr_2.2.1 MASS_7.3-58.1 rappdirs_0.3.3 boot_1.3-28 [126] Matrix_1.4-1 piggyback_0.1.3 cli_3.3.0 R.methodsS3_1.8.2 echofinemap_0.99.3 [131] parallel_4.2.0 igraph_1.3.4 pkgconfig_2.0.3 GenomicAlignments_1.33.1 dir.expiry_1.5.0 [136] RCircos_1.2.2 foreign_0.8-82 osfr_0.2.8 xml2_1.3.3 rvest_1.0.3 [141] echoLD_0.99.7 VariantAnnotation_1.43.3 digest_0.6.29 graph_1.75.0 httpcode_0.3.0 [146] cellranger_1.1.0 htmlTable_2.4.1 gld_2.6.5 restfulr_0.0.15 curl_4.3.2 [151] Rsamtools_2.13.4 rjson_0.2.21 lifecycle_1.0.1 nlme_3.1-159 jsonlite_1.8.0 [156] viridisLite_0.4.1 fansi_1.0.3 downloadR_0.99.4 pillar_1.8.1 susieR_0.12.27 [161] lattice_0.20-45 GGally_2.1.2 googleAuthR_2.0.0 KEGGREST_1.37.3 fastmap_1.1.0 [166] httr_1.4.4 survival_3.3-1 glue_1.6.2 zip_2.2.0 png_0.1-7 [171] bit_4.0.4 Rgraphviz_2.41.1 class_7.3-20 stringi_1.7.8 blob_1.2.3 [176] latticeExtra_0.6-30 memoise_2.0.1 irlba_2.3.5 e1071_1.7-11 ape_5.6-2 ```
bschilder commented 1 year ago

Ok, actually I see how this kind of suggests that you can supply an integer vector as a last resort. The issue is that the message tries to print every single item in the vector. This isn't a problem with a small number of SNPs, as in our MSS unit tests, but it becomes one when printing millions.

Posted this here and will get it fixed: https://github.com/neurogenomics/MungeSumstats/issues/125
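One way to avoid printing every element of a long vector in a log message is to show only the first few values plus a count. The sketch below is a minimal base R illustration of that idea; `summarise_vector` and `max_show` are hypothetical names, and this is not the MungeSumstats implementation or the fix tracked in the linked issue.

```r
# Hypothetical helper (not part of MungeSumstats): collapse a vector into a
# short string that shows at most `max_show` values plus the total count.
summarise_vector <- function(x, max_show = 5L) {
  shown <- utils::head(x, max_show)
  txt <- paste(shown, collapse = ", ")
  if (length(x) > max_show) {
    # format(..., scientific = FALSE) avoids counts rendered like "1e+06".
    n_total <- format(length(x), scientific = FALSE)
    txt <- paste0(txt, ", ... (", n_total, " values in total)")
  }
  txt
}

# A million-element vector now yields a one-line message.
message("compute_n = ", summarise_vector(rep(2687L, 1e6)))
```

Short vectors are printed in full, so small unit-test inputs would keep their existing output under this approach.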

Screenshot 2022-09-20 at 15 35 21

bschilder commented 1 year ago

I just got this running. Note that:
- ABF does not get to run if proportion_cases is missing.
- polyfun_susie is missing the environment.

Cool, that's progress.

Can you create separate Issues for these to avoid mixing problems? (It makes it easier to find solutions later.)

AMCalejandro commented 1 year ago

Ok, did you merge with master?


From: Brian M. Schilder
Sent: 20 September 2022 14:22
To: RajLabMSSM/echolocatoR
Cc: Martinez Carrasco, Alejandro; Author
Subject: Re: [RajLabMSSM/echolocatoR] Problem with "N" using munged data (Issue #114)

  1. "N"

1.a. compute_n = "ldsc" (I get sample_size=NULL: must be valid integer)

This was a bug caused by me missing some places where sample_size was still being used. I just pushed a change to construct_colmap so that it takes the arg N= instead of sample_size. N acts just like the other mapping columns in that it renames a column based on the value supplied to the argument: construct_colmap(N="N_cases") means that the "N_cases" column will get renamed to "N" and used in any subsequent steps that require total (or effective) sample size.

Note that whenever the "N" col is present in the post-standardized sumstats data, this will be used instead of whatever you supply to compute_n. I've updated the docs to better explain this.
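The renaming and precedence rules described above can be sketched in base R. This is only an illustration under stated assumptions: `rename_n` and `prepare_n` are hypothetical helpers, not echodata functions, and the toy data frame is made up. It is meant to show why a run with compute_n = 3500 could still report a different sample size when an "N" column already exists, which appears consistent with the "Using existing 'N' column." lines in the log.

```r
# Toy data (hypothetical): a sample size column under a non-standard name.
dat <- data.frame(SNP = c("rs1", "rs2"), N_cases = c(2687, 2687))

# Hypothetical stand-in for the column mapping described for
# construct_colmap(N = "N_cases"): rename the chosen column to "N".
rename_n <- function(dat, n_col) {
  names(dat)[names(dat) == n_col] <- "N"
  dat
}

# Hypothetical stand-in for the precedence rule: an existing "N" column
# is kept, and compute_n is only used when "N" is absent.
prepare_n <- function(dat, compute_n) {
  if ("N" %in% names(dat)) {
    message("Using existing 'N' column.")
    return(dat)
  }
  dat$N <- compute_n
  dat
}

dat <- rename_n(dat, "N_cases")
dat <- prepare_n(dat, compute_n = 3500)
print(unique(dat$N))
```

Under these assumptions, the value passed as compute_n is ignored because "N" is already present after renaming; dropping or renaming that column beforehand would be needed for compute_n to take effect.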

[Screenshot 2022-09-20 at 13 03 12] https://user-images.githubusercontent.com/34280215/191262664-8db22573-fd3f-4504-92ec-0d2a5ac2f6f7.png

1.b. compute_n = data_munged$NSTUDY (This does not work at all)

This is my bad; I told you yesterday that MungeSumstats::compute_nsize(compute_n=) can take a vector of sample sizes. But apparently I misremembered, and it can only take a single number to be applied to all rows. Instead, I'll add a line to echodata::get_sample_size that handles these scenarios.
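A minimal sketch of what handling both scenarios might look like: a single number recycled to every row, or a per-SNP vector whose length matches the data. `assign_n` is a hypothetical helper written for illustration; it does not reproduce what echodata::get_sample_size or MungeSumstats::compute_nsize actually do.

```r
# Hypothetical helper: accept either one sample size for all SNPs
# or one sample size per SNP, and reject anything else.
assign_n <- function(dat, compute_n) {
  stopifnot(is.numeric(compute_n))
  if (length(compute_n) == 1L) {
    # Single value: apply the same N to every row.
    dat$N <- rep(compute_n, nrow(dat))
  } else if (length(compute_n) == nrow(dat)) {
    # Per-SNP vector: use as-is (e.g. a column such as NSTUDY).
    dat$N <- compute_n
  } else {
    stop("compute_n must have length 1 or nrow(dat).")
  }
  dat
}

# Toy usage with made-up values.
toy <- data.frame(SNP = c("rs1", "rs2", "rs3"))
scalar_n <- assign_n(toy, 3500)
vector_n <- assign_n(toy, c(2687, 2650, 2700))
```

Failing loudly on a length mismatch is a deliberate choice here, since silently recycling a shorter vector would assign wrong sample sizes to some SNPs.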

  2. Rfast

Rfast also wasn't installed. I've now ensured that it gets installed automatically by making it an Import of echofinemap.