RajLabMSSM / echolocatoR

Automated statistical and functional fine-mapping pipeline with extensive API access to datasets.
https://rajlabmssm.github.io/echolocatoR
MIT License
30 stars 11 forks source link

finemap_loci fails using echodata::construct_colmaps if some parameters are set to NULL #120

Closed AMCalejandro closed 1 year ago

AMCalejandro commented 1 year ago

1. Bug description

When doing finemapping for QT, it does not really make sense to pass N_CAS, N_CON, or proportion_cases. Similarly, if I do not have t_stat, I would set this to NULL. Otherwise, what I understand from this function is that finemap_loci() will search for a column in my data called t_stat

However, if you do so, finemap_loci fails

2. Reproducible example

Code

columnsnames = echodata::construct_colmap(munged= FALSE,
                                          CHR = "CHR", POS = "POS",
                                          SNP = "SNP", P = "P",
                                          Effect = "BETA", StdErr = "SE", 
                                          A1 = "A1", A2 = "A2",
                                          Freq = "FREQ", N = "N",
                                          N_cases = NULL,
                                          N_controls = NULL,
                                          proportion_cases = NULL,
                                          MAF = "calculate",
                                          tstat = NULL)
                                          )

# Pass the sample size as "N" column
# compute_n will do all what is in the docu f N does not exist

finemap_loci(# GENERAL ARGUMENTS 
                                          topSNPs = topSNPs,
                                          results_dir = fullRS_path,
                                          loci = topSNPs$Locus,
                                          dataset_name = "LID_COX",
                                          dataset_type = "GWAS",  
                                          force_new_subset = TRUE,
                                          force_new_LD = FALSE,
                                          force_new_finemap = TRUE,
                                          remove_tmps = FALSE,

                                          finemap_methods = c("ABF","FINEMAP","SUSIE", "POLYFUN_SUSIE"),

                                          # Munge full sumstats first
                                          munged = FALSE,
                                          colmap = columnsnames,
                                          # SUMMARY STATS ARGUMENTS
                                          fullSS_path = newSS_name_colmap,
                                          fullSS_genome_build = "hg19",
                                          query_by ="tabix",

                                          compute_n = 3500,

                                          bp_distance = 10000,#500000*2,
                                          min_MAF = 0.001, 
                                          trim_gene_limits = FALSE,

                                          case_control = FALSE,

                                          # FINE-MAPPING ARGUMENTS
                                          ## General
                                          n_causal = 5,
                                          credset_thresh = .95,
                                          consensus_thresh = 2,

                                          # LD ARGUMENTS 
                                          LD_reference = "1KGphase3",#"UKB",
                                          superpopulation = "EUR",
                                          download_method = "axel",
                                          LD_genome_build = "hg19",
                                          leadSNP_LD_block = FALSE,

                                          #### PLotting args ####
                                          plot_types = c("simple"),
                                          show_plot = TRUE,
                                          zoom = "1x",
                                          tx_biotypes = NULL,
                                          nott_epigenome = FALSE,
                                          nott_show_placseq = FALSE,
                                          nott_binwidth = 200,
                                          nott_bigwig_dir = NULL,
                                          xgr_libnames = NULL,
                                          roadmap = FALSE,
                                          roadmap_query = NULL,

                                          #### General args ####
                                          seed = 2022,
                                          nThread = 20,
                                          verbose = TRUE
                                          )

Console output

PolyFun submodule already installed.
┌────────────────────────────────────────┐
│                                        │
│   )))> 🦇 LRP8 [locus 1 / 1] 🦇 <(((   │
│                                        │
└────────────────────────────────────────┘

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ───────────────────────────────────────────────────────────────────────────────────────────────────

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../QC_SNPs_COLMAP.txt; grep
    -v ^'SNP' .../QC_SNPs_COLMAP.txt | sort
    -k2,2n
    -k3,3n ) > .../file30b4b2c4529_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_LID/QC_SNPs_COLMAP.txt.bgz
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 1:53768300-53788300
Adding 'query' column to results.
Retrieved data with 52 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_LID/RESULTS/GWAS/LID_COX/LRP8/LRP8_LID_COX_subset.tsv.gz
+ Query: 52 SNPs x 10 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Could not infer MAF.
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
argument is of length zeroLocus LRP8 complete in: 0.3 min

Data

> head(data_colmaps)
           SNP CHR    POS A1 A2   FREQ    BETA     SE      P    N
1:  rs58276399   1 731718  t  c 0.8837 -0.1775 0.1583 0.2621 1297
2: rs141242758   1 734349  t  c 0.8843 -0.1577 0.1593 0.3223 1297
3:   rs2073813   1 753541  a  g 0.1257  0.0721 0.1177 0.5399 2687
4:  rs61768174   1 766007  a  c 0.9005 -0.2559 0.1642 0.1190 1297
5:  rs60320384   1 769223  c  g 0.8749 -0.0772 0.1178 0.5124 2687
6:  rs59066358   1 771967  a  g 0.1255  0.0776 0.1177 0.5095 2687

3. Session info

``` > sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.4 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 stats graphics grDevices utils datasets methods [8] base other attached packages: [1] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22 [2] SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20 [3] BSgenome_1.65.2 [4] rtracklayer_1.57.0 [5] Biostrings_2.65.3 [6] XVector_0.37.1 [7] GenomicRanges_1.49.1 [8] GenomeInfoDb_1.33.5 [9] IRanges_2.31.2 [10] S4Vectors_0.35.3 [11] BiocGenerics_0.43.1 [12] forcats_0.5.2 [13] stringr_1.4.1 [14] dplyr_1.0.10 [15] purrr_0.3.4 [16] readr_2.1.2 [17] tidyr_1.2.0 [18] tibble_3.1.8 [19] ggplot2_3.3.6 [20] tidyverse_1.3.2 [21] data.table_1.14.2 [22] echolocatoR_2.0.1 loaded via a namespace (and not attached): [1] rappdirs_0.3.3 GGally_2.1.2 [3] R.methodsS3_1.8.2 echoLD_0.99.7 [5] bit64_4.0.5 knitr_1.40 [7] irlba_2.3.5 DelayedArray_0.23.1 [9] R.utils_2.12.0 rpart_4.1.16 [11] KEGGREST_1.37.3 RCurl_1.98-1.8 [13] AnnotationFilter_1.21.0 generics_0.1.3 [15] GenomicFeatures_1.49.6 RSQLite_2.2.16 [17] proxy_0.4-27 bit_4.0.4 [19] tzdb_0.3.0 xml2_1.3.3 [21] lubridate_1.8.0 SummarizedExperiment_1.27.2 [23] assertthat_0.2.1 viridis_0.6.2 [25] gargle_1.2.0 xfun_0.32 [27] hms_1.1.2 fansi_1.0.3 [29] restfulr_0.0.15 progress_1.2.2 [31] dbplyr_2.2.1 readxl_1.4.1 [33] Rgraphviz_2.41.1 igraph_1.3.4 [35] DBI_1.1.3 htmlwidgets_1.5.4 [37] reshape_0.8.9 downloadR_0.99.4 [39] googledrive_2.0.0 ellipsis_0.3.2 [41] backports_1.4.1 biomaRt_2.53.2 [43] deldir_1.0-6 MatrixGenerics_1.9.1 [45] MungeSumstats_1.5.13 vctrs_0.4.1 [47] Biobase_2.57.1 ensembldb_2.21.4 [49] cachem_1.0.6 withr_2.5.0 [51] checkmate_2.1.0 GenomicAlignments_1.33.1 [53] prettyunits_1.1.1 cluster_2.1.3 [55] ape_5.6-2 dir.expiry_1.5.0 [57] lazyeval_0.2.2 crayon_1.5.1 [59] basilisk.utils_1.9.2 crul_1.2.0 [61] pkgconfig_2.0.3 nlme_3.1-159 [63] ProtGenerics_1.29.0 XGR_1.1.8 [65] nnet_7.3-17 gitcreds_0.1.1 [67] rlang_1.0.5 lifecycle_1.0.1 [69] filelock_1.0.2 httpcode_0.3.0 [71] BiocFileCache_2.5.0 modelr_0.1.9 [73] echotabix_0.99.8 dichromat_2.0-0.1 [75] cellranger_1.1.0 coloc_5.1.0 [77] matrixStats_0.62.0 graph_1.75.0 [79] Matrix_1.4-1 osfr_0.2.8 [81] boot_1.3-28 reprex_2.0.2 [83] base64enc_0.1-3 googlesheets4_1.0.1 [85] png_0.1-7 viridisLite_0.4.1 [87] rjson_0.2.21 rootSolve_1.8.2.3 [89] bitops_1.0-7 R.oo_1.25.0 [91] ggnetwork_0.5.10 blob_1.2.3 [93] mixsqp_0.3-43 echoplot_0.99.5 [95] dnet_1.1.7 jpeg_0.1-9 [97] echodata_0.99.14 scales_1.2.1 [99] memoise_2.0.1 magrittr_2.0.3 [101] plyr_1.8.7 hexbin_1.28.2 [103] zlibbioc_1.43.0 compiler_4.2.0 [105] echoconda_0.99.7 BiocIO_1.7.1 [107] RColorBrewer_1.1-3 catalogueR_1.0.0 [109] Rsamtools_2.13.4 cli_3.3.0 [111] echoannot_0.99.7 patchwork_1.1.2 [113] htmlTable_2.4.1 Formula_1.2-4 [115] MASS_7.3-58.1 tidyselect_1.1.2 [117] stringi_1.7.8 yaml_2.3.5 [119] supraHex_1.35.0 latticeExtra_0.6-30 [121] ggrepel_0.9.1 grid_4.2.0 [123] VariantAnnotation_1.43.3 tools_4.2.0 [125] lmom_2.9 parallel_4.2.0 [127] rstudioapi_0.14 foreign_0.8-82 [129] piggyback_0.1.3 gridExtra_2.3 [131] gld_2.6.5 digest_0.6.29 [133] snpStats_1.47.1 BiocManager_1.30.18 [135] Rcpp_1.0.9 broom_1.0.1 [137] OrganismDbi_1.39.1 httr_1.4.4 [139] AnnotationDbi_1.59.1 RCircos_1.2.2 [141] ggbio_1.45.0 biovizBase_1.45.0 [143] colorspace_2.0-3 rvest_1.0.3 [145] XML_3.99-0.10 fs_1.5.2 [147] reticulate_1.26 splines_4.2.0 [149] RBGL_1.73.0 expm_0.999-6 [151] gh_1.3.0 echofinemap_0.99.3 [153] basilisk_1.9.2 Exact_3.1 [155] jsonlite_1.8.0 susieR_0.12.27 [157] R6_2.5.1 Hmisc_4.7-1 [159] pillar_1.8.1 htmltools_0.5.3 [161] glue_1.6.2 fastmap_1.1.0 [163] DT_0.24 BiocParallel_1.31.12 [165] class_7.3-20 codetools_0.2-18 [167] mvtnorm_1.1-3 utf8_1.2.2 [169] lattice_0.20-45 curl_4.3.2 [171] DescTools_0.99.46 zip_2.2.0 [173] openxlsx_4.2.5 interp_1.1-3 [175] survival_3.3-1 googleAuthR_2.0.0 [177] munsell_0.5.0 e1071_1.7-11 [179] GenomeInfoDbData_1.2.8 haven_2.5.1 [181] reshape2_1.4.4 gtable_0.3.1 ```
bschilder commented 1 year ago

Thanks for pointing this out @AMCalejandro . I've just added a line to echodata:::standardize_proportion_cases so that it skips this step when you set colmap$proportion_cases to NULL. https://github.com/RajLabMSSM/echodata/blob/main/R/standardize_proportion_cases.R

Let me know if this works for you.