RajLabMSSM / echoannot

echoverse module: Annotate fine-mapping results
2 stars 0 forks source link

Bug annotating finemap results with roadmap #14

Closed AMCalejandro closed 1 year ago

AMCalejandro commented 2 years ago

1. Bug description

I am trying to annotate finemapping results against brain tissue marks from ROADMAP, but there is this bug at the end of the query

Expected behaviour

2. Reproducible example

Code

(Please add the steps to reproduce the bug here. See here for an intro to making a reproducible example (i.e. reprex) and why they're important! This will help us to help you much faster.)

columnsnames = echodata::construct_colmap(munged= FALSE,
                                          CHR = "CHR", POS = "BP",
                                          SNP = "SNP", P = "P",
                                          Effect = "BETA", StdErr = "SE", 
                                          A1 = "A1", A2 = "A2",
                                          N = "N", N_cases = "N_CAS",
                                          N_controls = "N_CON", MAF = "MAF")
                                          #Freq = "FREQ", N = "N",
                                          #N_cases = NULL,
                                          #N_controls = NULL,
                                          #proportion_cases = NULL,
                                          #MAF = "calculate")
#tstat = NULL)

finemap_loci(# GENERAL ARGUMENTS 
  topSNPs = topSNPs,
  results_dir = fullRS_path,
  loci = topSNPs$Locus,
  dataset_name = "LID_COX",
  dataset_type = "GWAS",  

  force_new_subset = TRUE,
  force_new_LD = FALSE,
  force_new_finemap = FALSE,
  remove_tmps = FALSE,

  finemap_methods = c("FINEMAP","SUSIE"),

  # Munge full sumstats first
  munged = FALSE,
  colmap = columnsnames,
  # SUMMARY STATS ARGUMENTS
  fullSS_path = newSS_name_colmap,
  fullSS_genome_build = "hg19",
  query_by ="tabix",

  bp_distance = 500000*2,
  min_MAF = 0.001, 
  trim_gene_limits = FALSE,
  case_control = TRUE,

  # FINE-MAPPING ARGUMENTS
  ## General
  n_causal = 5,
  credset_thresh = .95,
  consensus_thresh = 2,

  # LD ARGUMENTS 
  LD_reference = "1KGphase3",#"UKB",
  superpopulation = "EUR",
  download_method = "axel",
  LD_genome_build = "hg19",
  leadSNP_LD_block = FALSE,

  #### PLotting args ####
  plot_types = c("fancy"),
  show_plot = TRUE,
  zoom = c("1x", "10x", "20x"),
  #zoom = "1x",
  tx_biotypes = NULL,
  nott_epigenome = FALSE,
  nott_show_placseq = FALSE,
  nott_binwidth = 200,
  nott_bigwig_dir = NULL,
  #xgr_libnames =c("ENCODE_TFBS_ClusteredV3_CellTypes", "TFBS_Conserved", "Uniform_TFBS"),

  roadmap = TRUE,
  roadmap_query = c("brain"),

  #### General args ####
  seed = 2022,
  nThread = 20,
  verbose = TRUE
)

Console output

┌──────────────────────────────────────────┐
│                                          │
│   )))> 🦇 TRIM22 [locus 1 / 1] 🦇 <(((   │
│                                          │
└──────────────────────────────────────────┘

────────────────────────────────────────────────────────────────────────────────

── Step 1 ▶▶▶ Query 🔎 ─────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
query_dat is already a GRanges object. Returning directly.
========= echotabix::convert =========
Converting full summary stats file to tabix format for fast querying.
Inferred format: 'table'
Explicit format: 'table'
Inferring comment_char from tabular header: 'SNP'
Determining chrom type from file header.
Chromosome format: 1
Detecting column delimiter.
Identified column separator: \t
Sorting rows by coordinates via bash.
Searching for header row with grep.
( grep ^'SNP' .../QC_SNPs_COLMAP.txt; grep
    -v ^'SNP' .../QC_SNPs_COLMAP.txt | sort
    -k2,2n
    -k3,3n ) > .../filedd61f50654_sorted.tsv
Constructing outputs
Using existing bgzipped file: /home/rstudio/echolocatoR/echolocatoR_will/QC_SNPs_COLMAP.txt.bgz 
Set force_new=TRUE to override this.
Tabix-indexing file using: Rsamtools
Data successfully converted to bgzip-compressed, tabix-indexed format.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Inferred format: 'table'
Querying tabular tabix file using: Rsamtools.
Checking query chromosome style is correct.
Chromosome format: 1
Retrieving data.
Converting query results to data.table.
Processing query: 11:4724803-6724803
Adding 'query' column to results.
Retrieved data with 7,013 rows
Saving query ==> /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/TRIM22_LID_COX_subset.tsv.gz
+ Query: 7,013 SNPs x 12 columns.
Standardizing summary statistics subset.
Standardizing main column names.
++ Preparing A1,A1 cols
++ Preparing MAF,Freq cols.
++ Removing SNPs with MAF== 0 | NULL | NA or >1
++ Preparing N_cases,N_controls cols.
++ Preparing proportion_cases col.
++ Calculating proportion_cases from N_cases and N_controls.
Loading required namespace: MungeSumstats
Preparing sample size column (N).
Using existing 'N' column.
+ Imputing t-statistic from Effect and StdErr.
+ leadSNP missing. Assigning new one by min p-value.
++ Ensuring Effect,StdErr,P are numeric.
++ Ensuring 1 SNP per row and per genomic coordinate.
++ Removing extra whitespace
+ Standardized query: 7,013 SNPs x 15 columns.
++ Saving standardized query ==> /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/TRIM22_LID_COX_subset.tsv.gz

────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Previously computed LD_matrix detected. Importing: /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/LD/TRIM22.1KGphase3_LD.RDS
LD_reference identified as: r.
Converting obj to sparseMatrix.
+ FILTER:: Filtering by LD features.

────────────────────────────────────────────────────────────────────────────────

── Step 3 ▶▶▶ Filter SNPs 🚰 ───────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
FILTER:: Filtering by SNP features.
+ FILTER:: Removing SNPs with MAF < 0.001
+ FILTER:: Post-filtered data: 6997 x 15
+ Subsetting LD matrix and dat to common SNPs...
Removing unnamed rows/cols
Replacing NAs with 0
+ LD_matrix = 6997 SNPs.
+ dat = 6997 SNPs.
+ 6997 SNPs in common.
Converting obj to sparseMatrix.

────────────────────────────────────────────────────────────────────────────────

── Step 4 ▶▶▶ Fine-map 🔊 ──────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
Gathering method sources.
Gathering method citations.
++ Previously multi-finemapped results identified. Importing: /home/rstudio/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22/Multi-finemap/1KGphase3_LD.Multi-finemap.tsv.gz
+ Fine-mapping with 'FINEMAP, SUSIE' completed:

────────────────────────────────────────────────────────────────────────────────

── Step 5 ▶▶▶ Plot 📈 ──────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
+-------- Locus Plot:  TRIM22 --------+
+ support_thresh = 2
+ Calculating mean Posterior Probability (mean.PP)...
+ 2 fine-mapping methods used.
+ 3 Credible Set SNPs identified.
+ 0 Consensus SNPs identified.
+ Filling NAs in CS cols with 0.
+ Filling NAs in PP cols with 0.
LD_matrix detected. Coloring SNPs by LD with lead SNP.
++ echoplot:: GWAS full window track
++ echoplot:: GWAS track
++ echoplot:: Merged fine-mapping track
Melting PP and CS from 3 fine-mapping methods.
++ echoplot:: Adding Gene model track.
Converting dat to GRanges object.
Loading required namespace: EnsDb.Hsapiens.v75
max_transcripts= 1 . 
82  transcripts from  82  genes returned.
Loading required namespace: pals
Fetching data...OK
Parsing exons...OK
Defining introns...OK
Defining UTRs...OK
Defining CDS...OK
aggregating...
Done
Constructing graphics...
echoannot:: Plotting ROADMAP annotations.
Converting dat to GRanges object.
+ ROADMAP:: 13 annotation(s) identified that match: brain
Querying subset from Roadmap API: E053 - 1/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E054 - 2/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E053
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E067 - 3/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E054
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E068 - 4/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E067
Preexisting file detected. Set force_overwrite=TRUE to override this.
Downloading Roadmap Chromatin Marks: E068
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E069 - 5/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E070 - 6/13
Downloading Roadmap Chromatin Marks: E069
Constructing GRanges query using min/max ranges across one or more chromosomes.
Preexisting file detected. Set force_overwrite=TRUE to override this.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E070
Querying subset from Roadmap API: E071 - 7/13
Preexisting file detected. Set force_overwrite=TRUE to override this.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E072 - 8/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E071
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E073 - 9/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E072
Preexisting file detected. Set force_overwrite=TRUE to override this.
Downloading Roadmap Chromatin Marks: E073
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E074 - 10/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E074
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E081 - 11/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Querying subset from Roadmap API: E082 - 12/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E081
Preexisting file detected. Set force_overwrite=TRUE to override this.
Querying subset from Roadmap API: E125 - 13/13
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
Downloading Roadmap Chromatin Marks: E082
Preexisting file detected. Set force_overwrite=TRUE to override this.
Downloading Roadmap Chromatin Marks: E125
Preexisting file detected. Set force_overwrite=TRUE to override this.
Annotating chromatin states.
unable to find an inherited method for function 'mcols' for signature '"try-error"'Locus TRIM22 complete in: 1.66 min

────────────────────────────────────────────────────────────────────────────────

── Step 6 ▶▶▶ Postprocess data 🎁 ──────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────────
Returning results as nested list.
All loci done in: 1.66 min
$TRIM22
NULL

$merged_dat
Null data.table (0 rows and 0 cols)

Warning message:
In parallel::mclapply(seq_len(length(eid_list)), function(i) { :
  all scheduled cores encountered errors in user code

Data

rstudio@3e36fec3eec9:~/echolocatoR/echolocatoR_will/RESULTS/GWAS/LID_COX/TRIM22$ head ../../../../QC_SNPs_COLMAP.txt
SNP     CHR     BP      A1      A2      MAF     BETA    SE      P       N       N_CAS   N_CON
rs3131972       1       752721  A       G       0.1806  0.07177 0.1482  0.6281  2696    588     2108
rs11240777      1       798959  A       G       0.2068  0.02904 0.1454  0.8417  2572    510     2062
rs28482280      1       834056  C       A       0.01188 -1.013  0.6109  0.09743 2610    568     2042
rs7518581       1       834956  A       G       0.01187 -1.02   0.6109  0.09504 2612    570     2042
rs149737509     1       837657  C       G       0.01343 -0.609  0.578   0.292   2606    560     2046
rs28678693      1       838665  C       T       0.01331 -0.7232 0.5261  0.1693  2630    580     2050
rs28477624      1       838732  A       G       0.01257 -0.6727 0.5515  0.2225  2626    578     2048
rs28437697      1       838890  G       A       0.01257 -0.6727 0.5515  0.2225  2626    578     2048
rs28539852      1       838916  T       A       0.0126  -0.6755 0.551   0.2202  2620    576     2044

3. Session info

``` > sessionInfo() R version 4.2.0 (2022-04-22) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 20.04.4 LTS Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.22 SNPlocs.Hsapiens.dbSNP144.GRCh37_0.99.20 BSgenome_1.65.2 [4] rtracklayer_1.57.0 Biostrings_2.65.3 XVector_0.37.1 [7] GenomicRanges_1.49.1 GenomeInfoDb_1.33.5 IRanges_2.31.2 [10] S4Vectors_0.35.3 BiocGenerics_0.43.1 forcats_0.5.2 [13] stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4 [16] readr_2.1.2 tidyr_1.2.0 tibble_3.1.8 [19] ggplot2_3.3.6 tidyverse_1.3.2 data.table_1.14.2 [22] echolocatoR_2.0.1 loaded via a namespace (and not attached): [1] rappdirs_0.3.3 GGally_2.1.2 R.methodsS3_1.8.2 ragg_1.2.2 [5] echoLD_0.99.7 bit64_4.0.5 knitr_1.40 irlba_2.3.5 [9] DelayedArray_0.23.1 R.utils_2.12.0 rpart_4.1.16 KEGGREST_1.37.3 [13] RCurl_1.98-1.8 AnnotationFilter_1.21.0 generics_0.1.3 GenomicFeatures_1.49.6 [17] RSQLite_2.2.16 proxy_0.4-27 bit_4.0.4 tzdb_0.3.0 [21] xml2_1.3.3 lubridate_1.8.0 SummarizedExperiment_1.27.2 assertthat_0.2.1 [25] viridis_0.6.2 gargle_1.2.0 xfun_0.32 hms_1.1.2 [29] fansi_1.0.3 restfulr_0.0.15 progress_1.2.2 dbplyr_2.2.1 [33] readxl_1.4.1 Rgraphviz_2.41.1 igraph_1.3.4 DBI_1.1.3 [37] htmlwidgets_1.5.4 reshape_0.8.9 downloadR_0.99.4 googledrive_2.0.0 [41] ellipsis_0.3.2 backports_1.4.1 biomaRt_2.53.2 deldir_1.0-6 [45] MatrixGenerics_1.9.1 MungeSumstats_1.5.13 vctrs_0.4.1 Biobase_2.57.1 [49] ensembldb_2.21.4 cachem_1.0.6 withr_2.5.0 checkmate_2.1.0 [53] GenomicAlignments_1.33.1 prettyunits_1.1.1 cluster_2.1.3 ape_5.6-2 [57] dir.expiry_1.5.0 lazyeval_0.2.2 crayon_1.5.1 basilisk.utils_1.9.2 [61] crul_1.2.0 labeling_0.4.2 pkgconfig_2.0.3 nlme_3.1-159 [65] ProtGenerics_1.29.0 XGR_1.1.8 gitcreds_0.1.1 pals_1.7 [69] nnet_7.3-17 rlang_1.0.5 lifecycle_1.0.1 filelock_1.0.2 [73] httpcode_0.3.0 BiocFileCache_2.5.0 modelr_0.1.9 echotabix_0.99.8 [77] dichromat_2.0-0.1 cellranger_1.1.0 coloc_5.1.0 matrixStats_0.62.0 [81] graph_1.75.0 Matrix_1.4-1 osfr_0.2.8 boot_1.3-28 [85] reprex_2.0.2 base64enc_0.1-3 googlesheets4_1.0.1 png_0.1-7 [89] viridisLite_0.4.1 rjson_0.2.21 rootSolve_1.8.2.3 bitops_1.0-7 [93] R.oo_1.25.0 ggnetwork_0.5.10 blob_1.2.3 mixsqp_0.3-43 [97] echoplot_0.99.5 dnet_1.1.7 jpeg_0.1-9 echodata_0.99.14 [101] scales_1.2.1 memoise_2.0.1 magrittr_2.0.3 plyr_1.8.7 [105] hexbin_1.28.2 zlibbioc_1.43.0 compiler_4.2.0 echoconda_0.99.7 [109] BiocIO_1.7.1 RColorBrewer_1.1-3 catalogueR_1.0.0 EnsDb.Hsapiens.v75_2.99.0 [113] Rsamtools_2.13.4 cli_3.3.0 echoannot_0.99.7 patchwork_1.1.2 [117] htmlTable_2.4.1 Formula_1.2-4 MASS_7.3-58.1 tidyselect_1.1.2 [121] stringi_1.7.8 textshaping_0.3.6 yaml_2.3.5 supraHex_1.35.0 [125] latticeExtra_0.6-30 ggrepel_0.9.1 grid_4.2.0 VariantAnnotation_1.43.3 [129] tools_4.2.0 lmom_2.9 parallel_4.2.0 rstudioapi_0.14 [133] foreign_0.8-82 piggyback_0.1.3 gridExtra_2.3 gld_2.6.5 [137] farver_2.1.1 digest_0.6.29 snpStats_1.47.1 BiocManager_1.30.18 [141] Rcpp_1.0.9 broom_1.0.1 OrganismDbi_1.39.1 httr_1.4.4 [145] AnnotationDbi_1.59.1 RCircos_1.2.2 ggbio_1.45.0 biovizBase_1.45.0 [149] colorspace_2.0-3 rvest_1.0.3 XML_3.99-0.10 fs_1.5.2 [153] reticulate_1.26 splines_4.2.0 RBGL_1.73.0 expm_0.999-6 [157] gh_1.3.0 echofinemap_0.99.3 basilisk_1.9.2 Exact_3.1 [161] mapproj_1.2.8 systemfonts_1.0.4 jsonlite_1.8.0 susieR_0.12.27 [165] R6_2.5.1 Hmisc_4.7-1 pillar_1.8.1 htmltools_0.5.3 [169] glue_1.6.2 fastmap_1.1.0 DT_0.24 BiocParallel_1.31.12 [173] class_7.3-20 codetools_0.2-18 maps_3.4.0 mvtnorm_1.1-3 [177] utf8_1.2.2 lattice_0.20-45 curl_4.3.2 DescTools_0.99.46 [181] zip_2.2.0 openxlsx_4.2.5 interp_1.1-3 survival_3.3-1 [185] googleAuthR_2.0.0 munsell_0.5.0 e1071_1.7-11 GenomeInfoDbData_1.2.8 [189] haven_2.5.1 reshape2_1.4.4 gtable_0.3.1 ```
bschilder commented 1 year ago

See #11

bschilder commented 1 year ago

Just implemented a workaround that doesn't use rtracklayer. If you reinstall echoannot, does this work?