Column names in topSNPs do not agree with co

1. Bug description

The column names in topSNPs do not inherit what we specify through echodata::construct_colmap.

When I pass the output of echodata::construct_colmap, I would expect the column names specified in that list to be used to resolve the column names in topSNPs.

NOTE: When I rename the BP column in topSNPs to POS (BP -> POS), finemap_loci() runs. In that case the position column is named BP in the input GWAS and POS in topSNPs.
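For reference, the rename workaround amounts to something like the sketch below. The one-row data frame is only a stand-in built from the topSNPs row printed under "Console output", not the real object, and base R is used so that nothing beyond a plain R session is assumed.

```r
# Stand-in for the real topSNPs table (values copied from the row shown in this issue)
topSNPs <- data.frame(
  Locus = "RP11-240A16.1",
  Gene  = "RP11-240A16.1",
  CHR   = 4,
  BP    = 32435284,
  SNP   = "rs189093213",
  P     = 1.67e-9,
  BETA  = 1.12
)

# Rename only the position column: BP -> POS
names(topSNPs)[names(topSNPs) == "BP"] <- "POS"

# Confirm the new column names before calling finemap_loci()
print(names(topSNPs))
```

Only topSNPs is renamed here; the full summary statistics keep BP and are still described through colmap.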

2. Reproducible example


columnsnames = echodata::construct_colmap(munged = FALSE,
                                          CHR = "CHR", POS = "BP",
                                          SNP = "SNP", P = "P",
                                          Effect = "BETA", StdErr = "SE", 
                                          A1 = "A1", A2 = "A2",
                                          N = "N", MAF = "MAF")

finemap_loci(# GENERAL ARGUMENTS 
  topSNPs = topSNPs,
  results_dir = fullRS_path,
  loci = topSNPs$Locus,
  dataset_name = "LID_COX",
  dataset_type = "GWAS",  

  force_new_subset = TRUE,
  force_new_LD = TRUE,
  force_new_finemap = TRUE,
  remove_tmps = FALSE,

  finemap_methods = c("ABF","FINEMAP","SUSIE"),

  # Munge full sumstats first
  munged = FALSE,
  colmap = columnsnames,
  fullSS_path = newSS_name_colmap,
  fullSS_genome_build = "hg19",
  query_by = "tabix",

  bp_distance = 500000*2,
  min_MAF = 0.001, 
  trim_gene_limits = FALSE,
  case_control = FALSE,

  ## General
  n_causal = 5,
  credset_thresh = .95,
  consensus_thresh = 2,

  LD_reference = "1KGphase3",#"UKB",
  superpopulation = "EUR",
  download_method = "axel",
  LD_genome_build = "hg19",
  leadSNP_LD_block = FALSE,

  #### Plotting args ####
  plot_types = c("fancy"),
  show_plot = TRUE,
  zoom = c("1x", "10x", "20x"),
  tx_biotypes =  c("protein_coding"),
  nott_epigenome = FALSE,
  nott_show_placseq = FALSE,
  nott_binwidth = 200,
  nott_bigwig_dir = NULL,
  #xgr_libnames =c("ENCODE_TFBS_ClusteredV3_CellTypes", "TFBS_Conserved", "Uniform_TFBS"),

  #roadmap = TRUE,
  #roadmap_query = c("brain"),

  #### General args ####
  seed = 2022,
  nThread = 20,
  verbose = TRUE
)

Console output

│                                                 │
│   )))> 🦇 RP11-240A16.1 [locus 1 / 3] 🦇 <(((   │
│                                                 │


── Step 1 ▶▶▶ Query 🔎 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

+ Query Method: tabix
Constructing GRanges query using min/max ranges within a single chromosome.
'start' or 'end' cannot contain NAs
Locus RP11-240A16.1 complete in: 0 min


> topSNPs
# A tibble: 3 × 7
  Locus         Gene            CHR       BP SNP                     P  BETA
  <chr>         <chr>         <dbl>    <dbl> <chr>               <dbl> <dbl>
1 RP11-240A16.1 RP11-240A16.1     4 32435284 rs189093213 0.00000000167  1.12

> str(data_colmaps)  # This is the input GWAS
'data.frame':   6530650 obs. of  11 variables:
 $ SNP : chr  "rs58276399" "rs142557973" "rs141242758" "rs2073813" ...
 $ CHR : num  1 1 1 1 1 1 1 1 1 1 ...
 $ BP  : num  731718 731718 734349 753541 766007 ...
 $ A1  : chr  "t" "t" "t" "a" ...
 $ A2  : chr  "c" "c" "c" "g" ...
 $ FREQ: num  0.884 0.884 0.884 0.126 0.9 ...
 $ BETA: num  0.1775 0.1775 0.1577 0.0721 0.2559 ...
 $ SE  : num  0.158 0.158 0.159 0.118 0.164 ...
 $ P   : num  0.262 0.262 0.322 0.54 0.119 ...
 $ N   : int  1297 1297 1297 2687 1297 2687 2687 2687 2687 1297 ...
 $ MAF : num  0.1163 0.1163 0.1157 0.1257 0.0995 ...

> columnsnames

[1] "CHR"

[1] "BP"

[1] "SNP"

[1] "P"

[1] "BETA"

[1] "SE"

[1] "tstat"

[1] "Locus"

[1] "Freq"

[1] "MAF"

[1] "A1"

[1] "A2"

[1] "Gene"

[1] "N_cases"

[1] "N_controls"

[1] "calculate"

[1] "N"

[1] TRUE

3. Session info

(Add output of the R function utils::sessionInfo() below. This helps us assess version/OS conflicts which could be causing bugs.)

@AMCalejandro can you share your topSNPs object? I'm wondering if some columns there might be causing this.

Actually, looking at your post again, I think I understand it better.

"The column names in topSNPs do not inherit what we specify through echodata::construct_colmap."

This is actually not the intended behaviour of colmap, because topSNPs (often a supplementary table in the publication) and fullSS_path (the full summary statistics) very frequently use different column naming schemes. So I think it's best to keep munging them as separate steps.

Thus, I recommend using the dedicated function to munge your topSNPs before passing it into finemap_loci:

topSNPs <- echodata::import_topSNPs(...)
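As a quick sanity check before calling finemap_loci, something like the following sketch can flag a topSNPs table that still carries its original column names. The helper missing_topSNP_cols is hypothetical (it is not part of echodata or echolocatoR), and the expected names Locus, CHR, POS and SNP are an assumption based on the construct_colmap argument names used in this thread.

```r
# Hypothetical helper: list which expected columns are absent from a
# topSNPs-like table. The default `expected` names are an assumption.
missing_topSNP_cols <- function(topSNPs,
                                expected = c("Locus", "CHR", "POS", "SNP")) {
  setdiff(expected, names(topSNPs))
}

# Stand-in using the un-munged column names reported in this issue
raw_topSNPs <- data.frame(
  Locus = "RP11-240A16.1",
  CHR   = 4,
  BP    = 32435284,
  SNP   = "rs189093213"
)

# Anything returned here should be renamed or munged first
print(missing_topSNP_cols(raw_topSNPs))
```

For the stand-in above the check reports POS as missing, which matches the observation that renaming BP to POS lets the run proceed.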