rmgpanw closed 7 months ago
Hey! This is a simple fix: the issue is that OA is not present in the column header mapping file:
data("sumstatsColHeaders")
#look in this dataframe
sumstatsColHeaders
MungeSumstats (MSS) then tried to impute A1 from the reference genome, and that is what you are seeing in the A1 column, not just duplicates.
I'm surprised I hadn't caught this matching for A1 before, but it just shows the real lack of standardisation in column naming. Here is the output with this added:
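As a rough sketch of how to spot this kind of gap up front, the snippet below lists which input headers have no entry in the mapping. The `my_headers` vector is a made-up example, not the headers from the reprex, and the upper-casing step is an assumption about how headers are matched against the `Uncorrected` column.

```r
library(MungeSumstats)

# Load the built-in header mapping (columns: Uncorrected, Corrected)
data("sumstatsColHeaders")

# Hypothetical input headers, for illustration only
my_headers <- c("SNP", "CHR", "BP", "EA", "OA")

# Headers with no entry in the mapping (upper-casing is an assumption)
unmapped <- setdiff(toupper(my_headers), sumstatsColHeaders$Uncorrected)
print(unmapped)
```

Anything printed here would need a mapping added before running format_sumstats().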
      SNP   CHR       BP     A1     A2      BETA       SE    CHISQ        P   LOG10P
   <char> <int>    <int> <char> <char>     <num>    <num>    <num>    <num>    <num>
1:    rs6     7 91747131      A      T -0.011340 0.699120 0.128844 0.341856 0.466157
2:    rs7     7 91779557      T      A -0.050145 0.196447 0.249215 0.841040 0.075183
3:    rs5     7 91839110      C      G  0.009978 0.111421 0.148457 0.852646 0.069231
4:    rs8     7 92408329      C      G -0.041927 0.160814 0.386157 0.640864 0.193234
I've added this in v1.13.1, on the new devel branch (Bioc 3.20) that was released over the past few days. If you don't want to switch to this version, just add OA -> A1 to sumstatsColHeaders in your own version; details here: https://github.com/neurogenomics/MungeSumstats/blob/master/R/data.R
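For anyone staying on an older version, the workaround might look roughly like the sketch below. It assumes the mapping has the two columns `Uncorrected` and `Corrected`, and that the extended copy is passed through the `mapping_file` argument of format_sumstats(). The name `custom_mapping`, the placeholder `path_to_sumstats` and the genome build in the commented call are illustrative, not taken from the reprex.

```r
library(MungeSumstats)

# Load the built-in header mapping (columns: Uncorrected, Corrected)
data("sumstatsColHeaders")

# Start from the built-in mapping and append OA -> A1 only if it is missing
custom_mapping <- sumstatsColHeaders
if (!"OA" %in% custom_mapping$Uncorrected) {
  custom_mapping <- rbind(
    custom_mapping,
    data.frame(Uncorrected = "OA", Corrected = "A1")
  )
}

# Then pass the extended mapping when formatting.
# path_to_sumstats is a placeholder for your own file, and the
# genome build is only an example:
# reformatted <- format_sumstats(path = path_to_sumstats,
#                                ref_genome = "GRCh37",
#                                mapping_file = custom_mapping)
```

Working on a copy leaves the packaged sumstatsColHeaders untouched, so the change is easy to drop once you upgrade.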
1. Bug description
format_sumstats() duplicates base letter for A1 and A2 with certain input column headers
Console output
Please see reprex
2. Reproducible example
Code
Created on 2024-05-01 with reprex v2.0.2
Expected behaviour
Expect output from above to:
OA column
A2 column not to duplicate A1
3. Session info