Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Allow user-defined `sumstatsColHeaders` mapping as argument #45

Closed: albert-ying closed this issue 3 years ago

albert-ying commented 3 years ago

Hi, just a suggestion: I think the pre-defined column-name mapping is convenient but not very flexible. For example, I have a summary statistics table with a1 as the effect allele and a0 as the other allele. format_sumstats, however, treats a1 as the reference allele and a0 as the effect allele. To get it to work, I have to either rename the columns in the file manually or update sumstatsColHeaders just for this dataset.

I think it would be better if format_sumstats could take a column-name mapping as an argument. For the example above, it would be much easier if I could just do something like:

format_sumstats(path = raw_file, A1 = "a0", A2 = "a1", ...)

or

new_col_header = data.frame(Uncorrected = c("a0", "a1"), Corrected = c("A1", "A2"))
format_sumstats(path = raw_file, colheaders = new_col_header)
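As a rough illustration of the second proposal, the sketch below applies a mapping table like new_col_header to rename columns in base R. The helper rename_with_mapping and the toy data frame are hypothetical and not part of MungeSumstats; the sketch only shows the lookup idea, not how format_sumstats handles headers internally.

```r
# Hypothetical helper (not part of MungeSumstats): rename the columns of a
# summary-statistics data frame using a two-column mapping table with
# "Uncorrected" (names as found in the file) and "Corrected" (standard names).
rename_with_mapping <- function(sumstats, mapping) {
  stopifnot(all(c("Uncorrected", "Corrected") %in% names(mapping)))
  uncorrected <- as.character(mapping$Uncorrected)
  corrected <- as.character(mapping$Corrected)
  # Position of each current column name in the mapping (NA if unmapped)
  idx <- match(names(sumstats), uncorrected)
  hit <- !is.na(idx)
  # Replace only the names that have a mapping entry; keep the rest as-is
  names(sumstats)[hit] <- corrected[idx[hit]]
  sumstats
}

# Toy table where a1 is the effect allele and a0 the other allele
raw <- data.frame(SNP = "rs1", a1 = "T", a0 = "C", BETA = 0.12,
                  stringsAsFactors = FALSE)
new_col_header <- data.frame(Uncorrected = c("a0", "a1"),
                             Corrected   = c("A1", "A2"),
                             stringsAsFactors = FALSE)
renamed <- rename_with_mapping(raw, new_col_header)
names(renamed)
# [1] "SNP"  "A2"   "A1"   "BETA"
```

Columns without an entry in the mapping (SNP, BETA here) pass through unchanged, so a user-supplied table only needs to list the headers that differ from the defaults.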

Thank you!

Al-Murphy commented 3 years ago

Hi @albert-ying,

Thank you for your suggestion! While we tried to make the mapping file all-encompassing, covering the many summary statistics headers in use and how people interpret them, I agree that it would be useful to let users supply their own when required. To this end, I have added a mapping_file parameter, through which the user can pass their own mapping file in the same format as data(sumstatsColHeaders). This will be available in 1.1.6+, which should go live today; we are just incorporating a few more changes into this release.
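A custom mapping in this format could, for instance, be built by letting user rows take priority over default rows that share the same Uncorrected header. The sketch below uses a tiny hypothetical stand-in for data(sumstatsColHeaders) so it runs without the package; the helper override_mapping is also hypothetical, and comparing headers case-insensitively is an assumption made here, not documented package behaviour.

```r
# Tiny hypothetical stand-in for the default mapping; the real
# data(sumstatsColHeaders) table is far larger and its exact rows may differ.
default_map <- data.frame(
  Uncorrected = c("SNP", "RSID", "A1", "A2", "A0", "BETA"),
  Corrected   = c("SNP", "SNP",  "A1", "A2", "A1", "BETA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop default rows whose "Uncorrected" header also
# appears in the custom map (case-insensitive, an assumption), then append
# the custom rows so each header has exactly one target.
override_mapping <- function(default_map, custom_map) {
  clash <- toupper(default_map$Uncorrected) %in% toupper(custom_map$Uncorrected)
  out <- rbind(default_map[!clash, , drop = FALSE], custom_map)
  rownames(out) <- NULL
  out
}

# Dataset where "A1" is the effect allele and "A0" the other allele
custom_map <- data.frame(Uncorrected = c("A0", "A1"),
                         Corrected   = c("A1", "A2"),
                         stringsAsFactors = FALSE)
mapping <- override_mapping(default_map, custom_map)
mapping[mapping$Uncorrected %in% c("A0", "A1"), ]
```

With a version that has the new parameter, a table built this way would then be passed as `format_sumstats(path = raw_file, mapping_file = mapping)`, plus whatever other arguments the dataset requires.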

Thanks, Alan.

Al-Murphy commented 3 years ago

v1.1.7, with this functionality in place, has been pushed to GitHub. It will propagate to Bioconductor in the next release, around September.