Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Option to tabix-index #7

Closed by bschilder 3 years ago

bschilder commented 3 years ago

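Tabix indexing makes region queries on your summary statistics very fast, because only the parts of the file overlapping the requested region need to be read. It requires minimal changes to the format: the file is position-sorted, bgzip-compressed, and accompanied by an index file.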
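To illustrate why a position index speeds up region queries, here is a toy sketch in Python. It is not tabix's actual index structure (tabix works on bgzip-compressed blocks), and the rows, `build_index`, and `query` are all hypothetical names and made-up values. It only shows the general idea: once rows are sorted by chromosome and position, a region lookup becomes a binary search instead of a full scan.

```python
from bisect import bisect_left, bisect_right

# Made-up summary statistics rows: (chromosome, position, SNP id, p-value).
rows = [
    ("2", 500, "rs4", 0.2),
    ("1", 3000, "rs3", 0.01),
    ("1", 1000, "rs1", 0.5),
    ("1", 2000, "rs2", 0.03),
    ("2", 1500, "rs5", 0.9),
]


def build_index(rows):
    """Group rows by chromosome, keeping positions and rows sorted by position."""
    index = {}
    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        positions, records = index.setdefault(row[0], ([], []))
        positions.append(row[1])
        records.append(row)
    return index


def query(index, chrom, start, end):
    """Return all rows on `chrom` with start <= position <= end."""
    if chrom not in index:
        return []
    positions, records = index[chrom]
    lo = bisect_left(positions, start)   # first position >= start
    hi = bisect_right(positions, end)    # one past the last position <= end
    return records[lo:hi]


index = build_index(rows)
hits = query(index, "1", 1500, 3000)
print([snp for _, _, snp, _ in hits])
```

The lookup cost depends on the logarithm of the number of rows per chromosome plus the number of hits, rather than on the size of the whole file.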

seqminer has a number of functions to do this within R. I've also written a set of command-line wrappers for the same steps in echolocatoR.

bschilder commented 3 years ago

This is now implemented when write_vcf=TRUE. To do: Implement for tabular format as well.

bschilder commented 3 years ago

Now implemented for the tabular output format. The index_tabular function is also exported in case users want to index their summary statistics later on:

https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/R/index_tabular.R

New tests for this function are here: https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/tests/testthat/test-index_tabular.R

I've also added the necessary checks in check_save_path to ensure the output filename ends with ".bgz": https://github.com/neurogenomics/MungeSumstats/blob/8ab84dc6e5195eb30dbd95f18ada0f03c1bbd362/R/check_save_path.R#L114
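For readers who want the gist of such a filename check without opening the linked file, here is a minimal sketch in Python. It is not the package's R code: `ensure_bgz_suffix` is a hypothetical helper, and the choice to correct the path rather than raise an error is an assumption, since the thread does not say which of the two check_save_path does.

```python
def ensure_bgz_suffix(save_path: str) -> str:
    """Return save_path with a '.bgz' extension (hypothetical helper).

    Assumption: a mismatched path is silently corrected rather than rejected.
    """
    if save_path.endswith(".bgz"):
        return save_path
    if save_path.endswith(".gz"):
        # Swap a plain gzip extension for the bgzip one.
        return save_path[: -len(".gz")] + ".bgz"
    return save_path + ".bgz"


print(ensure_bgz_suffix("sumstats.tsv"))
print(ensure_bgz_suffix("sumstats.tsv.gz"))
print(ensure_bgz_suffix("sumstats.tsv.bgz"))
```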