Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Output results as *.gz for LDSC ready format #3

Closed Al-Murphy closed 3 years ago

Al-Murphy commented 3 years ago

@bschilder it might be worth us looking into adding this functionality to your branch; a user asked for it. Do you know exactly what this format requires?

bschilder commented 3 years ago

It's been a while since I used LDSC, but I'm sure I could figure it out. Will add to my branch (bschilder_dev) once I do.

Al-Murphy commented 3 years ago

Great, let me know if I can help with this. I have yet to actually run LDSC myself though!

bschilder commented 3 years ago

Info here: https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format

I think we have most things by default except:

That said, Kitty was able to run LDSC on the Open GWAS summary statistics I just processed with MungeSumstats, but only after first running LDSC's munge_sumstats.py: https://github.com/bulik/ldsc/blob/master/munge_sumstats.py

Al-Murphy commented 3 years ago

Yep, Roxy said something similar: it was fine after running munge_sumstats.py. We should probably look at what that script does. My interpretation was that it just removes unnecessary columns?

bschilder commented 3 years ago

Exporting to LDSC format is now supported by setting format_sumstats(ldsc_format=TRUE, ...)

New functions to support this are:

check_ldsc_format

https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/R/check_ldsc_format.R

check_zscore

https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/R/check_zscore.R
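
For anyone unsure what "LDSC-ready" means in practice, here is a rough standalone sketch in Python (not the MungeSumstats R code). It assumes the LDSC wiki format boils down to a tab-delimited, gzipped file with SNP, A1, A2, Z and N columns, and that a missing Z can be derived as BETA / SE. The helper names to_ldsc_rows and write_ldsc_gz are hypothetical and only for illustration.

```python
import csv
import gzip

# Columns assumed to be required for an LDSC-style .sumstats.gz file
# (taken from the LDSC wiki; an assumption, not read from MungeSumstats).
LDSC_COLUMNS = ["SNP", "A1", "A2", "Z", "N"]


def to_ldsc_rows(rows):
    """Keep only the LDSC columns, deriving Z as BETA / SE when Z is absent."""
    out = []
    for row in rows:
        z = row.get("Z")
        if z is None:
            # Assumed convention: Z-score = effect size / standard error.
            z = float(row["BETA"]) / float(row["SE"])
        out.append({
            "SNP": row["SNP"],
            "A1": row["A1"],
            "A2": row["A2"],
            "Z": float(z),
            "N": int(row["N"]),
        })
    return out


def write_ldsc_gz(rows, path):
    """Write rows as a gzip-compressed, tab-delimited file with a header."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=LDSC_COLUMNS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Extra columns such as CHR, BP or P are simply dropped by to_ldsc_rows, which matches the "removes unnecessary columns" reading of munge_sumstats.py discussed above, though that reading is itself unconfirmed.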

bschilder commented 3 years ago

Passed force_new_z up to format_sumstats and set default to FALSE.
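
A minimal Python sketch of what a flag like force_new_z could control, written under assumptions rather than copied from check_zscore.R: an existing Z is kept unless the flag is set, and a new Z is derived from the two-sided P-value with the sign taken from BETA. The function name add_z is hypothetical.

```python
from statistics import NormalDist


def add_z(row, force_new_z=False):
    """Return a Z-score for one SNP record (a dict).

    Assumed behaviour: reuse an existing Z unless force_new_z is True;
    otherwise derive |Z| from the two-sided P-value and sign it by BETA.
    P must lie strictly between 0 and 1 for the inverse CDF to be defined.
    """
    if "Z" in row and not force_new_z:
        return float(row["Z"])
    # Two-sided P-value -> absolute Z via the standard normal quantile.
    abs_z = abs(NormalDist().inv_cdf(float(row["P"]) / 2.0))
    sign = -1.0 if float(row["BETA"]) < 0 else 1.0
    return sign * abs_z
```

With the default of FALSE, a file that already carries a Z column is left alone, which seems the safer choice when the upstream Z was computed from more precise inputs than a rounded P-value.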