bschilder closed this 3 years ago.
I added several new subfunctions for handling this. They actually do a pretty good job of parsing the VCF, so we might want to consider eventually transitioning to this dedicated VCF parsing tool (VariantAnnotation). For now, though, I've made this method a backup because I wasn't sure whether there would be downstream consequences I hadn't thought of.
https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/R/read_vcf_data.R
https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/R/vcf2df.R
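For anyone reviewing, here is a rough, language-neutral sketch of what "VCF to data frame" flattening involves. It is written in Python rather than R, it is not the code in vcf2df.R, and it does not use the VariantAnnotation API. The function name `vcf_to_records` and the example variant are hypothetical. It skips `##` meta lines, takes column names from the `#CHROM` line, expands INFO `key=value` pairs into columns, and pairs each FORMAT key (IEU-style `ES`, `SE`, `LP` here) with each sample's values:

```python
import io

def vcf_to_records(vcf_text):
    """Parse VCF text into a list of flat dicts (one per variant per sample)."""
    header = None
    records = []
    for line in io.StringIO(vcf_text):
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#"):
            header = [f.lstrip("#") for f in fields]  # "#CHROM" -> "CHROM"
            continue
        if header is None:
            raise ValueError("data line found before the #CHROM header")
        row = dict(zip(header, fields))
        # Expand INFO "key=value" pairs into their own columns; bare flags -> True.
        for item in row.pop("INFO", ".").split(";"):
            if item in ("", "."):
                continue
            key, sep, value = item.partition("=")
            row[key] = value if sep else True
        fmt = row.pop("FORMAT", None)
        samples = header[9:]  # columns after CHROM..FORMAT are sample IDs
        if fmt is None or not samples:
            records.append(row)
            continue
        keys = fmt.split(":")
        base = {k: v for k, v in row.items() if k not in samples}
        for sample in samples:
            rec = dict(base, SAMPLE=sample)
            # Pair each FORMAT key with this sample's colon-separated values.
            rec.update(zip(keys, row[sample].split(":")))
            records.append(rec)
    return records

# Made-up single-variant, single-sample example.
example = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
               "FILTER", "INFO", "FORMAT", "UKB-1"]),
    "\t".join(["1", "12345", "rs1", "A", "G", ".",
               "PASS", "AF=0.1", "ES:SE:LP", "0.05:0.01:3.2"]),
])
records = vcf_to_records(example)
```

A dedicated parser like VariantAnnotation also handles the harder cases this sketch ignores: multi-allelic ALT, `Number=A` INFO fields, and typed values.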
I also improved get_vcf_sample_ids so that it searches for sample names regardless of whether the path is an MRC IEU URL:
https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/R/get_vcf_sample_ids.R
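The idea behind making sample-ID lookup source-agnostic can be sketched as follows. This is a hypothetical Python illustration, not the R implementation of get_vcf_sample_ids, and `read_vcf_header` / `vcf_sample_ids` are made-up names. It reads only the `#`-prefixed header (from a local path or an http/https URL, gzipped or not) and takes the `#CHROM` columns after the nine fixed ones as sample IDs:

```python
import contextlib
import gzip
import io
import os
import tempfile
import urllib.request

def read_vcf_header(path):
    """Return the '#'-prefixed header lines of a local or remote (http/https) VCF."""
    with contextlib.ExitStack() as stack:
        if path.startswith(("http://", "https://")):
            raw = stack.enter_context(urllib.request.urlopen(path))
        else:
            raw = stack.enter_context(open(path, "rb"))
        if path.endswith((".gz", ".bgz")):
            # bgzip output is multi-member gzip, which GzipFile reads transparently.
            raw = stack.enter_context(gzip.GzipFile(fileobj=raw))
        text = stack.enter_context(io.TextIOWrapper(raw, encoding="utf-8"))
        header = []
        for line in text:
            if not line.startswith("#"):
                break  # stop at the first data line; the body is never read
            header.append(line)
        return header

def vcf_sample_ids(path):
    """Sample IDs are the #CHROM-line columns after CHROM..FORMAT (9 fixed columns)."""
    for line in read_vcf_header(path):
        if line.startswith("#CHROM"):
            return line.rstrip("\r\n").split("\t")[9:]
    return []

# Usage with a small local .vcf.gz; a URL would go through the same code path.
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "example.vcf.gz")
    with gzip.open(local_path, "wt", encoding="utf-8") as fh:
        fh.write("##fileformat=VCFv4.2\n")
        fh.write("\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                            "FILTER", "INFO", "FORMAT", "UKB-1"]) + "\n")
        fh.write("\t".join(["1", "12345", "rs1", "A", "G", ".",
                            "PASS", ".", "ES", "0.05"]) + "\n")
    sample_ids = vcf_sample_ids(local_path)
```

Because the lookup depends only on the header contents, there is no need to pattern-match the path against a particular host.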
Additionally, I tried to cover a lot of potential scenarios with some new tests for write_sumstats:
https://github.com/neurogenomics/MungeSumstats/blob/bschilder_dev/tests/testthat/test-write_sumstats.R
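The general pattern for this kind of scenario coverage is a write-then-read round trip over every combination of options. Here is a minimal Python sketch of that pattern, assuming only tabular TSV/CSV output with optional gzip. It is not the testthat code linked above, and `write_table`, `read_table`, and `run_round_trip_scenarios` are hypothetical helpers:

```python
import csv
import gzip
import itertools
import os
import tempfile

def write_table(rows, path, sep):
    """Write a list of dicts as a delimited file, gzipping if the path ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]), delimiter=sep)
        writer.writeheader()
        writer.writerows(rows)

def read_table(path, sep):
    """Read a delimited (optionally gzipped) file back into a list of dicts."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        return [dict(r) for r in csv.DictReader(fh, delimiter=sep)]

def run_round_trip_scenarios(rows):
    """Map each output filename to whether it read back identical to the input."""
    results = {}
    formats = [("tsv", "\t"), ("csv", ",")]
    with tempfile.TemporaryDirectory() as tmp:
        for (ext, sep), gz in itertools.product(formats, [False, True]):
            path = os.path.join(tmp, f"sumstats.{ext}" + (".gz" if gz else ""))
            write_table(rows, path, sep)
            results[os.path.basename(path)] = read_table(path, sep) == rows
    return results

# Made-up summary-statistics row; values are strings so the comparison is exact.
rows = [{"SNP": "rs1", "CHR": "1", "BP": "12345", "P": "0.05"}]
results = run_round_trip_scenarios(rows)
```

Enumerating scenarios with `itertools.product` makes it easy to add another axis (for example, a VCF output mode) without writing each combination by hand.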
When MungeSumstats writes a VCF, it doesn't write #CHR, which we currently rely on in read_sumstats/read_vcf. I'm modifying read_vcf so it can still read in VCFs in these situations:
https://github.com/neurogenomics/MungeSumstats/blob/495317fb823077178837a30319f9dfed7884ac1e/R/read_vcf.R#L60
Here's the reprex: