Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

GenomicRanges style update broken for X chromosome #103

Closed: quattro closed this issue 2 years ago

quattro commented 2 years ago

Hi all,

Thanks for the super cool tool. It's made my life a lot easier. I recently ran into a bug during liftover from GRCh37->GRCh38 that took some digging through the internals to track down.

Internally the code pushes all CHR to lowercase (see here), which in turn breaks the style requirements for UCSC. It would be great to have this updated to work for all chromosome data, and not just autosomes.

Perhaps instead of a blanket tolower call, use a gsub that explicitly replaces any 'chr' instance (regardless of case) with 'chr'.

        rsids[, CHR := gsub("CHR", "chr", as.character(CHR), ignore.case=TRUE)]
        sumstats_dt[, CHR := gsub("CHR", "chr", as.character(CHR), ignore.case=TRUE)]
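
To illustrate the difference, here is a minimal base R sketch on a made-up vector of chromosome labels. The values are hypothetical, not taken from the package or from my data, and the data.table assignment is left out so the snippet runs on its own.

```r
# Hypothetical chromosome labels in mixed case (illustration only).
chr <- c("CHR1", "chr2", "Chr22", "chrX", "CHRY", "X", "MT")

# Blanket lowercasing: also lowercases the chromosome names themselves,
# so "chrX" becomes "chrx" and "X" becomes "x".
lowered <- tolower(chr)

# Case-insensitive replacement: only the "chr" text is normalised,
# so X, Y and MT keep their upper case.
fixed <- gsub("CHR", "chr", chr, ignore.case = TRUE)

print(lowered)
print(fixed)
```

Note that the pattern is not anchored, so it would replace "chr" anywhere in the string. Anchoring it as "^chr" would be a stricter variant if labels might contain that text elsewhere.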

Cheers, Nick

Al-Murphy commented 2 years ago

Hey,

Great to hear that you find the package useful! Happy to implement a change on this. Just to confirm: is the issue that sex chromosomes are being pushed to lower case, e.g. X -> x? I believe you are right that UCSC uses 'X' rather than 'x'.

Let me know if this is what you mean and I'll push a fix.

Cheers.

quattro commented 2 years ago

Hi @Al-Murphy, yes that's correct. I tested out my above suggestion and it appears to fix the issue for now.

Al-Murphy commented 2 years ago

Fix pushed to RELEASE_3_15; it will be available in Bioconductor in the next few days.

quattro commented 2 years ago

Fantastic, thanks so much.