Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

p-values > 1 #76

Closed: bschilder closed this issue 2 years ago

bschilder commented 2 years ago

After munging a batch of summary statistics from OpenGWAS and converting them to MAGMA files, I noticed that MAGMA threw this error for a number of datasets. The error indicates that some values in the "P" column are > 1 even after munging. Do we have any checks in place for this situation?

[Screenshot, 2021-11-23 21:54:15: the MAGMA error]

Sources

Possible reasons why this might occur in these cases:

Here's one of the problematic files: https://gwas.mrcieu.ac.uk/datasets/ubm-a-103/

I read the munged file back in and looked more closely at the P column. Based on its distribution, something definitely seems awry.
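A minimal sketch of this kind of inspection, written in Python for illustration (MungeSumstats itself is R). It assumes the munged output is tab-separated with a header row and a column named `P`; `read_p_column` and `summarise_p` are hypothetical helpers, not part of the package. It counts missing values and values outside [0, 1] instead of plotting a histogram.

```python
import csv
import math


def read_p_column(path, column="P"):
    """Read one column of a tab-separated summary-statistics file as floats.

    Empty or non-numeric fields (e.g. "NA") become NaN so they are counted
    as missing instead of crashing the summary.
    """
    values = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                values.append(float(row[column]))
            except (TypeError, ValueError):
                values.append(math.nan)
    return values


def summarise_p(values):
    """Return simple diagnostics for a list of p-values."""
    finite = [v for v in values if not math.isnan(v)]
    return {
        "n": len(values),
        "n_missing": len(values) - len(finite),
        "min": min(finite) if finite else math.nan,
        "max": max(finite) if finite else math.nan,
        "n_above_1": sum(v > 1 for v in finite),
        "n_below_0": sum(v < 0 for v in finite),
    }
```

For example, `summarise_p(read_p_column("munged.tsv"))` (the file name is a placeholder) gives a quick count of how many rows would trip a downstream tool that expects 0 ≤ P ≤ 1.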

[Screenshot, 2021-11-23 21:46:58, and histogram: distribution of the munged P column]

Solutions

Al-Murphy commented 2 years ago

No, this isn't something I thought to add, but it definitely sounds reasonable. This check should come after the small p-value check, though (https://github.com/neurogenomics/MungeSumstats/blob/master/R/check_small_p_val.R), because in certain instances the P column may still be stored as a character field up to that point.
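The ordering point can be illustrated with a small Python sketch (not the package's R code; `coerce_p` and `count_out_of_range` are hypothetical names). A range check is only meaningful once P is numeric, and the conversion itself can change values: in Python, the string `"1e-400"` lies below the smallest positive double and parses to `0.0`.

```python
import math


def coerce_p(raw_values):
    """Convert p-values that were read in as text into floats.

    Unparseable entries (e.g. "NA") become NaN. Very small strings such as
    "1e-400" underflow to 0.0 during conversion.
    """
    converted = []
    for raw in raw_values:
        try:
            converted.append(float(raw))
        except (TypeError, ValueError):
            converted.append(math.nan)
    return converted


def count_out_of_range(p_values):
    """Count p-values outside [0, 1].

    Expects numbers: comparing a str with an int raises TypeError in Python.
    NaN compares False both ways, so missing values are not counted.
    """
    return sum(1 for p in p_values if p < 0 or p > 1)


raw = ["0.03", "1e-400", "1.2", "-0.5", "NA"]
# count_out_of_range(raw) would raise TypeError, so coerce first.
p = coerce_p(raw)
print(count_out_of_range(p))
```

Running the range check only after the text-to-number step mirrors the ordering suggested above.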

Do you want to add this to the branch you were working on? You should be able to copy the template from check_small_p_val() (make sure to include the imputation indicator, lines 49-51), then add the warnings and a parameter to remove p-values > 1 or < 0, with a default of TRUE.
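A rough Python sketch of the proposed check, assuming rows are dicts whose P values are already numeric (i.e. it runs after the small p-value step). The function `check_p_range` and its `remove_out_of_range_p` parameter are hypothetical stand-ins for whatever the R implementation ends up using, and the imputation indicator from the check_small_p_val() template is not reproduced here.

```python
import warnings


def _out_of_range(p):
    # NaN compares False both ways, so missing p-values are never flagged.
    return p < 0 or p > 1


def check_p_range(rows, remove_out_of_range_p=True, column="P"):
    """Warn about, and by default drop, rows whose p-value is outside [0, 1]."""
    bad = [r for r in rows if _out_of_range(r[column])]
    if not bad:
        return rows
    n_high = sum(r[column] > 1 for r in bad)
    n_low = len(bad) - n_high
    action = "Removing them." if remove_out_of_range_p else "Keeping them."
    warnings.warn(f"{n_high} SNP(s) have P > 1 and {n_low} have P < 0. {action}")
    if remove_out_of_range_p:
        return [r for r in rows if not _out_of_range(r[column])]
    return rows


rows = [
    {"SNP": "rs1", "P": 0.2},
    {"SNP": "rs2", "P": 1.5},
    {"SNP": "rs3", "P": -0.1},
]
kept = check_p_range(rows)  # emits one warning, keeps only rs1
```

Defaulting to removal follows the suggestion in the thread; passing `remove_out_of_range_p=False` keeps the rows and only warns.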

bschilder commented 2 years ago

Sounds good, I was actually thinking the same thing. I'll work on that today.

bschilder commented 2 years ago

I've added the changes to NEWS; here's the part relevant to this issue: