bschilder closed this issue 2 years ago
No, this isn't something I thought to add, but it definitely sounds reasonable. This check should come after the small p-value check, though (https://github.com/neurogenomics/MungeSumstats/blob/master/R/check_small_p_val.R), as in certain instances the P column could still be read in as a character field up to that point.
Do you want to add this to the branch you were working on? You should be able to copy the template from check_small_p_val() (make sure to include the imputation indicator, lines 49-51), and then add the warnings and a parameter to remove p-values greater than 1 or less than 0, with a default of TRUE?
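The check proposed above can be sketched roughly as follows. This is a language-agnostic illustration in plain Python, not the MungeSumstats code: the function name `check_range_p`, the parameter name `remove_out_of_range`, and the example rows are all hypothetical.

```python
import warnings


def check_range_p(rows, remove_out_of_range=True, p_col="P"):
    """Warn about p-values outside [0, 1] and optionally drop those rows.

    `rows` is a list of dicts. `remove_out_of_range` is a hypothetical
    parameter name; it defaults to True, as suggested in the comment above.
    """
    # float() also copes with a P column that was read in as strings.
    large = [r for r in rows if float(r[p_col]) > 1]
    negative = [r for r in rows if float(r[p_col]) < 0]
    if large:
        warnings.warn(f"{len(large)} p-values are >1.")
    if negative:
        warnings.warn(f"{len(negative)} p-values are <0.")
    if not remove_out_of_range:
        return rows
    # Keep only rows whose p-value lies inside the valid range.
    return [r for r in rows if 0 <= float(r[p_col]) <= 1]


# Made-up rows, for illustration only.
sumstats = [
    {"SNP": "rs1", "P": "0.03"},
    {"SNP": "rs2", "P": "1.7"},
    {"SNP": "rs3", "P": "-0.2"},
]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = check_range_p(sumstats)
print([r["SNP"] for r in kept])  # ['rs1']
print(len(caught))               # 2
```

Running the range check after the small p-value step means the column has already been coerced to numeric in the real pipeline; the `float()` calls here only stand in for that coercion.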
Sounds good, I was actually thinking the same thing. I'll work on that today.
Added the changes to the NEWS, but here's the part relevant to this Issue:
convert_large_p and convert_neg_p, respectively.
These are both handled by the new internal function check_range_p_val, which also reports the number of SNPs found meeting these criteria to the console/logs.
check_small_p_val records which SNPs were imputed in a more robust way, by recording which SNPs met the criteria before making the changes (as opposed to inferring this info from which columns are 0 after making the changes). This function now only handles non-negative p-values, so that rows with negative p-values can be recorded/reported separately in the check_range_p_val step.
check_small_p_val now reports the number of SNPs <= 5e-324 to console/logs.
check_range_p_val and check_small_p_val.
parse_logs can now extract information reported by check_range_p_val and check_small_p_val.
logs_example provides easy access to the log file stored in inst/extdata, and includes documentation on how it was created.
check_range_p_val and check_small_p_val now use #' @inheritParams format_sumstats to improve consistency of documentation.
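The "record first, then change" idea behind the small p-value step can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the package's implementation: the function name `flag_then_convert_small_p`, the indicator column `small_p_flag`, and the example rows are assumptions made for illustration.

```python
SMALLEST_DOUBLE = 5e-324  # smallest positive value a 64-bit float can hold


def flag_then_convert_small_p(rows, p_col="P", flag_col="small_p_flag"):
    """Flag non-negative p-values <= 5e-324 first, then set them to 0.

    Returns (new_rows, n_flagged). The flag column name is hypothetical.
    """
    out = []
    n_flagged = 0
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are left unchanged
        p = float(row[p_col])  # a string such as "1e-400" underflows to 0.0
        # Decide which rows meet the criterion BEFORE changing any values.
        is_small = 0 <= p <= SMALLEST_DOUBLE
        new_row[flag_col] = is_small
        new_row[p_col] = 0.0 if is_small else p
        n_flagged += is_small
        out.append(new_row)
    return out, n_flagged


# Made-up rows, for illustration only.
rows = [
    {"SNP": "rs1", "P": "1e-400"},
    {"SNP": "rs2", "P": "0.5"},
    {"SNP": "rs3", "P": "-0.1"},
]
converted, n = flag_then_convert_small_p(rows)
print(n)                                       # 1
print([r["small_p_flag"] for r in converted])  # [True, False, False]
```

Because the flag is computed before the value is overwritten, the negative p-value in the last row is left alone and can be counted separately by a later range check.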
After munging a bunch of summary stats from OpenGWAS, and trying to convert them to MAGMA files, I noticed MAGMA was throwing this error for a number of datasets, indicating that some values in the "P" col are >1 even after munging. Do we have any checks in place for this situation?
Sources
Possible reasons why this might occur in these cases:
Here's one of the problematic files: https://gwas.mrcieu.ac.uk/datasets/ubm-a-103/
I read the munged file in and got some more info on this P column. Based on this distribution, it definitely does seem like something is awry.
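A quick way to get this kind of summary of the P column is sketched below in plain Python. It assumes the munged file is tab-delimited with a header column named "P"; the helper name `summarise_p` and the in-memory demo data are made up and do not come from the ubm-a-103 file.

```python
import csv
import io


def summarise_p(handle, p_col="P"):
    """Summarise the p-value column of a tab-delimited sumstats file."""
    reader = csv.DictReader(handle, delimiter="\t")
    values = [float(row[p_col]) for row in reader]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "n_gt_1": sum(v > 1 for v in values),  # p-values above 1
        "n_lt_0": sum(v < 0 for v in values),  # negative p-values
    }


# Made-up in-memory data standing in for a munged file.
demo = io.StringIO("SNP\tP\nrs1\t0.2\nrs2\t3.5\nrs3\t0.9\n")
summary = summarise_p(demo)
print(summary["max"], summary["n_gt_1"])  # 3.5 1
```

For a gzip-compressed file on disk, the same function can be given a handle from `gzip.open(path, "rt")`.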
Solutions