Closed: rmgpanw closed this 7 months ago
This same issue was discussed in the latter half of here. I agree this is a sensible idea, and it can be implemented as follows:
Add a parameter where the user can list the columns they want tested for missing values. By default this will be all SNP columns (SNP, BP, A1, A2), effect columns (BETA, Z, etc.) and other essential columns (FRQ, N). This parameter would be an easy modification to check_miss_data.R, which is called in format_sumstats.R (so the parameter would need to be added to that function too).
Unfortunately, I don't have time to implement this myself right now. If you have the time to add this functionality and submit a PR, I'll happily review and add it. I think it could be very useful too.
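As a rough illustration of the parameter described above, here is a minimal base R sketch. It is not the package's actual check_miss_data() code: the function name `drop_rows_missing_in` and the argument name `cols_to_check` are hypothetical, and it works on a plain data.frame rather than whatever structure the package uses internally. The default column list simply mirrors the columns named in the comment above.

```r
# Hypothetical helper, NOT the real check_miss_data():
# drop rows that have NA in any of the user-selected columns only.
drop_rows_missing_in <- function(sumstats,
                                 cols_to_check = c("SNP", "BP", "A1", "A2",
                                                   "BETA", "Z", "FRQ", "N")) {
  # Only test columns that are actually present in the data
  present <- intersect(cols_to_check, names(sumstats))
  if (length(present) == 0) {
    return(sumstats)
  }
  # TRUE for rows with no NA in any of the checked columns
  keep <- stats::complete.cases(sumstats[, present, drop = FALSE])
  sumstats[keep, , drop = FALSE]
}

# Dummy data: "EXTRA" is entirely NA, and one row is missing BETA
dummy <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  BP = c(100L, 200L, 300L),
  A1 = c("A", "C", "G"),
  A2 = c("G", "T", "A"),
  BETA = c(0.1, NA, -0.2),
  EXTRA = NA_character_,
  stringsAsFactors = FALSE
)

filtered <- drop_rows_missing_in(dummy)
print(filtered)
```

With these defaults, only the row missing BETA is dropped; the all-NA "EXTRA" column is ignored because it is not in `cols_to_check`. Passing `cols_to_check = names(dummy)` would reproduce the current behaviour of testing every column.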
Thanks for getting back to me so quickly, and with clear instructions. I will aim to give this a go later this week (hopefully) and keep you posted.
1. Bug description
Any column that contains NA values will cause those rows to be removed, even if the column is not necessary.
Expected behaviour
Remove rows with missing values, but only if those missing values are in essential columns (e.g. "CHR", "POS", "BETA", etc.).
2. Reproducible example
Dummy data, including an "EXTRA" column containing NAs:
All rows are removed by format_sumstats() due to the NA values, even though this "EXTRA" column is not needed:
Works as expected as long as the "EXTRA" column is removed:
Created on 2024-04-22 with reprex v2.0.2
3. Session info