Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Add progress indicators / improve messages #4

Closed: bschilder closed this issue 3 years ago

bschilder commented 3 years ago

It'd be great to have progress bars wherever possible, to let users know exactly which step the process is at and how long it might take. It always makes me nervous when I see something taking a long time and I don't know whether RStudio has frozen or whether it's still chugging along.

More generally, it would be good to report not just how many SNPs were removed, but how many are remaining. This is ultimately what people care the most about.
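Both requests (a visible sense of progress, and reporting remaining as well as removed SNPs) could look roughly like the sketch below. It is a minimal base-R illustration, not MungeSumstats code: `filter_with_report`, the `steps` list and the toy P-value filters are all hypothetical, and a simple "[i/n]" step counter stands in for a full progress bar.

```r
# Hypothetical helper (not part of MungeSumstats): apply one filtering step,
# then report how many SNPs were removed AND how many remain.
filter_with_report <- function(sumstats, keep, step) {
  n_before <- nrow(sumstats)
  sumstats <- sumstats[keep, , drop = FALSE]
  message(sprintf("%s: removed %d SNPs, %d SNPs remaining.",
                  step, n_before - nrow(sumstats), nrow(sumstats)))
  sumstats
}

# Toy data with made-up values
sumstats <- data.frame(SNP = c("rs1", "rs2", "rs3", "rs4"),
                       P = c(0.01, NA, 1.2, 0.5),
                       stringsAsFactors = FALSE)

# Hypothetical steps: each returns a logical "keep" vector for the current table
steps <- list(
  "Removing missing P-values" = function(d) !is.na(d$P),
  "Removing P-values outside [0, 1]" = function(d) d$P >= 0 & d$P <= 1
)

for (i in seq_along(steps)) {
  # Step counter so the user can see the process is still running
  message(sprintf("[%d/%d] %s", i, length(steps), names(steps)[i]))
  sumstats <- filter_with_report(sumstats, steps[[i]](sumstats), names(steps)[i])
}
```

For a real progress bar over many steps, base R's `utils::txtProgressBar()` and `utils::setTxtProgressBar()` are one option, though their output can interleave awkwardly with `message()` lines.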

Also, I think there are a couple of points where the messages could be more informative.

1. "VCF format detected, this will be converted to a standard summary statistics file format."

I would say, "...to a standardised table format."

2. "75 SNP IDs are not correctly formatted. These will be corrected from the reference genome."

"Correct" is a bit subjective here. It would be best to state exactly what is being done, e.g. converting all SNP IDs to RS IDs from version y of resource X.
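As a rough illustration of a more explicit message, the sketch below counts IDs that are not in `rs<number>` form and names the resource used to fix them. The regex definition of "correctly formatted", the CHR:BP lookup wording and the `ref_name` placeholder are assumptions for illustration, not a description of what MungeSumstats actually checks.

```r
snp_ids <- c("rs123", "1:10583", "rs456", "chr2:4521_A_G", "RS789")

# Assumption: "correctly formatted" means lowercase "rs" followed by digits only
is_rsid <- grepl("^rs[0-9]+$", snp_ids)
n_bad <- sum(!is_rsid)

# Hypothetical placeholder: name the actual reference resource and version here
ref_name <- "<reference resource, version>"

message(sprintf(paste0("%d of %d SNP IDs are not in RS ID format (rs<number>). ",
                       "They will be replaced with RS IDs looked up by CHR:BP in %s."),
                n_bad, length(snp_ids), ref_name))
```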

3. "940147 SNPs are not on the reference genome. These will be corrected from the reference genome."

Not sure what this means.

4. "7 RS IDs are duplicated in the sumstats file. These duplicates will be removed"

Is only the first instance of each duplicate kept, or are all instances removed? If only one is kept, is it selected based on the summary statistics (mean or max p-value?) or arbitrarily?
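The three behaviours this question distinguishes give different results, which is why the message should say which one is used. A minimal base-R sketch on made-up data follows; it does not claim to reproduce what MungeSumstats does, and "keep the lowest P" is just one example of a statistics-based rule.

```r
sumstats <- data.frame(SNP = c("rs1", "rs2", "rs1", "rs3", "rs2"),
                       P = c(0.04, 0.3, 0.001, 0.2, 0.5),
                       stringsAsFactors = FALSE)

# (a) Keep the first occurrence of each RS ID (file order decides, i.e. arbitrary)
keep_first <- sumstats[!duplicated(sumstats$SNP), ]

# (b) Drop every copy of any RS ID that appears more than once
dup_ids <- unique(sumstats$SNP[duplicated(sumstats$SNP)])
drop_all <- sumstats[!(sumstats$SNP %in% dup_ids), ]

# (c) Keep the copy with the lowest P-value (sort first, then keep first)
ord <- sumstats[order(sumstats$P), ]
keep_min_p <- ord[!duplicated(ord$SNP), ]

message(sprintf("Kept first: %d SNPs; dropped all copies: %d SNPs; kept min P: %d SNPs.",
                nrow(keep_first), nrow(drop_all), nrow(keep_min_p)))
```

Strategies (a) and (c) keep the same RS IDs but can keep different rows (here rs1 with P = 0.04 versus P = 0.001), while (b) loses those RS IDs entirely.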

5. "Succesfully finished preparing sumstats file, preview:"

Minor thing, but the column headers in the preview are not aligned with the data below them, which makes it hard to read.

Also, typo: Succesfully --> Successfully

bschilder commented 3 years ago

I've added more messages throughout to give the user a better sense of progress.

Per point 5, I've replaced the preview message function with the following:

  message(paste0(capture.output(preview), collapse = "\n"))

Which shows up nicely in the console.

[Screenshot 2021-07-11 at 12.32.11]
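A self-contained version of that one-liner, with a made-up table standing in for the formatted sumstats, is sketched below. `capture.output()` returns the lines that `print()` would produce for the data frame, so joining them with newlines keeps headers and values aligned; `message()` on its own does not apply data-frame print formatting.

```r
# Toy table standing in for the formatted sumstats (made-up values)
sumstats <- data.frame(SNP = c("rs1", "rs2"), CHR = c(1L, 2L),
                       BP = c(10583L, 4521L), P = c(0.04, 0.3))
preview <- head(sumstats)

# Render the data frame exactly as print() would, one string per line
txt <- capture.output(preview)

# Send the aligned table to the message stream (stderr)
message(paste0(txt, collapse = "\n"))
```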

bschilder commented 3 years ago

Added a report_summary function. It's currently only used at the very end of format_sumstats, but in theory it could be used at any stage after column-name standardisation.
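The thread doesn't describe what report_summary prints, so the following is only a hypothetical sketch of the kind of end-of-run summary discussed above. `report_summary_sketch` is an invented name, not the package's function, and it assumes standardised `SNP` and `CHR` column names.

```r
# Hypothetical sketch: NOT the actual MungeSumstats report_summary
report_summary_sketch <- function(sumstats) {
  # Assumes column names have already been standardised to SNP and CHR
  n_snps <- nrow(sumstats)
  n_unique <- length(unique(sumstats$SNP))
  per_chr <- table(sumstats$CHR)
  message(sprintf("Summary: %d SNPs (%d unique RS IDs) across %d chromosomes.",
                  n_snps, n_unique, length(per_chr)))
  invisible(list(n_snps = n_snps, n_unique = n_unique, per_chr = per_chr))
}

# Toy data with made-up values
sumstats <- data.frame(SNP = c("rs1", "rs2", "rs3"), CHR = c(1L, 1L, 2L),
                       stringsAsFactors = FALSE)
res <- report_summary_sketch(sumstats)
```

Returning the counts invisibly lets the same helper be called after any step for logging, or inspected programmatically.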