Closed bschilder closed 3 years ago
I've added more messages throughout to give the user a better sense of progress.
Per point 5, I've changed the preview message to use the following:
message(paste0(capture.output(preview), collapse = "\n"))
Which shows up really nicely in the console.
Added a report_summary function. It is currently only used at the very end of format_sumstats, but in theory it could be used at any stage after colname standardization.
It'd be great to have progress bars wherever possible to let users know exactly what step the process is at and how long it might take. It always makes me nervous when I see something taking a long time and I don't know whether RStudio has frozen or whether it's still chugging along.
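As a rough illustration of the kind of progress bar meant here, a minimal sketch using base R's utils::txtProgressBar. The function name run_with_progress and the step names are hypothetical and are not taken from the package; Sys.sleep stands in for the real work.

```r
# Hypothetical wrapper: runs a sequence of named steps and advances a
# console progress bar after each one. Not the package's actual internals.
run_with_progress <- function(steps) {
  pb <- utils::txtProgressBar(min = 0, max = length(steps), style = 3)
  on.exit(close(pb))  # always close the bar, even if a step errors
  done <- 0
  for (i in seq_along(steps)) {
    Sys.sleep(0.05)   # stand-in for the real work done in this step
    done <- i
    utils::setTxtProgressBar(pb, done)
  }
  invisible(done)     # number of completed steps
}

n_done <- run_with_progress(c("read file", "standardise columns",
                              "check SNP IDs", "remove duplicates"))
```

A text bar like this works in both the RStudio console and a plain terminal, so it would not need any extra dependencies.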
More generally, it would be good to report not just how many SNPs were removed, but how many are remaining. This is ultimately what people care the most about.
Also, I think there are a couple of points where the messages could be a bit more informative.
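Something along these lines is what I have in mind; a minimal sketch, where report_filter is a hypothetical helper (not an existing function in the package) and the small data frame is made up purely for illustration.

```r
# Hypothetical helper: report how many SNPs a filtering step removed
# and how many remain afterwards.
report_filter <- function(n_before, n_after, step) {
  msg <- sprintf("%s: %s SNPs removed, %s SNPs remaining.",
                 step,
                 format(n_before - n_after, big.mark = ","),
                 format(n_after, big.mark = ","))
  message(msg)
  invisible(msg)
}

# Toy data, invented for this example only.
sumstats <- data.frame(SNP = c("rs1", "rs2", "rs2", "rs3"),
                       P   = c(0.01, 0.20, 0.05, 0.50))
filtered <- sumstats[!duplicated(sumstats$SNP), ]
report_filter(nrow(sumstats), nrow(filtered), "Duplicate RS IDs")
```

Calling the same helper after every filtering step would give a running count of what is left.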
1. "VCF format detected, this will be converted to a standard summary statistics file format."
I would say, "...to a standardised table format."
2. "75 SNP IDs are not correctly formatted. These will be corrected from the reference genome."
"Correct" is a bit subjective here; it would be best to state what you're doing, such as converting all SNPs to RSIDs from version y of X resource.
3. "940147 SNPs are not on the reference genome. These will be corrected from the reference genome."
Not sure what this means.
4. "7 RS IDs are duplicated in the sumstats file. These duplicates will be removed"
Are only the first instances of these duplicates removed, or all instances? If the former, are they selected based on the summary statistics (mean or max p-value?) or arbitrarily?
5. "Succesfully finished preparing sumstats file, preview:"
Minor thing, but the column headers are not aligned with the data below in the preview, which makes it hard to read.
Also, typo: Succesfully --> Successfully
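To make the question in point 4 concrete, here is a minimal sketch of three possible behaviours in base R. The data frame is invented for illustration, and I don't know which of these (if any) the package actually uses.

```r
# Toy data, invented for this example only: rs2 appears twice.
sumstats <- data.frame(SNP = c("rs1", "rs2", "rs2", "rs3"),
                       P   = c(0.01, 0.20, 0.05, 0.50))

# Option A: keep the first occurrence of each ID (arbitrary with respect to P).
keep_first <- sumstats[!duplicated(sumstats$SNP), ]

# Option B: drop every row whose ID appears more than once.
is_dup <- duplicated(sumstats$SNP) | duplicated(sumstats$SNP, fromLast = TRUE)
drop_all <- sumstats[!is_dup, ]

# Option C: keep the row with the smallest p-value for each ID.
ordered <- sumstats[order(sumstats$SNP, sumstats$P), ]
keep_min_p <- ordered[!duplicated(ordered$SNP), ]
```

Options A and C both leave three rows but keep different rs2 entries, while option B leaves two, so the message should say which one applies.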