Closed cfbeuchel closed 9 months ago
Hey!
I like the idea and happy to add in but two changes:
I wouldn't add a stop error if it isn't a dataframe like you have here:
stopifnot(
"Mapping file must be a data.frame!" =
!data.table::is.data.table(mapping_file)
)
So we already have checks to make sure the mapping_file is a data.frame (at least super class, DT's are a subset of DFs) here and we also check there are two columns so it is fair to assume that we can just convert to a DF. This won't affect anything that is already a DF. So just change that line to:
mapping_file <- as.data.frame(mapping_file)
The second change is checking if it is a custom mapping file, you have a check for this just based on the number of rows:
if(nrow(mapping_file) != nrow(sumstatsColHeaders)) {
message(
"Non-standard mapping file detected.",
"Making sure all entries in `Uncorrected` are in upper case.")
mapping_file$Uncorrected <- toupper(mapping_file$Uncorrected)
}
I'd rather check if they are the exact same:
if(!all.equal(mapping_file,sumstatsColHeaders)) {
Can you make these changes and then just have a look at the other PR you made and if you agree with my comment, revert that change in this PR too!
Cheers for your work on this and delighted you find the package useful! Alan.
Hi! I appreciate you suggestions. They are less ambiguous and I'll gladly implement them! Instead of reverting the changes here I'll just open a new PR from a new branch!
Hi! This is my second PR for the function
standardise_sumstats_column_headers_crossplatform()
. This PR includes the bugfix from the other PR plus some QOL improvements, at least for me. Feel free to merge either one (or none of course), whatever you think best.Just to explain: I am using a custom mapping file for
format_sumstats()
and ran into continuous errors due to being a scatterbrain and maybe some ambiguities in the code that could be avoided. These are:mapping_file$Uncorrected
must be in upper casePoint 1 is simply a problem when you, as I do, prefer to work in
data.table
, but a function explicitly requires a data.frame. Passing a data.table to the function will cause it to fail weirdly downstream, so I thought it useful to introduce a check. Something else might also work (maybeall(class(mapping_file) == "data.frame")
), but at least this way, my common error of submitting a DT is caught early.Point 2 is I think a more generally useful check because it is not mentioned anywhere explicitly. This is that the formatting check turns all column names to UPPERCASE. Therefore, entering the original column names (
myBadCol
or whatever) will not be corrected, because the function will only check forMYBADCOL
in mapping_file$Uncorrected. Therefore, I thought it useful to turn everything inmapping_file
to upper case when a custom table is supplied.Let me know what you think!
Again, thanks for the work on the package, you've saved me a lot of time and headache.
Cheers, Carl