CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/
Other
33 stars 8 forks source link

Xlsx export capabilities #148

Closed rsh52 closed 1 year ago

rsh52 commented 1 year ago

Description

This PR looks to introduce a new function, write_supertibble_xlsx (name suggestions welcome), which exports redcap_data elements of a supertibble output to a XLSX document where each tibble is captured on a separate sheet.

The function allows for the following adjustable parameters:

Proposed Changes

List changes below in bullet format:

Examples

Example Supertibble Default Without Labelled ```r redcap_uri <- Sys.getenv("REDCAP_URI") token <- Sys.getenv("REDCAPTIDIER_CLASSIC_API") out <- read_redcap(redcap_uri = redcap_uri, token = token) out %>% write_supertibble_xlsx(labelled = FALSE, file = "temp.xlsx") ```

image

Example Default With Labelled ```r out_labelled <- out %>% make_labelled() out_labelled %>% write_supertibble_xlsx(labelled = TRUE, labelled_sheets = TRUE, incl_supertbl = TRUE, incl_meta = TRUE, file = "temp.xlsx") ```

image

Issue Addressed

Closes #143

Additional Considerations and Notes

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

rsh52 commented 1 year ago

I think I have most of the requests implemented in some fashion. The two I wasn't able to/haven't gotten to were:

ezraporter commented 1 year ago

I personally think it's fine to say that if the user has labelled = NULL we only try to work with labels if is_installed("labelled") == TRUE.

To clarify, the point of this note was to get out of the infinite loop of using labelled::look_for() in the code to check if labels exist.

rsh52 commented 1 year ago

Ok! Everything(ish) should be implemented. Some notes below:

One thing to note, I've tested write_redcap_xlsx against various REDCapTidieR test databases, Prodigy, and BMT Outcomes. The output XLSX files open without warnings for labelled/unlabelled outputs for everything except the REDCapTidieR classic database. When reviewing the log from Excel, it's something related to the Text Input Validation Types instrument/sheet. I compared the underlying XML of the problem sheet based on the log output within the openxlsx2 workbook object and couldn't find any differences using waldo::compare or on visual inspection. From what I can tell nothing seems altered either on letting Excel do it's repair operation.

JanMarvin commented 1 year ago

Hi @rsh52 , openxlsx2 developer here (came across this PR via github search). First of all, thanks for considering openxlsx2. I've opened a pull request that allows adding custom table styles so that you can have a red one again in the future and just pushed a commit that exports wb_set_bookview(). It was previously overlooked, you could use the chained function wb$set_bookview() with our current CRAN release. With the next release the following should be possible:

remotes::install_github("JanMarvin/openxlsx2")
#> Downloading GitHub repo JanMarvin/openxlsx2@HEAD
#> 
#> ── R CMD build ─────────────────────────────────────────────────────────────────
#> * checking for file ‘/private/var/folders/p7/t9znbbgj2_ndlz6vbxmgh7zm0000gn/T/RtmpwMKC6V/remotes91b973612188/JanMarvin-openxlsx2-1fb2701/DESCRIPTION’ ... OK
#> * preparing ‘openxlsx2’:
#> * checking DESCRIPTION meta-information ... OK
#> * cleaning src
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * building ‘openxlsx2_0.6.1.9000.tar.gz’
library(openxlsx2)
wb <- wb_workbook() %>%
  wb_add_worksheet() %>% 
  wb_set_bookview(windowHeight = 130000, windowWidth = 6000)
rsh52 commented 1 year ago

Hi @rsh52 , openxlsx2 developer here (came across this PR via github search). First of all, thanks for considering openxlsx2. I've opened a pull request that allows adding custom table styles so that you can have a red one again in the future and just pushed a commit that exports wb_set_bookview(). It was previously overlooked, you could use the chained function wb$set_bookview() with our current CRAN release. With the next release the following should be possible:

Hi @JanMarvin! Thanks so much for reaching out and working on this. The current version of wb$set_bookview will actually work perfectly with what we have now, I must not have been setting the appropriate window height and widths when I was testing it before. We'll be on the lookout for whenever the newest version of openxlsx2 gets published to CRAN for the red styling.

Great work on the package all in all, it's very easy to use and will be very helpful for our users!

JanMarvin commented 1 year ago

Glad you like the package! The window screen size values are something I haven't really figured out yet. The values I've given are simply large enough to produce a full screen window on the vast majority of screens. Unfortunately, there is no general value for fullscreen or x percent of window, and yes, documentation is a curse. Always harder to write than the actual code 😄.

PS: I noticed the green flags for the percentages in the screenshot above. This is probably due to some hard coded percent signs in the data:

# perpare prop.table
df <- as.data.frame(prop.table(as.matrix(mtcars), margin = 2))
for (i in names(df)) class(df[[i]]) <- c("numeric", "percentage")

library(openxlsx2)
wb <- wb_workbook()$add_worksheet()$add_data(x = df)

PPS: You mentioned a warning due to broken XML code. If you want and it is allowed, you can send me a broken file and I can take a look at it. Unfortunately the openxml file format is not always forgiving and it is all too easy to break something.

rsh52 commented 1 year ago

Glad you like the package! The window screen size values are something I haven't really figured out yet. The values I've given are simply large enough to produce a full screen window on the vast majority of screens. Unfortunately, there is no general value for fullscreen or x percent of window, and yes, documentation is a curse. Always harder to write than the actual code 😄.

PS: I noticed the green flags for the percentages in the screenshot above. This is probably due to some hard coded percent signs in the data:

# perpare prop.table
df <- as.data.frame(prop.table(as.matrix(mtcars), margin = 2))
for (i in names(df)) class(df[[i]]) <- c("numeric", "percentage")

library(openxlsx2)
wb <- wb_workbook()$add_worksheet()$add_data(x = df)

PPS: You mentioned a warning due to broken XML code. If you want and it is allowed, you can send me a broken file and I can take a look at it. Unfortunately the openxml file format is not always forgiving and it is all too easy to break something.

Ah, great catch! I will work on implementing a solution for this, thank you.

I also sent you an e-mail to the one on your GitHub page with the broken XLSX files. Any insights appreciated. As noted there, one thing that jumped out at me was that using similar code to what you have in #594 works fine:

wb %>%
  wb_add_data_table(x = mtcars, tableStyle = "RedTableStyle") %>%
  wb_open()

But the file turns out corrupted when changed to:

wb <- wb_workbook() %>%
  wb_add_worksheet()

wb <- wb %>%
  wb_add_data_table(x = mtcars)

wb$save("temp.xlsx")
rsh52 commented 1 year ago

@skadauke @ezraporter Implemented a simple quick fix in the most recent commit for the supertbl TOC output as suggested by Jan.

If you can think of any other edge cases that might trigger this green tick related to "Number stored as text" flags let me know, but from a brief review it should be pretty rare. In our test databases it appears because we assign things like numbers to text fields, but generally this shouldn't happen in most databases.

Another thing to consider that Ezra brought up is changing the default for missing values from "#NA" to something else, perhaps user-specified. Maybe @JanMarvin has an openxlsx2 suggestion? 😄 Could openxlsx2::na_strings be of any use?

JanMarvin commented 1 year ago

Another thing to consider that Ezra brought up is changing the default for missing values from "#NA" to something else, perhaps user-specified. Maybe @JanMarvin has an openxlsx2 suggestion? 😄 Could openxlsx2::na_strings be of any use?

Sure, you can use wb_add_data(na.strings = NULL) to disable NAs or use any string.

skadauke commented 1 year ago

Should the "N/A" be specially formatted, so people know that it's not the string "N/A"? Like italic? Not sure how hard that is to do.


From: Ezra Porter @.> Sent: Wednesday, May 3, 2023 5:05 PM To: CHOP-CGTInformatics/REDCapTidieR @.> Cc: Stephan Kadauke @.>; Mention @.> Subject: [External]Re: [CHOP-CGTInformatics/REDCapTidieR] Xlsx export capabilities (PR #148)

@ezraporter commented on this pull request.


In R/write.Rhttps://github.com/CHOP-CGTInformatics/REDCapTidieR/pull/148#discussion_r1184281720:

' supertibble redcap_metadata. Default TRUE.

-#' @param tableStyle Any excel table style name or "none" (see "formatting" -#' vignette in \link[openxlsx]{writeDataTable}). Default "TableStyleLight10". +#' @param table_style Any excel table style name or "none" (see "formatting" +#' vignette in \link[openxlsx2]{wb_add_data_table}). Default "tableStyleLight8". +#' @param set_col_widths Width to set columns across the workbook. Default +#' "auto", otherwise a numeric value. Standard Excel is 8.43. +#' @param recode_yn Convert REDCap yesno questions from TRUE/FALSE to "yes"/"no" +#' for readability. Default TRUE. +#' @param na_strings Value used for replacing NA values from x. Default +#' na_strings() uses the special ⁠#N/A⁠ value within the workbook.

Changing this to "N/A"

— Reply to this email directly, view it on GitHubhttps://github.com/CHOP-CGTInformatics/REDCapTidieR/pull/148#discussion_r1184281720, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACTGHWUA3BC22VWOLXT7LCTXELCAJANCNFSM6AAAAAAXHG3LII. You are receiving this because you were mentioned.Message ID: @.***>

This email originated from an EXTERNAL sender to CHOP. Proceed with caution when replying, opening attachments, or clicking links. Do not disclose your CHOP credentials, employee information, or protected health information to a potential hacker.

ezraporter commented 1 year ago

Should the "N/A" be specially formatted, so people know that it's not the string "N/A"? Like italic? Not sure how hard that is to do.

After playing with this a bit I think we're best served setting the NA values to "" (blank) and not worrying about custom formatting. Here's why:

  1. The blanks look really good! Clean, uncluttered, as a user I wouldn't want anything else.
  2. Giving an option for custom formatting (in a smart way) would require us to complicate the API by allowing users to pass an object to na_string that specifies the formatting. That complication for this feature doesn't seem worth it to me.
rsh52 commented 1 year ago

Merging this since it's getting a bit large and so we can begin working on vignette documentation in preparation for v0.4's release.