Merck / r2rtf

Easily Create Production-Ready Rich Text Format (RTF) Table and Figure
https://merck.github.io/r2rtf
GNU General Public License v3.0

Adding additional sorting variables for the generation of listings #233

Closed: Rednose22 closed this issue 1 month ago

Rednose22 commented 1 month ago
  1. Currently, `r2rtf::rtf_body()` checks whether the data are sorted by the combination of `subline_by`, `page_by`, and `group_by`; if they are not, it stops with an error. In listings, however, the required sort order often cannot be consistent with the combination of these three variables. Would it be possible to add a `sort_by` argument to `rtf_body()` so that, when `sort_by` is defined, this check is suppressed, or is there another approach?

```r
# Load necessary packages
library(r2rtf)

# Create the dummy data frame
dummy_data <- data.frame(
  STUDYID = c(101, 101, 101, 101, 102, 102, 102, 103, 103, 104, 104),
  COUNTRY = c("USA", "USA", "CAN", "CAN", "GER", "GER", "GER", "FRA", "FRA", "JPN", "JPN"),
  SITENUM = c(1001, 1001, 1002, 1002, 1003, 1003, 1003, 1004, 1004, 1005, 1005),
  SUBJID = c("0001", "0002", "0001", "0002", "0001", "0002", "0003", "0001", "0002", "0001", "0002"),
  USUBJID = c("101-1001-0001", "101-1001-0002", "101-1002-0001", "101-1002-0002", "102-1003-0001", "102-1003-0002", "102-1003-0003", "103-1004-0001", "103-1004-0002", "104-1005-0001", "104-1005-0002"),
  DVCAT = c("Safety", "Efficacy", "Safety", "Efficacy", "Safety", "Efficacy", "Safety", "Efficacy", "Safety", "Efficacy", "Safety"),
  DVTERM = c("Headache", "Nausea", "Dizziness", "Vomiting", "Fatigue", "Rash", "Insomnia", "Anxiety", "Headache", "Vomiting", "Nausea"),
  DVSPID = c("001", "002", "003", "004", "005", "006", "007", "008", "009", "010", "011"),
  IMPORTANT = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Define metadata for r2rtf
sort_by <- c("IMPORTANT", "STUDYID", "SITENUM", "COUNTRY", "SUBJID")
group_by <- c("STUDYID", "COUNTRY", "SITENUM", "USUBJID", "DVCAT", "DVTERM")
orientation <- 'landscape'
page_size <- 16
title <- "Example of r2rtf"
colheader <- "Trial Number | Country | Site Number | Subject ID | Unique Subject Identifier | Deviation Category | Protocol Deviation Description | Protocol Deviation ID | Clinically Important"
rel_width <- c(14, 14, 12, 15, 20, 25, 62, 20, 15)

# Order data by sort_by
dummy_data <- dummy_data[do.call(order, dummy_data[sort_by]), ]

# Create RTF document with r2rtf
rtf <- dummy_data |>
  r2rtf::rtf_page(orientation = orientation, nrow = page_size) |>
  r2rtf::rtf_title(title) |>
  r2rtf::rtf_colheader(colheader, col_rel_width = rel_width, cell_vertical_justification = "top") |>
  r2rtf::rtf_body(col_rel_width = rel_width, text_convert = FALSE, group_by = group_by)
#> Error in r2rtf::rtf_body(r2rtf::rtf_colheader(r2rtf::rtf_title(r2rtf::rtf_page(dummy_data, : Data is not sorted by STUDYID, COUNTRY, SITENUM, USUBJID, DVCAT, DVTERM
```


<sup>Created on 2024-10-02 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>

2. A second sorting issue in listings: in SAS, the `noprint` option lets a variable be used for ordering without being displayed, like `sitenum` in the example below:
```sas
proc report;
column sitenum trta usubjid;
define sitenum/order noprint;
define trta/order;
define usubjid/display;
run;
```

There is no equivalent option in r2rtf.

elong0527 commented 1 month ago

r2rtf is designed only to render a data frame as a table, as is.

The requested features should be handled at the data manipulation stage, so a pipe can be created that first manipulates the data using the tidyverse or another approach.

For item 1, please use the `arrange()` function in `dplyr` to sort by other variables as needed.

For item 2, you can sort by the extra variables and then remove them using the `select()` function in `dplyr` or another approach.
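A minimal base-R sketch of the item 2 suggestion, using a small hypothetical data frame: sort by the sort-only variable (`SITENUM`, mirroring `define sitenum/order noprint` in the SAS example), then drop it before the data reach r2rtf. The `dplyr` equivalents would be `arrange()` followed by `select(-SITENUM)`; the values are illustrative only.

```r
# Hypothetical listing data, deliberately out of order
listing <- data.frame(
  SITENUM = c(1002, 1001, 1002, 1001),
  TRTA    = c("B", "A", "A", "B"),
  USUBJID = c("101-1002-0002", "101-1001-0001", "101-1002-0001", "101-1001-0002"),
  stringsAsFactors = FALSE
)

# Sort by the "noprint" variable first, then by the displayed variables
listing <- listing[order(listing$SITENUM, listing$TRTA, listing$USUBJID), ]

# Drop the sort-only column so it is not printed
to_print <- listing[, setdiff(names(listing), "SITENUM"), drop = FALSE]
rownames(to_print) <- NULL
print(to_print)
```

The printed `TRTA` column reads A, B, A, B: once `SITENUM` is dropped, the remaining columns are no longer sorted on their own, so a `group_by = "TRTA"` sortedness check on this data frame would likely not pass.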

Rednose22 commented 1 month ago

Hi @elong0527, thanks for your quick reply. I understand that r2rtf focuses only on converting a data frame to RTF as is. The issue, however, is that lines 189-191 of `rtf_body()` check whether the row order of the data is consistent with sorting by the combination of `subline_by`, `page_by`, and `group_by`, and stop the code if it is not. In listings, there are often cases where the data order is not consistent with `group_by` (in the example above, `subline_by` and `page_by` are `NULL`).

The same applies to your suggestion for item 2: if I sort the data frame by extra variables and then remove them, `rtf_body()` still checks the row order against sorting by `subline_by`, `page_by`, and `group_by`, so the code can still stop with an error.
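To make the concern concrete, here is a hypothetical helper, `is_sorted_by()`, that performs the kind of check described above. It is a sketch, not the actual code at lines 189-191 of `rtf_body()`. It shows that rows ordered with `IMPORTANT` first are not sorted by `STUDYID` and `USUBJID`, and that dropping `IMPORTANT` afterwards does not change the row order.

```r
# Hypothetical helper: TRUE when the current row order already equals the
# order obtained by sorting on the columns in `by`. A sketch of the kind of
# check rtf_body() performs, not r2rtf's own code.
is_sorted_by <- function(df, by) {
  idx <- do.call(order, unname(as.list(df[by])))
  identical(idx, seq_len(nrow(df)))
}

# Rows arranged as the listing needs them: IMPORTANT = "Yes" first
rows <- data.frame(
  IMPORTANT = c("Yes", "Yes", "No", "No"),
  STUDYID   = c(101, 102, 101, 102),
  USUBJID   = c("101-1001-0001", "102-1003-0001", "101-1001-0002", "102-1003-0002"),
  stringsAsFactors = FALSE
)

is_sorted_by(rows, c("STUDYID", "USUBJID"))
#> [1] FALSE

# Dropping the sort-only column leaves the row order unchanged
printed <- rows[, c("STUDYID", "USUBJID")]
is_sorted_by(printed, c("STUDYID", "USUBJID"))
#> [1] FALSE
```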

elong0527 commented 1 month ago

In the example, it seems you want the listing to separate "clinically important" from "not clinically important" deviations. You may want to split the results into two listings with appropriate titles.
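A minimal sketch of the two-listing approach: `split()` the data on `IMPORTANT` and keep one data frame per value. Each element could then go through the same `rtf_page()` / `rtf_title()` / `rtf_colheader()` / `rtf_body()` chain used in this thread, each with its own title. The trimmed data and the titles below are hypothetical.

```r
# Trimmed, hypothetical version of the deviation data
deviations <- data.frame(
  USUBJID   = c("101-1001-0001", "101-1001-0002", "102-1003-0001", "102-1003-0002"),
  DVTERM    = c("Headache", "Nausea", "Fatigue", "Rash"),
  IMPORTANT = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# One data frame per IMPORTANT value; IMPORTANT itself is dropped because
# each listing's title already states which group it contains
listings <- split(deviations[, c("USUBJID", "DVTERM")], deviations$IMPORTANT)

# Hypothetical titles, keyed by the IMPORTANT value
titles <- c(
  Yes = "Clinically Important Protocol Deviations",
  No  = "Protocol Deviations Not Classified as Clinically Important"
)

# Show the clinically important listing first
for (grp in c("Yes", "No")) {
  cat(titles[[grp]], "\n")
  print(listings[[grp]], row.names = FALSE)
}
```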

```r
sort_by <- c("IMPORTANT", "STUDYID", "SITENUM", "COUNTRY", "SUBJID")
group_by <- c("STUDYID", "COUNTRY", "SITENUM", "USUBJID", "DVCAT", "DVTERM")
```

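If you need to create a listing exactly as you describe, `group_by` does not fit your purpose, because the `sort_by` order (which starts with `IMPORTANT`, a column not in `group_by`) is not consistent with the `group_by` order. One way to do it is to manipulate the data frame by replacing repeated values with NA.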
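A minimal base-R sketch of the "replace repeated values with NA" idea, using a hypothetical helper `blank_repeats()`: a value is kept only where it differs from the row directly above, so it prints once per run. It assumes the data are already sorted in the final display order.

```r
# Hypothetical helper: set x[i] to NA when it equals x[i - 1]
blank_repeats <- function(x) {
  if (length(x) < 2) return(x)
  repeated <- c(FALSE, x[-1] == x[-length(x)])
  repeated[is.na(repeated)] <- FALSE  # comparisons involving NA count as "not repeated"
  x[repeated] <- NA
  x
}

blank_repeats(c(101, 101, 102, 102, 102))
#> [1] 101  NA 102  NA  NA

# Applied column by column to an already-sorted data frame, e.g.:
# dummy_data[c("STUDYID", "COUNTRY")] <- lapply(dummy_data[c("STUDYID", "COUNTRY")], blank_repeats)
```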

Rednose22 commented 1 month ago

Yes, the clinically important records need to be shown first, followed by the non-clinically-important records, all in one listing, and there are more mockups with the same issue in the real project. It would be a great new feature if r2rtf supported this, since replacing repeated values with NA in the data frame is less straightforward than using the `group_by` argument.

elong0527 commented 1 month ago

Could you provide a screenshot of the RTF output based on the `dummy_data` defined above?

Rednose22 commented 1 month ago

Thanks a lot @elong0527

[Screenshot of the expected RTF listing]

elong0527 commented 1 month ago

Here is a code example that manipulates the data and creates the exact table in the screenshot.

```r
library(r2rtf)
library(dplyr)  # for %>%, arrange(), mutate(), across(), if_else(), n()

# Create the dummy data frame
dummy_data <- data.frame(
  STUDYID = c(101, 101, 101, 101, 102, 102, 102, 103, 103, 104, 104),
  COUNTRY = c("USA", "USA", "CAN", "CAN", "GER", "GER", "GER", "FRA", "FRA", "JPN", "JPN"),
  SITENUM = c(1001, 1001, 1002, 1002, 1003, 1003, 1003, 1004, 1004, 1005, 1005),
  SUBJID = c("0001", "0002", "0001", "0002", "0001", "0002", "0003", "0001", "0002", "0001", "0002"),
  USUBJID = c("101-1001-0001", "101-1001-0002", "101-1002-0001", "101-1002-0002", "102-1003-0001", 
              "102-1003-0002", "102-1003-0003", "103-1004-0001", "103-1004-0002", "104-1005-0001", 
              "104-1005-0002"),
  DVCAT = c("Safety", "Efficacy", "Safety", "Efficacy", "Safety", "Efficacy", "Safety", "Efficacy", 
            "Safety", "Efficacy", "Safety"),
  DVTERM = c("Headache", "Nausea", "Dizziness", "Vomiting", "Fatigue", "Rash", "Insomnia", "Anxiety", 
             "Headache", "Vomiting", "Nausea"),
  DVSPID = c("001", "002", "003", "004", "005", "006", "007", "008", "009", "010", "011"),
  IMPORTANT = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Sort "Yes" before "No", then blank each key value that repeats the row above
df <- dummy_data %>% 
  arrange(desc(IMPORTANT), STUDYID, SITENUM, COUNTRY, SUBJID) %>%
  mutate(
    across(c("STUDYID", "COUNTRY", "SITENUM", "USUBJID"), function(x){
      # TRUE where x[i] equals x[i - 1]; the first row is never blanked
      if_else(c(FALSE, x[-n()] == x[-1]), NA, x)
    })
  )

orientation <- 'landscape'
page_size <- 16
title <- "Example of r2rtf"
colheader <- "Trial Number | Country | Site Number | Subject ID | Unique Subject Identifier | Deviation Category | Protocol Deviation Description | Protocol Deviation ID | Clinically Important"
rel_width <- c(14, 14, 12, 15, 20, 25, 62, 20, 15)

# Create RTF document with r2rtf (no group_by needed: repeats are already blanked)
rtf <- df |>
  r2rtf::rtf_page(orientation = orientation,
                  nrow = page_size) |>
  r2rtf::rtf_title(title) |>
  r2rtf::rtf_colheader(
    colheader,
    col_rel_width = rel_width,
    cell_vertical_justification = "top"
  ) |>
  r2rtf::rtf_body(
    col_rel_width = rel_width,
    text_convert = FALSE
  )

rtf |>
  rtf_encode() |>
  write_rtf("tmp.rtf")
```
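One design note on the `across()` step in this code: it blanks each column independently, so a value is hidden whenever it matches the row above, even when a higher-level key such as `STUDYID` has just changed. If that matters for a real listing, a hierarchical variant can blank a key only when it and every key to its left repeat. The helper `blank_nested_repeats()` below is a hypothetical base-R sketch of that idea, with illustrative data.

```r
# Hypothetical helper: blank a key column only when it AND every key column to
# its left repeat the previous row, so a new study always reprints its country.
blank_nested_repeats <- function(df, keys) {
  n <- nrow(df)
  if (n < 2) return(df)
  same_so_far <- rep(TRUE, n - 1)  # row i + 1 matches row i on all keys so far
  blank <- list()
  # Compute every mask first, so blanking one column cannot affect the next
  for (k in keys) {
    x <- df[[k]]
    same <- x[-1] == x[-n]
    same[is.na(same)] <- FALSE
    same_so_far <- same_so_far & same
    blank[[k]] <- c(FALSE, same_so_far)
  }
  for (k in keys) df[[k]][blank[[k]]] <- NA
  df
}

rows <- data.frame(
  STUDYID = c(101, 101, 102, 102),
  COUNTRY = c("USA", "USA", "USA", "GER"),
  USUBJID = c("101-1001-0001", "101-1001-0002", "102-1003-0001", "102-1004-0001"),
  stringsAsFactors = FALSE
)
out <- blank_nested_repeats(rows, c("STUDYID", "COUNTRY"))
print(out)
```

In this example the third row keeps `COUNTRY = "USA"` because `STUDYID` changed, whereas blanking each column independently would hide it.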