IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Scripts that give an error in R-Instat #8942

Open rdstern opened 6 months ago

rdstern commented 6 months ago

@lloyddewit the first one seems logical as perhaps something that hasn't yet been implemented. This is an introduction to the impressive gt package. The script uses the md() function, which handles Markdown. That gives an error.

I have given the whole script below. It is from here, where you go to the getting started tab. It also shows the results. The one remaining use of md is at about line 50. There were more instances (at about line 100), but they have been commented out.

library(gt)
# Take the `islands` dataset and use some
# dplyr functionality to obtain the ten
# biggest islands in the world
islands_tbl <- 
  tibble(
    name = names(islands),
    size = islands
  ) |>
  arrange(desc(size)) |>
  slice(1:10)
# Create a display table showing ten of
# the largest islands in the world
gt_tbl <- gt(islands_tbl)

# Show the gt Table
gt_tbl
# Make a display table with the `islands_tbl`
# table; put a heading just above the column labels
gt_tbl <- 
  gt_tbl |>
  tab_header(
    title = "Large Landmasses of the World",
    subtitle = "The top ten largest are presented"
  )

# Show the gt Table
gt_tbl
# Add footnotes (the same text) to two different
# cells; data cells are targeted with `cells_body()`
gt_tbl <- 
  gt_tbl |>
  tab_footnote(
    footnote = "The Americas.",
    locations = cells_body(columns = name, rows = 3:4)
  )

# Show the gt table
gt_tbl

# Determine the row that contains the
# largest landmass ('Asia')
largest <- 
  islands_tbl |> 
  arrange(desc(size)) |>
  slice(1) |>
  pull(name)

# Create two additional footnotes, using the
# `columns` and `rows` arguments of `cells_body()`
gt_tbl <- 
  gt_tbl |>
  tab_footnote(
    footnote = md("The **largest** by area."),
    locations = cells_body(
      columns = size,
      rows = name == largest
    )
  ) |>
  tab_footnote(
    footnote = "The lowest by population.",
    locations = cells_body(
      columns = size,
      rows = size == min(size)
    )
  )

# Show the gt table
gt_tbl

# Create a gt table showing ten of the
# largest islands in the world; this
# time with a stub
gt_tbl <- 
  islands_tbl |>
  gt(rowname_col = "name")

# Show the gt table
gt_tbl

# Generate a simple table with a stub
# and add a stubhead label
gt_tbl <- 
  gt_tbl |>
  tab_stubhead(label = "landmass")

# Show the gt table
gt_tbl

# Display the `islands_tbl` data with a stub,
# a heading, source notes, and footnotes
gt_tbl <- 
  gt_tbl |>
  tab_header(
    title = "Large Landmasses of the World",
    subtitle = "The top ten largest are presented"
  ) |>
  tab_source_note(
    source_note = "Source: The World Almanac and Book of Facts, 1975, page 406."
  ) |>
  tab_source_note(
    #source_note = md("Reference: McNeil, D. R. (1977) *Interactive Data Analysis*. Wiley.")
    source_note = "Reference: McNeil, D. R. (1977) *Interactive Data Analysis*. Wiley."
  ) |>
  tab_footnote(
    #footnote = md("The **largest** by area."),
    footnote = "The **largest** by area.",
    locations = cells_body(
      columns = size, rows = largest
    )
  ) |>
  tab_footnote(
    footnote = "The lowest by population.",
    locations = cells_body(
      columns = size, rows = contains("arc")
    )
  )

# Show the gt table
gt_tbl

# Create three row groups with the
# `tab_row_group()` function
gt_tbl <- 
  gt_tbl |> 
  tab_row_group(
    label = "continent",
    rows = 1:6
  ) |>
  tab_row_group(
    label = "country",
    rows = c("Australia", "Greenland")
  ) |>
  tab_row_group(
    label = "subregion",
    rows = c("New Guinea", "Borneo")
  )

# Show the gt table
gt_tbl

# Modify the `airquality` dataset by adding the year
# of the measurements (1973) and limiting to 10 rows
airquality_m <- 
  airquality |>
  mutate(Year = 1973L) |>
  slice(1:10)

# Create a display table using the `airquality`
# dataset; arrange columns into groups
gt_tbl <- 
  gt(airquality_m) |>
  tab_header(
    title = "New York Air Quality Measurements",
    subtitle = "Daily measurements in New York City (May 1-10, 1973)"
  ) |>
  tab_spanner(
    label = "Time",
    columns = c(Year, Month, Day)
  ) |>
  tab_spanner(
    label = "Measurement",
    columns = c(Ozone, Solar.R, Wind, Temp)
  )

# Show the gt table
gt_tbl

# Move the time-based columns to the start of
# the column series; modify the column labels of
# the measurement-based columns
gt_tbl <- 
  gt_tbl |>
  cols_move_to_start(
    columns = c(Year, Month, Day)
  ) |>
  cols_label(
    Ozone = html("Ozone,<br>ppbV"),
    Solar.R = html("Solar R.,<br>cal/m<sup>2</sup>"),
    Wind = html("Wind,<br>mph"),
    Temp = html("Temp,<br>&deg;F")
  )

# Show the gt table
gt_tbl

The second is also from the gt package and is the one on clinical trials. It defines a function and then tries to use it. The definition seems to run, but using the function gives an error.

Here is the script:

custom_summary <- function(df, group_var, sum_var) {

  group_var <- rlang::ensym(group_var)
  sum_var <- rlang::ensym(sum_var)

  is_categorical <- 
    is.character(eval(expr(`$`(df, !!sum_var)))) |
    is.factor(eval(expr(`$`(df, !!sum_var)))) 

  if (is_categorical) {

    category_lbl <- 
      sprintf("%s, n (%%)", attr(eval(expr(`$`(df, !!sum_var))), "label"))

    df_out <-
      df |>
      dplyr::group_by(!!group_var)  |> 
      dplyr::mutate(N = dplyr::n()) |> 
      dplyr::ungroup() |> 
      dplyr::group_by(!!group_var, !!sum_var) |> 
      dplyr::summarize(
        val = dplyr::n(),
        pct = dplyr::n()/mean(N),
        .groups = "drop"
      ) |> 
      tidyr::pivot_wider(
        id_cols = !!sum_var, names_from = !!group_var,
        values_from = c(val, pct)
      ) |> 
      dplyr::rename(label = !!sum_var) |> 
      dplyr::mutate(
        across(where(is.numeric), ~ifelse(is.na(.), 0, .)),
        category = category_lbl
      )

  } else {

    category_lbl <-
      sprintf(
        "%s (%s)",
        attr(eval(expr(`$`(df, !!sum_var))), "label"),
        attr(eval(expr(`$`(df, !!sum_var))), "units")
      )

    df_out <- 
      df |> 
      dplyr::group_by(!!group_var) |> 
      dplyr::summarize(
        n = sum(!is.na(!!sum_var)),
        mean = mean(!!sum_var, na.rm = TRUE),
        sd = sd(!!sum_var, na.rm = TRUE),
        median = median(!!sum_var, na.rm = TRUE),
        min = min(!!sum_var, na.rm = TRUE),
        max = max(!!sum_var, na.rm = TRUE),
        min_max = NA,
        .groups = "drop"
      ) |> 
      tidyr::pivot_longer(
        cols = c(n, mean, median, min_max),
        names_to = "label",
        values_to = "val"
      ) |> 
      dplyr::mutate(
        sd = ifelse(label == "mean", sd, NA),
        max = ifelse(label == "min_max", max, NA),
        min = ifelse(label == "min_max", min, NA),
        label = dplyr::recode(
          label,
          "mean" = "Mean (SD)",
          "min_max" = "Min - Max",
          "median" = "Median"
        )
      ) |> 
      tidyr::pivot_wider(
        id_cols = label,
        names_from = !!group_var,
        values_from = c(val, sd, min, max)
      ) |> 
      dplyr::mutate(category = category_lbl)
  }

  return(df_out)
}

adsl_summary <- 
  dplyr::filter(rx_adsl, ITTFL == "Y") |> 
  (\(data) purrr::map_df(
    .x = dplyr::vars(AGE, AAGEGR1, SEX, ETHNIC, BLBMI),
    .f = \(x) custom_summary(df = data, group_var = TRTA, sum_var = !!x)
  ))()

The first part, which presumably produces the custom_summary function, seems to run OK. The second part uses the custom_summary function and gives an error. I get the same strange error if I just type custom_summary.

I am not quite sure what it is doing here. It is @lilyclements's sort of R, rather than mine!

lloyddewit commented 6 months ago

@rdstern Thank you for the scripts. If anyone finds a script that fails in R-Instat, then it's usually best to first test the script in RStudio. If it fails in RStudio then it's likely a script issue rather than an R-Instat issue. If it runs in RStudio but not in R-Instat, then it would be helpful to provide a screenshot of the R-Instat error(s). I tried to run the scripts in RStudio and they both failed. Normally, I would ask if you or @lilyclements could get these scripts to work in RStudio, and if they still failed in R-Instat, then I would investigate further.

However, I was curious and tried the scripts in R-Instat. Surprisingly, they ran better in R-Instat than RStudio. I assume that my R-Instat R environment must be slightly different to my RStudio R environment.
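
One way to check that assumption is to run the same few lines in both R-Instat and RStudio and compare the output. The following is only a minimal sketch using base R and utils; `env_report` is a name chosen here for illustration, and the gt lookup is guarded in case the package is not installed.

```r
# Collect the details that most often differ between two R setups.
env_report <- list(
  r_version  = R.version.string,  # which R build is running
  lib_paths  = .libPaths(),       # where packages are loaded from
  attached   = search(),          # attached packages, in lookup order
  gt_version = if (requireNamespace("gt", quietly = TRUE)) {
    as.character(utils::packageVersion("gt"))
  } else {
    NA_character_                 # gt is not installed in this library
  }
)

# Print a compact summary that can be pasted into an issue.
str(env_report)
```

Differences in `attached` are the most relevant here, because the order of the search path decides which function a bare name resolves to.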

Script 1

I got the following error:

image

R-Instat is correct to report this error because the md function currently in scope requires a y parameter:

image

If we remove the md function then the whole of script 1 runs correctly.

image

So in summary, I don't think there's anything I need to fix in R-Instat for script 1.

Script 2

As you said, R-Instat defines the function correctly. However, it cannot parse the function call. Specifically it cannot parse the highlighted text below. @lilyclements Do you know what the highlighted parts do and specifically what \ does? Is this valid R that we need to be able to parse in R-Instat?

image

Thanks for your help
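
For reference, `\(x)` is the anonymous-function shorthand added in R 4.1.0, the same release that introduced the native pipe `|>`, so the `\(data)` and `\(x)` parts of the script should be valid R from that version onwards. A minimal sketch of both features using base R only; the function and variable names are illustrative and do not come from the script:

```r
# `\(x)` is shorthand for `function(x)` (R >= 4.1.0).
add_one_long  <- function(x) x + 1
add_one_short <- \(x) x + 1

# Both definitions behave the same way.
long_result  <- add_one_long(2)
short_result <- add_one_short(2)

# The native pipe `|>` passes its left-hand side as the first argument
# of the call on its right. Wrapping an anonymous function in parentheses
# and calling it with `()` lets the piped value be used under any name,
# which is the pattern used at the end of script 2.
piped_result <- c(1, 2, 3) |>
  (\(data) sum(data) / length(data))()

# The same computation without the pipe or the shorthand.
plain_result <- (function(data) sum(data) / length(data))(c(1, 2, 3))

print(c(long_result, short_result, piped_result, plain_result))
```

A parser that already handles `function(x)` would need to treat `\(x)` as an exact synonym, and to accept a parenthesised function as the thing being called.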

lloyddewit commented 6 months ago

@rdstern For the second script, was there also a potential issue copying/pasting from the website? For example ! is changed to !! and the pipe symbol looks strange:

image
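
The doubled `!!` is probably not a copy/paste artefact. In base R, `!!x` is simply two negations, but rlang (which custom_summary already calls through rlang::ensym) treats `!!` inside its quoting functions as an unquote operator. A small sketch, assuming rlang is installed; `col` and `unquoted` are illustrative names:

```r
# In base R, `!!` is two logical negations applied in sequence.
double_negation <- !!TRUE

# Inside rlang's quoting functions, `!!` instead means "insert the value
# of this variable into the expression being built".
unquoted <- NULL
if (requireNamespace("rlang", quietly = TRUE)) {
  col <- rlang::sym("size")             # the symbol `size`
  unquoted <- rlang::expr(mean(!!col))  # builds the call mean(size)
  print(unquoted)
}
```

Both readings parse to the same syntax tree, so this should not need special handling in a parser; the difference only appears when rlang evaluates the expression.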

rdstern commented 6 months ago

@lloyddewit thank you very much for the detective work. The md was the big one, because it happened regularly. This is now solved. And I should have guessed! a) Another clue that there was going to be an obvious solution was that I couldn't find evidence on the web that others had trouble with this function. b) The answer is that R-Instat (and our version of RStudio) must have another package that also has an md command and that we load before the gt package. I can't find which package it is. But - in hindsight of course - that is an obvious possibility.

One solution is to write gt::md() each time. But running md <- gt::md once is also sufficient, and then the rest of the code can be left as it is. That's OK (once you know) and is like having to include the library(gt) command in a script.
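
The masking described here can be reproduced and diagnosed without knowing which package is responsible. A minimal sketch: the two-argument `md` defined below is a hypothetical stand-in for the unidentified function, not the real one, and the final step is guarded in case gt is not installed.

```r
# Hypothetical stand-in for the other package's md(): same name, but it
# requires a second argument `y`, as in the error reported in this thread.
md <- function(x, y) paste(x, y)

# Calling it the way the gt script does fails; capture the error message.
masked_call <- tryCatch(
  md("The **largest** by area."),
  error = function(e) conditionMessage(e)
)
print(masked_call)

# utils::find() lists every place on the search path that defines `md`;
# a bare md() call normally resolves to the first entry.
print(utils::find("md"))

# environmentName() reports where the function currently in scope lives.
print(environmentName(environment(md)))

# The fix from this thread: point `md` at the gt version once, after
# which unqualified md() calls in the script use gt's function.
if (requireNamespace("gt", quietly = TRUE)) {
  md <- gt::md
  print(environmentName(environment(md)))  # expected to name the gt namespace
}
```

Run in the session where the error occurs (without the stand-in definition), `environmentName(environment(md))` should name the package that supplies the conflicting md, and `conflicts(detail = TRUE)` lists all masked names on the search path.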