atorus-research / Tplyr

https://atorus-research.github.io/Tplyr/
Other
95 stars 16 forks source link

Build of Tplyr AE table where an AEDECOD is assigned multiple values of AEBODSYS #146

Closed TYoungTradeCraft closed 7 months ago

TYoungTradeCraft commented 10 months ago

Prerequisites

Build of Tplyr AE table where an AEDECOD is assigned multiple values of AEBODSYS

Description

Tplyr build produces an error when attempting to build an AE table where a single AEDECOD is assigned to multiple AEBODSYS values, which is permitted in MedDRA

Steps to Reproduce

test_adae <- data.frame(
  SUBJID = c("1", "2", "3"),
  AEBODSYS = c("Nervous system disorders", "Vascular system disorders", "Vascular system disorders"),
  AEDECOD = c("Dizziness", "Dizziness", "other decod"),
  TRT101A = c("TRT1", "TRT2", "TRT1")
)

test_adsl <- data.frame(
  SUBJID = c("1", "2", "3"),
  TRT101A = c("TRT1", "TRT2", "TRT1")
)

t1 <- test_adae |> 
  Tplyr::tplyr_table(TRT101A) |>  
  Tplyr::add_total_group() |>
  Tplyr::set_pop_data(test_adsl) |>
  Tplyr::set_pop_treat_var(TRT101A) |>
  Tplyr::set_distinct_by(SUBJID) |>
  Tplyr::add_layer(
    group_count("Any adverse events")
  ) |> 
  Tplyr::add_layer(
    Tplyr::group_count(vars(AEBODSYS, AEDECOD)) |>
    Tplyr::set_nest_count(TRUE) |>
    Tplyr::set_indentation("&nbsp;&nbsp;&nbsp;&nbsp;") |> 
    Tplyr::set_order_count_method("bycount", break_ties = "asc") |> 
    Tplyr::set_ordering_cols(Total) |>
    Tplyr::set_result_order_var(distinct_n)
  )

table <- Tplyr::build(t1) 

**Expected behavior:** 

AE table built without error

**Actual behavior:** 
The below error occurs
Error in `purrr::map()`:
  ℹ In index: 2.
Caused by error in `[<-`:
  ! Assigned data `get_data_order_bycount(...)` must be compatible with row subscript `-1`.
✖ 1 row must be assigned.
✖ Assigned data has 2 rows.
ℹ Row updates require a list value. Do you need `list()` or `as.list()`?
  Caused by error in `vectbl_recycle_rhs_rows()`:
  ! Can't recycle input of size 2 to size 1.

Versions

Tplyr_1.1.0

mstackhouse commented 10 months ago

Here's a reduced example of the bug down to a reprex. The issue is with set_order_count_method() specifically for bycount with inner variable values that duplicate between outer variable values within a nested count layer. This is related to #75 as well.

test_adae <- data.frame(
  SUBJID = c("1", "2", "3"),
  AEBODSYS = c("Nervous system disorders", "Vascular system disorders", "Vascular system disorders"),
  AEDECOD = c("Dizziness", "Dizziness", "other decod"),
  TRT101A = c("TRT1", "TRT2", "TRT1")
)

test_adsl <- data.frame(
  SUBJID = c("1", "2", "3"),
  TRT101A = c("TRT1", "TRT2", "TRT1")
)

t1 <-  Tplyr::tplyr_table(test_adae, TRT101A) |>  
  Tplyr::add_layer(
    Tplyr::group_count(vars(AEBODSYS, AEDECOD)) |>
      Tplyr::set_order_count_method("bycount")
  )

table <- Tplyr::build(t1)
#> Error in `purrr::map()` at Tplyr/R/build.R:77:4:
#> ℹ In index: 1.
#> Caused by error in `[<-` at Tplyr/R/sort.R:745:6:
#> ! Assigned data `get_data_order_bycount(...)` must be compatible with
#>   row subscript `-1`.
#> ✖ 1 row must be assigned.
#> ✖ Assigned data has 2 rows.
#> ℹ Row updates require a list value. Do you need `list()` or `as.list()`?
#> Caused by error in `vectbl_recycle_rhs_rows()`:
#> ! Can't recycle input of size 2 to size 1.
#> Backtrace:
#>      ▆
#>   1. ├─Tplyr::build(t1)
#>   2. ├─Tplyr:::build.tplyr_table(t1) at Tplyr/R/build.R:51:4
#>   3. │ ├─base::tryCatch(...) at Tplyr/R/build.R:61:2
#>   4. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>   5. │ └─purrr::map(x$layers, process_formatting) at Tplyr/R/build.R:77:4
#>   6. │   └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
#>   7. │     ├─purrr:::with_indexed_errors(...)
#>   8. │     │ └─base::withCallingHandlers(...)
#>   9. │     ├─purrr:::call_with_cleanup(...)
#>  10. │     ├─Tplyr (local) .f(.x[[i]], ...)
#>  11. │     └─Tplyr:::process_formatting.count_layer(.x[[i]], ...) at Tplyr/R/build.R:145:2
#>  12. │       ├─Tplyr:::add_order_columns(x) at Tplyr/R/count.R:424:2
#>  13. │       └─Tplyr:::add_order_columns.count_layer(x) at Tplyr/R/sort.R:144:2
#>  14. │         └─base::evalq(...) at Tplyr/R/sort.R:162:2
#>  15. │           └─base::evalq(...)
#>  16. │             └─... %>% ungroup() at Tplyr/R/sort.R:209:6
#>  17. ├─dplyr::ungroup(.)
#>  18. ├─dplyr::do(...)
#>  19. ├─dplyr:::do.grouped_df(...)
#>  20. │ └─rlang::eval_tidy(args[[j]], mask)
#>  21. ├─Tplyr:::add_data_order_nested(...)
#>  22. │ ├─base::`[<-`(...) at Tplyr/R/sort.R:745:6
#>  23. │ └─tibble:::`[<-.tbl_df`(...) at Tplyr/R/sort.R:745:6
#>  24. │   └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
#>  25. │     └─tibble:::tbl_subassign_row(xj, i, value, i_arg, value_arg, call)
#>  26. │       ├─base::withCallingHandlers(...)
#>  27. │       └─tibble:::vectbl_assign(x[[j]], i, recycled_value[[j]])
#>  28. │         └─vctrs::vec_assign(x, i, value)
#>  29. ├─vctrs:::stop_recycle_incompatible_size(...)
#>  30. │ └─vctrs:::stop_vctrs(...)
#>  31. │   └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
#>  32. │     └─rlang:::signal_abort(cnd, .file)
#>  33. │       └─base::signalCondition(cnd)
#>  34. ├─tibble (local) `<fn>`(`<vctrs___>`)
#>  35. │ └─tibble:::vectbl_recycle_rhs_rows(value, length(i), i_arg, value_arg, call)
#>  36. │   ├─base::withCallingHandlers(...)
#>  37. │   └─vctrs::vec_recycle(value[[j]], nrow)
#>  38. └─vctrs:::stop_recycle_incompatible_size(...)
#>  39.   └─vctrs:::stop_vctrs(...)
#>  40.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)

Created on 2023-11-03 with reprex v2.0.2

asbates commented 8 months ago

Some notes for whoever works on this next.

The problem occurs when adding ord_layer_2. The group_data data frame has 1+ rows where the first row of ord_layer_2 is Inf. This is fine. But in calculating the rest of the ord_layer_2 row, get_data_order_bycount() returns a vector that is 1 too long. So if group_data is 2 rows, ord_layer_2 = c(Inf, NA) before get_data_order_bycount() is called. The result of this call is supposed to replace the 2nd row of group_data, so ord_layer_2 would become c(Inf, 1) for example. But, get_data_order_bycount() is returning something like c(1, 0). This length 2 vector is trying to replace the length 1 vector of group_data[-1, "ord_layer_2"]. That is, everything but the first row of ord_layer_2. Hence the error message Can't recycle input of size 2 to size 1

Replacing the line referenced above by the below code seems to do the job for the reprex problem. But I don't really understand if it's correct or not.

      data_order <- get_data_order_bycount(filtered_numeric_data,
                             ordering_cols,
                             treat_var,
                             head(by, -1),
                             cols,
                             result_order_var,
                             target_var,
                             break_ties = break_ties,
                             numeric_cutoff = numeric_cutoff,
                             numeric_cutoff_stat = numeric_cutoff_stat,
                             numeric_cutoff_column = numeric_cutoff_column,
                             nested = TRUE)
      n_data_order <- length(data_order)
      n_row_group_data_no_inf <- nrow(group_data) - 1 # don't count Inf/-Inf because that's not being replaced

      if (n_row_group_data_no_inf < n_data_order) {
        group_data[-1, paste0("ord_layer_", final_col + 1)] <- data_order[1:n_row_group_data_no_inf]
      } else {
        group_data[-1, paste0("ord_layer_", final_col + 1)] <- data_order
      }
mstackhouse commented 7 months ago

@TYoungTradeCraft if you want to pull this branch, I think I pushed a valid fix for it.

mstackhouse commented 7 months ago

Moved into devel now and closed with #172

TYoungTradeCraft commented 7 months ago

Thanks Mike, this did resolve my issue, but I first had to update some of calls to Tplyr to make my app run.

I had to add Tplyer:: to the below calls to get my app to run wth the devel branch

      Tplyr::set_ordering_cols

      Tplyr::group_shift

I have a relatively large app with many other calls to Tplyr where the majority of the calls currently do not include the explicit package name. Do you recommend always using the package name when calling Tplyr?

-Tim


From: Michael Stackhouse @.> Sent: Friday, January 26, 2024 2:32 PM To: atorus-research/Tplyr @.> Cc: Timothy Young @.>; Mention @.> Subject: Re: [atorus-research/Tplyr] Build of Tplyr AE table where an AEDECOD is assigned multiple values of AEBODSYS (Issue #146)

@TYoungTradeCrafthttps://github.com/TYoungTradeCraft if you want to pull this branch, I think I pushed a valid fix for it.

— Reply to this email directly, view it on GitHubhttps://github.com/atorus-research/Tplyr/issues/146#issuecomment-1912585851, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A5HCR2KRNVUPMRA7MVENDMLYQQAGDAVCNFSM6AAAAAA64NTYHKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJSGU4DKOBVGE. You are receiving this because you were mentioned.

mstackhouse commented 7 months ago

@TYoungTradeCraft you shouldn't ever have to explicitly call the package name so that's interesting. Can you provide a reproducible example of creating this bug so I can investigate?

TYoungTradeCraft commented 7 months ago

When I run from the I get the below error when I remove “Tplyer::” from the call to set_ordering_cols in the below code

     Warning: Error in set_ordering_cols: could not find function "set_ordering_cols"

I installed the develop branch of Tplyer with the below command. Should this be done differntly?

@.***")

t1 <- adae |>

    tplyr_table(TRT01A) |>

    add_total_group() |>

    set_pop_data(adsl) |>

    set_pop_treat_var(TRT01A) |>

    set_distinct_by(SUBJID) |>

    add_layer(

      group_count("Any adverse events")

    ) |>

    add_layer(

      group_count(vars(AEBODSYS, AEDECOD)) |>

        set_nest_count(TRUE) |>

        set_order_count_method("bycount", break_ties = "asc") |>

        Tplyr::set_ordering_cols(Total) |>

        set_result_order_var(distinct_n)

    )

From: Michael Stackhouse @.> Sent: Friday, February 2, 2024 11:00 AM To: atorus-research/Tplyr @.> Cc: Timothy Young @.>; Mention @.> Subject: Re: [atorus-research/Tplyr] Build of Tplyr AE table where an AEDECOD is assigned multiple values of AEBODSYS (Issue #146)

@TYoungTradeCrafthttps://github.com/TYoungTradeCraft you shouldn't ever have to explicitly call the package name so that's interesting. Can you provide a reproducible example of creating this bug so I can investigate?

— Reply to this email directly, view it on GitHubhttps://github.com/atorus-research/Tplyr/issues/146#issuecomment-1924163410, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A5HCR2LIKZI6VBQ7MBRRBEDYRUEQ7AVCNFSM6AAAAAA64NTYHKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRUGE3DGNBRGA. You are receiving this because you were mentioned.Message ID: @.***>

mstackhouse commented 7 months ago

I'm not able to replicate this error in open code. I recommend you try to reinstall the package:

devtools::install_github("https://github.com/atorus-research/Tplyr.git", ref="devel")