atorus-research / Tplyr

https://atorus-research.github.io/Tplyr/

Nested count layers have unexpected sort variable behavior when using by variables #173

Closed: mstackhouse closed this issue 6 months ago

mstackhouse commented 7 months ago

Prerequisites

The bug exists in CRAN version 1.1.0 and was not resolved by #172; this is a new issue.

Description

Nested count layers treat the inner and outer layers differently while processing data. When by variables are applied, the assignment of the ordlayer value for the inner layer appears to ignore them. The outer layer row is supposed to receive a value of Inf or -Inf, but the by variables themselves are not taken into account.

When sorting alphabetically or by factor levels, the build completes. When the bycount sort method is used, however, it fails with a recycling error.
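To make the expected behavior concrete, here is a minimal base-R sketch. It is not Tplyr's internal code: the `counts` data frame, `row_key()` and `bycount_order()` are hypothetical, and rows where AEDECOD is NA stand in for the outer AEBODSYS rows. It takes distinct_n from the ordering column, attaches it to every row by matching on the by variable as well as the target variables, and gives outer-layer rows a fixed value of Inf.

```r
# Hypothetical long-format counts for a nested AEBODSYS/AEDECOD layer split by
# AESEV. Rows with AEDECOD = NA stand in for the outer (AEBODSYS) rows.
counts <- data.frame(
  TRTA       = rep(c("Placebo", "Xanomeline High Dose"), each = 5),
  AESEV      = rep(c("MILD", "MILD", "MILD", "MODERATE", "MODERATE"), times = 2),
  AEBODSYS   = "CARDIAC DISORDERS",
  AEDECOD    = rep(c(NA, "BRADYCARDIA", "PALPITATIONS", NA, "PALPITATIONS"), times = 2),
  distinct_n = c(3, 1, 2, 2, 2,  6, 4, 3, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one string key per row from the given columns.
row_key <- function(df, cols) {
  do.call(paste, c(unname(as.list(df[cols])), sep = "|"))
}

# Hypothetical helper: attach the ordering column's distinct_n to every row,
# matching on the by variables AND the target variables.
bycount_order <- function(counts, by, ordering_col = "Xanomeline High Dose",
                          outer_value = Inf) {
  keys <- c(by, "AEBODSYS", "AEDECOD")
  ref  <- counts[counts$TRTA == ordering_col, ]
  # match() returns one index per row of `counts`, so the result always has
  # nrow(counts) elements and never needs recycling
  ord <- ref$distinct_n[match(row_key(counts, keys), row_key(ref, keys))]
  # Outer-layer (AEBODSYS) rows all receive the same fixed value
  ord[is.na(counts$AEDECOD)] <- outer_value
  ord
}

with_by    <- bycount_order(counts, by = "AESEV")
without_by <- bycount_order(counts, by = character(0))
```

With the by variable in the key, the MODERATE / PALPITATIONS rows get their own count (1). Dropping AESEV from the key makes them pick up the MILD count (3) instead, which is the kind of by-variable mix-up described above.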

Steps to Reproduce

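library(Tplyr)
library(dplyr)

# Use the first 300 rows of the tplyr_adae example dataset
adae <- tplyr_adae %>%
  head(300)

# Nested AEBODSYS/AEDECOD count layer split by AESEV, sorted by the
# distinct_n counts in the "Xanomeline High Dose" column
t_ae1 <- tplyr_table(adae, TRTA) %>%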
  add_layer(
    group_count(vars(AEBODSYS, AEDECOD), by = vars(AESEV)) %>% 
      set_order_count_method("bycount") %>% 
      set_ordering_cols("Xanomeline High Dose") %>% 
      set_result_order_var(distinct_n)
  )

t_ae_df1 <- t_ae1 %>%
  build()

Running build() fails with the following error:

Error in `purrr::map()`:
ℹ In index: 1.
Caused by error in `[<-`:
! Assigned data `get_data_order_bycount(...)` must be compatible with row subscript `-1`.
✖ 117 rows must be assigned.
✖ Assigned data has 303 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 303 to size 117.
Run `rlang::last_trace()` to see where the error occurred.

Expected behavior

The bycount values should be extracted from the distinct_n counts and applied to the matching inner layer rows within each level of the by variables. The AEBODSYS outer layer rows, across all by variable levels, should each receive a value of Inf.

Actual behavior

The build fails with the error shown above.

mstackhouse commented 6 months ago

Closed via #174