CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/
Other
32 stars 8 forks source link

[BUG] Data loss for repeating together events when instruments are additionally mapped to other event types #200

Closed ezraporter closed 1 week ago

ezraporter commented 2 weeks ago

Expected Behavior

All data recorded in the REDCap project should appear in the supertibble

Current Behavior

When an instrument is mapped to an event that repeats together as well as events that are non-repeating or repeat separately, the data from the repeating together event is not included in the supertibble.

How to Reproduce the Bug:

See the Instrument Types REDCap

Screenshot 2024-08-28 at 3 26 45 PM
library(REDCapTidieR)

data <- read_redcap(Sys.getenv("REDCAP_URI"), "77C2C3A96E00246436A9E49974BD5674", allow_mixed_structure = TRUE)

data |>
  extract_tibble("rs_rt") |>
  nrow()

#> [1] 1
# Should be 2

data |>
  extract_tibble("nr_rs_rt") |>
  nrow()

#> [1] 2
# Should be 3

Checklist

Before submitting this issue, please check and verify below that the submission meets the below criteria:

rsh52 commented 2 weeks ago

I believe this is occurring here in clean_redcap_long() -> convert_mixed_instrument():

https://github.com/CHOP-CGTInformatics/REDCapTidieR/blob/7d388e945c0fd69c16872fd2654c5edeb4cfff3b/R/clean_redcap_long.R#L405-L406

I think in the current logic we only check for when an identified repeat field (nr_rs_rt) is not NA and only that the repeat instance is NA, but there is also in this example when the repeat field is not NA, the repeat instance is not NA but the repeat instrument is NA indicating a repeating event. I.e. we check for nonrepeating with separate repeating, but we don't check for repeating together.

So from the sparse matrix output, we capture types 1 and 2 below but not 3:

image

Same with extracting rs_rt, we're missing capturing behavior of "repeating together."

ezraporter commented 2 weeks ago

I see so the issue is that this step should have populated redcap_repeat_instrument for that bottom row? Then the fix is something like:

update_mask <- (is.na(db_data_long$redcap_repeat_instance) | is.na(db_data_long$redcap_repeat_instrument)) & !is.na(db_data_long[[field]])

Right?

rsh52 commented 2 weeks ago

I see so the issue is that this step should have populated redcap_repeat_instrument for that bottom row? Then the fix is something like:

update_mask <- (is.na(db_data_long$redcap_repeat_instance) | is.na(db_data_long$redcap_repeat_instrument)) & !is.na(db_data_long[[field]])

Right?

Close, on testing this may need consideration for when there's multiple repeat-together event instances. I just added another repeating-together instance to record 1 in the test database, with the above suggestion which results in:

> data |>
+   extract_tibble("rs_rt")
# A tibble: 3 × 5
  record_id redcap_event       redcap_form_instance rs_rt form_status_complete
      <dbl> <chr>                             <dbl> <dbl> <fct>               
1         1 repeating_separate                    1     1 Incomplete          
2         1 repeating_together                    1     1 Incomplete          
3         1 repeating_together                    1     2 Incomplete          
> data |>
+   extract_tibble("nr_rs_rt")
# A tibble: 4 × 5
  record_id redcap_event       redcap_form_instance nr_rs_rt form_status_complete
      <dbl> <chr>                             <dbl>    <dbl> <fct>               
1         1 non_repeating                         1        1 Incomplete          
2         1 repeating_separate                    1        1 Incomplete          
3         1 repeating_together                    1        1 Incomplete          
4         1 repeating_together                    1        2 Incomplete 

It captures all the expected data, but the redcap_form_instance is misleading. Originally we intended to always have a repeat instance show as a pseudo-repeating instrument with a single value, which the next line of code is responsible for:

https://github.com/CHOP-CGTInformatics/REDCapTidieR/blob/7d388e945c0fd69c16872fd2654c5edeb4cfff3b/R/clean_redcap_long.R#L408-L409

But I think what we want here is for redcap_form_instance to be 2 where rs_rt and nr_rs_rt are 2. However, I wonder if this is starting to muddy the meaning of redcap_form_instance in relation to actual repeating form instances.

rsh52 commented 2 weeks ago

I suppose this is up for debate, in our pkgdown documentation we specifically say:

However, as of REDCapTidieR v1.1.0 it is now possible to override this behavior by setting allow_mixed_structure in read_redcap() to TRUE. When enabled, nonrepeating variants of mixed structure instruments will be treated as repeating instruments with a single repeating instance.

So we could just leave it at "1" and refer people to the documentation.

ezraporter commented 2 weeks ago

Ah wait I think my solution was too hasty. What we really want is to populate redcap_repeat_instrument and redcap_repeat_instance when they're NA but not overwrite data that exists in them. In the new case you raised I think we just want to be updating redcap_repeat_instrument.

However, this all does raise some questions about redcap_form_instance. In the nr_rs_rt example, this really is a "form instance" when redcap_event == "repeating_separate" but when redcap_event == "repeating_together" it's really an "event instance" since the whole event is repeating.

rsh52 commented 2 weeks ago

Ah wait I think my solution was too hasty. What we really want is to populate redcap_repeat_instrument and redcap_repeat_instance when they're NA but not overwrite data that exists in them. In the new case you raised I think we just want to be updating redcap_repeat_instrument.

However, this all does raise some questions about redcap_form_instance. In the nr_rs_rt example, this really is a "form instance" when redcap_event == "repeating_separate" but when redcap_event == "repeating_together" it's really an "event instance" since the whole event is repeating.

Ok I think I need to mull this more tomorrow, I think we need to rework how we use redcap_event_instance/redcap_form_instance in this last scenario since it doesn't come out with both of them and it probably should and that can be how we identify the differences.

rsh52 commented 2 weeks ago

I think this should resolve it after some testing. Converting the simpler logic to case_when()s so it's more readable:

convert_mixed_instrument <- function(db_data_long, mixed_structure_ref) {
  for (i in seq_len(nrow(mixed_structure_ref))) {
    field <- mixed_structure_ref$field_name[i]
    form <- mixed_structure_ref$form_name[i]

    # Create an update mask column to identify which mixed structure rows need updates
    db_data_long <- db_data_long %>%
      mutate(
        update_mask = case_when(
          !is.na(field) & is.na(redcap_repeat_instance) ~ TRUE, # repeat separately instances
          !is.na(field) & !is.na(redcap_repeat_instance) & is.na(redcap_repeat_instrument) ~ TRUE, # repeat together instances
          .default = FALSE
        )
      )

    # Assign update data based on rules below
    db_data_long <- db_data_long %>%
      mutate(
        redcap_repeat_instance = case_when(
          # Add single instance repeat event instance vals when none exist
          update_mask & is.na(redcap_repeat_instance) ~ 1,
          # Keep repeat event instance vals when they already exist
          update_mask & !is.na(redcap_repeat_instance) ~ redcap_repeat_instance,
          .default = redcap_repeat_instance
        ),
        redcap_repeat_instrument = case_when(
          update_mask ~ form,
          .default = redcap_repeat_instrument
        )
      ) %>%
      select(-update_mask)
  }

  db_data_long
}

Original again for reference:

https://github.com/CHOP-CGTInformatics/REDCapTidieR/blob/7d388e945c0fd69c16872fd2654c5edeb4cfff3b/R/clean_redcap_long.R#L400-L416