Gilead-BioStats / gsm

Good Statistical Monitoring R Package
https://gilead-biostats.github.io/gsm/
Apache License 2.0

Feature: Add shared participant ID to all mapped data domains that contain participant ID. #1761

samussiah closed 2 months ago

samussiah commented 3 months ago

Feature Details

All participant-level data should be queryable by the same participant ID value. EDC data uses a different set of participant IDs than Raw+ and protocol deviation data, so dfSUBJ serves as a go-between: it contains both the Raw+ and EDC participant IDs.

Example Code

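library(dplyr)

# Attach the shared participant ID (subjid) to an EDC domain.
mapped_edc_domain <- raw_edc_domain %>%
    inner_join(
        # Lookup of both ID sets, restricted to enrolled participants.
        dfSUBJ %>%
            filter(
                enrollyn == 'Y'
            ) %>%
            select(subjid, subject_nsv),
        # EDC subjectname corresponds to subject_nsv in dfSUBJ.
        by = c('subjectname' = 'subject_nsv')
    )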
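
Since inner_join silently drops EDC rows that have no match in dfSUBJ, it may be worth checking what would be lost before mapping. A minimal sketch on made-up toy data (the ID values and the `queryid` column are hypothetical, not taken from gsm):

```r
library(dplyr)

# Toy inputs, invented for illustration only.
raw_edc_domain <- tibble(
  subjectname = c("001-XX", "004-XX"),
  queryid = 1:2
)
dfSUBJ <- tibble(
  subjid = c("001", "002"),
  subject_nsv = c("001-XX", "002-XX"),
  enrollyn = c("Y", "Y")
)

# EDC rows whose subjectname has no enrolled counterpart in dfSUBJ.
unmatched <- raw_edc_domain %>%
  anti_join(
    dfSUBJ %>% filter(enrollyn == "Y"),
    by = c("subjectname" = "subject_nsv")
  )
```

Here `unmatched` holds the rows that the inner join above would discard, which could be logged or reviewed rather than dropped without notice.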

Possible Implementation

In the data mapping workflow these steps are needed:

  1. Subset Mapped_SUBJ to the subjid and subject_nsv columns.
  2. Join the table from step 1 with each EDC domain on subjectname = subject_nsv.
  3. Update each mapped EDC table so that it includes subjid.
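
The steps above could be sketched roughly as follows. This is only an illustration, not the gsm mapping workflow itself: `add_subjid`, `lEDC`, and the toy domains are hypothetical names, and the `enrollyn == "Y"` filter is carried over from the example code rather than from the steps.

```r
library(dplyr)

# Hypothetical helper: add subjid to every EDC domain in a named list.
add_subjid <- function(lEDC, dfSUBJ) {
  # Step 1: keep only the two ID columns for enrolled participants.
  dfLookup <- dfSUBJ %>%
    filter(enrollyn == "Y") %>%
    distinct(subjid, subject_nsv)

  # Steps 2 and 3: join each domain on subjectname = subject_nsv,
  # which adds subjid to the returned table.
  lapply(lEDC, function(dfDomain) {
    inner_join(dfDomain, dfLookup, by = c("subjectname" = "subject_nsv"))
  })
}

# Toy data, invented for illustration only.
dfSUBJ <- tibble(
  subjid = c("001", "002", "003"),
  subject_nsv = c("001-XX", "002-XX", "003-XX"),
  enrollyn = c("Y", "Y", "N")
)
lEDC <- list(
  queries = tibble(subjectname = c("001-XX", "001-XX", "003-XX"), queryid = 1:3),
  dataent = tibble(subjectname = "002-XX", formoid = "AE")
)

lMapped <- add_subjid(lEDC, dfSUBJ)
```

Using `distinct()` on the lookup guards against duplicated ID pairs multiplying rows in the join. Whether unenrolled participants should be excluded at this stage is a design choice left open here.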

Additional Comments

jwildfire commented 2 months ago

Agree we do need to sort this out.

@samussiah do you want to create a PR showing how this should work via updates in mapping? Or should we add more discussion here so someone else can implement it?