Gilead-BioStats / clindata

Synthetic Data for testing and development
https://gilead-biostats.github.io/clindata/
Apache License 2.0

raw_ie_a1 and raw_ie_a2 data map to very few rows using gsm::IE_Map_raw #5

Status: Closed (dsanders2gilead closed this issue 2 years ago)

dsanders2gilead commented 2 years ago

Using IE_Map_Raw, the two datasets map to only 7 and 4 rows respectively, and within each mapped result every SubjectID/SiteID pair has the same Count (46 for raw_ie_a1, 48 for raw_ie_a2).

IE_Map_Raw: raw_ie_a1 has 14,030 rows but maps to only 7 rows:

  SubjectID SiteID Count
1 0413      X133X     46
2 0510      X164X     46
3 0606      X059X     46
4 0663      X192X     46
5 1149      X002X     46
6 1208      X210X     46
7 1277      X002X     46

raw_ie_a2 has 10,608 rows but maps to only 4 rows:

IE_Map_Raw(dfIe = clindata::raw_ie_a2)
# A tibble: 4 x 3
  SubjectID SiteID Count
1 0142      X194X     48
2 0308      X159X     48
3 0776      X194X     48
4 1032      X033X     48
jwildfire commented 2 years ago

The issue is that SUBJID is mostly missing.

I think this is probably expected...

Something is wrong in the Count column, though...
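A minimal base-R sketch of the effect discussed above, using made-up toy rows rather than clindata. It assumes (not confirmed from gsm's source) that rows with a blank or NA SUBJID are what drop out of the mapped output, and that INVID is the column reported as SiteID. Under those assumptions only the few subjects with a populated SUBJID survive, and a per-subject tally of IE records varies by subject instead of repeating one constant like 46 or 48. The `Count` here is a hypothetical record count, not gsm's definition.

```r
# Toy IE rows (hypothetical values): one row per subject per IE criterion record.
# Assumption: INVID holds the site ID that appears as SiteID in the mapped output.
ie <- data.frame(
  SUBJID = c("0413", "0413", "", "", "0510", NA),
  INVID  = c("X133X", "X133X", "X001X", "X001X", "X164X", "X002X"),
  stringsAsFactors = FALSE
)

# Rows with a blank or NA SUBJID cannot be attributed to a subject, so they drop out.
kept <- ie[!is.na(ie$SUBJID) & ie$SUBJID != "", ]
mapped <- data.frame(SubjectID = kept$SUBJID, SiteID = kept$INVID, Count = 1L)

# Sum one per record so each subject/site pair gets its own count.
counts <- aggregate(Count ~ SubjectID + SiteID, data = mapped, FUN = sum)
print(counts)
```

With this toy input, four of the six rows are discarded and the two remaining subjects get different counts, which is the pattern one would expect from a correctly computed Count column.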

jwildfire commented 2 years ago

Closing this for now. Can reopen if @gwu05 thinks the missing distribution in SUBJID is off.

gwu05 commented 2 years ago

Yeah, looks like mainly empty SUBJIDs. One exercise to double-check would be to map INVID+SCRNID to other datasets to see whether any subjects are missing a SUBJID in the IE data even though they do have a SUBJID elsewhere. I did a quick check and didn't see many randomized subjects in the IE datasets, so now I'm wondering where the IE info for those subjects is.
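The INVID+SCRNID cross-check suggested above could look roughly like this in base R. Both data frames are hypothetical stand-ins: the subject-level dataset name `dm` and all values are made up for illustration. The idea is an inner join on the INVID + SCRNID composite key, so every match is a subject whose SUBJID exists in another dataset but is blank in the IE data.

```r
# Hypothetical IE rows with blank SUBJID, identified only by site (INVID) and screening ID (SCRNID).
ie <- data.frame(
  SUBJID = c("", "", "", ""),
  INVID  = c("X001X", "X001X", "X002X", "X001X"),
  SCRNID = c("S01", "S02", "S03", "S01"),
  stringsAsFactors = FALSE
)
# Hypothetical subject-level dataset from elsewhere in the study.
dm <- data.frame(
  SUBJID = c("1001", "1002"),
  INVID  = c("X001X", "X002X"),
  SCRNID = c("S01", "S03"),
  stringsAsFactors = FALSE
)

# One row per composite key among IE records lacking a SUBJID (a subject may have several IE rows).
blank <- unique(ie[is.na(ie$SUBJID) | ie$SUBJID == "", c("INVID", "SCRNID")])

# Inner join on the composite key: matches are subjects whose SUBJID could be recovered.
recoverable <- merge(blank, dm, by = c("INVID", "SCRNID"))
print(recoverable)
```

Keys that find no match (here X001X/S02) would be screen failures with no SUBJID anywhere, which is consistent with SUBJID being legitimately empty for them.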

gwu05 commented 2 years ago

Confirmed that the IE data only contains records for subjects who did not meet the IE criteria. For subjects who met all IE criteria, no data was captured in the IE datasets. We may need to address this on the function side.
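Because subjects who met every criterion have no IE rows at all, one possible function-side approach is to start from a full subject list and left-join the IE counts, filling absent subjects with zero. The following is a minimal base-R sketch. `subjects`, `ie_fail`, and all values are hypothetical, and treating "absent from IE data" as a count of 0 is an assumption about the desired output, not documented gsm behavior.

```r
# Hypothetical full subject list (e.g. taken from a subject-level dataset).
subjects <- data.frame(
  SubjectID = c("1001", "1002", "1003"),
  SiteID    = c("X001X", "X001X", "X002X"),
  stringsAsFactors = FALSE
)
# Hypothetical IE rows: only subjects who failed a criterion appear here.
ie_fail <- data.frame(SubjectID = c("1002", "1002"), stringsAsFactors = FALSE)

# Count IE rows for each subject present in the IE data.
fail_counts <- as.data.frame(
  table(SubjectID = ie_fail$SubjectID),
  responseName = "Count",
  stringsAsFactors = FALSE
)

# Left join keeps every subject; those absent from the IE data get NA...
mapped <- merge(subjects, fail_counts, by = "SubjectID", all.x = TRUE)
# ...which is replaced with 0 on the assumption that they met all IE criteria.
mapped$Count[is.na(mapped$Count)] <- 0L
print(mapped)
```

Driving the mapping from the subject list rather than from the IE rows means the output row count reflects the study population instead of only the subjects who happen to have IE records.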