karafecho closed this issue 10 months ago.
This dataset, PCD_UNC_patient_2010_v6_binned_deidentified, contains data for "Race":
Excerpt:
TotalEDInpatientVisits | Sex2 | Sex | Race | Ethnicity | MajorRoadwayHighwayExposure
-- | -- | -- | -- | -- | --
0 | Female | Female | African American | Not Hispanic | 5
0 | Female | Female | African American | Not Hispanic | 6
0 | Female | Female | Caucasian | Not Hispanic | 6
0 | Female | Female | Caucasian | Not Hispanic | 6
0 | Male | Male | Caucasian | Not Hispanic | 6
TotalEDInpatientVisits | Sex3 | Sex | Race | Ethnicity | MajorRoadwayHighwayExposure
-- | -- | -- | -- | -- | --
0 | Female | Female | African American | Not Hispanic | 6.4
0 | Female | Female | African American | Not Hispanic | 6.6
0 | Female | Female | Caucasian | Not Hispanic | 6.8
0 | Female | Female | Caucasian | Not Hispanic | 7
0 | Male | Male | Caucasian | Not Hispanic | 7.2
0 | Female | Female | Caucasian | Not Hispanic | 1
0 | Male | Male | Caucasian | Not Hispanic | 6
0 | Male | Male | Caucasian | Not Hispanic | 6
0 | Female | Female | African American | Hispanic | 2
0 | Female | Female | Caucasian | Not Hispanic | 6
0 | Female | Female | Caucasian | Not Hispanic | 6
0 | Male | Male | American/Alaskan Native | Not Hispanic | 6
@karafecho I looked into this issue and found that the root cause is a discrepancy between the patient data and the feature definitions in the feature YAML file. Specifically, the patient data names the feature variable "Race", as you indicated above, but the feature YAML file defines the corresponding features as "Race_UNC" and "RACE". Because neither name matches the column name in the patient file, FHIR PIT wrote None for every patient in the Race_UNC and RACE feature columns. To fix this, we need to update the PCD feature YAML file and rerun FHIR PIT.
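One way to catch this kind of failure before a dataset is deployed is to check whether a feature column holds nothing but None. The function below is a hypothetical sketch, not part of FHIR PIT; it assumes the output is a plain comma-separated file with a header row, no quoted commas, and missing values written as the literal string None.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of FHIR PIT): does every row of a CSV column
# hold "None" or an empty value?
# ASSUMPTIONS: plain comma-separated file, header on line 1, no quoted commas,
# missing values written as the literal string None.
# Exit status: 0 = all values None/empty, 1 = some real values, 2 = no such column.
column_all_none() {
  local file=$1 column=$2
  awk -F ',' -v col="$column" '
    { sub(/\r$/, "") }                      # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { missing = 1; exit }       # END still runs and reports it
      next
    }
    $idx != "None" && $idx != "" { filled++ }
    END {
      if (missing) { print "column not found: " col; exit 2 }
      if (filled) { print col ": " filled " row(s) with values"; exit 1 }
      print col ": all values None or empty"
    }
  ' "$file"
}

# Usage (file name is a placeholder):
#   column_all_none patient_output.csv Race_UNC
```

Running it on each feature column after a FHIR PIT run would have flagged Race_UNC and RACE here, while a lookup for a name absent from the header (exit status 2) points at a naming mismatch rather than missing data.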
Kara to update the YAML file after Hong writes a script that generates a diff file listing the variable-name discrepancies between the patient dataset and the all_features YAML file.
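A minimal sketch of such a diff script, for illustration only (it is not Hong's script). It assumes the patient dataset is a CSV whose header row holds the variable names, and that all_features lists each patient feature as a key indented exactly two spaces under a top-level "patient:" key; both layouts are assumptions, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: list variable names found in only one of
#   (a) the header row of the patient CSV file, and
#   (b) the feature keys nested under "patient:" in a features YAML file.
# ASSUMPTIONS: the CSV has a single header row with no quoted commas, and each
# patient feature is a key indented exactly two spaces under "patient:".

# Print the CSV header fields, one per line, byte-sorted and de-duplicated.
csv_columns() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | LC_ALL=C sort -u
}

# Print the keys indented two spaces under the top-level "patient:" key.
yaml_patient_features() {
  awk '
    /^[^ \t#]/ { in_patient = ($0 ~ /^patient:/); next }
    in_patient && /^  [^ \t#][^:]*:/ {
      name = $0
      sub(/^  /, "", name)
      sub(/:.*/, "", name)
      print name
    }
  ' "$1" | LC_ALL=C sort -u
}

# Print "< name" for names only in the patient data and "> name" for names
# only in the YAML file; names present in both are omitted.
feature_diff() {
  local csv=$1 yaml=$2
  LC_ALL=C comm -3 <(csv_columns "$csv") <(yaml_patient_features "$yaml") |
    awk -F '\t' '$1 != "" { print "< " $1 } $2 != "" { print "> " $2 }'
}

# Usage (file names are placeholders):
#   feature_diff patient.csv all_features.yaml > feature_diff.txt
```

Both lists are sorted with LC_ALL=C because comm requires its inputs sorted in the same collation it uses. For the case in this issue, the output would show "< Race" next to "> RACE" and "> Race_UNC", which makes the rename needed in the YAML file obvious.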
Complete; closing issue.
This issue is to note that the PCD endpoint is hitting an older dataset, one that was missing data on race. For example:
curl -X 'POST' \
  'https://icees-pcd.renci.org/patient/cohort' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{}'
"return value": {
  "cohort_id": "COHORT:1",
  "size": 7940
}
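The cohort_id in this response feeds the next request, with the ':' percent-encoded as %3A. A small hypothetical helper that builds that URL, assuming no other characters in the id need encoding (ICEES_BASE is an illustrative variable name, not part of the ICEES API):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the features URL for a cohort id returned by
# POST /patient/cohort. The ':' in ids such as COHORT:1 is percent-encoded
# as %3A, matching the GET URL used in this issue.
# ASSUMPTION: the id contains no other characters that need encoding.

ICEES_BASE=${ICEES_BASE:-https://icees-pcd.renci.org}

cohort_features_url() {
  local id=${1//:/%3A}   # replace every ':' with %3A
  printf '%s/patient/cohort/%s/features\n' "$ICEES_BASE" "$id"
}

# Usage (requires network access; jq is assumed to be installed, and the id is
# assumed to sit under a top-level "return value" key in the POST response):
#   id=$(curl -s -X POST "$ICEES_BASE/patient/cohort" \
#          -H 'Content-Type: application/json' -d '{}' | jq -r '."return value".cohort_id')
#   curl -X GET "$(cohort_features_url "$id")" -H 'accept: text/tabular'
```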
curl -X 'GET' \
  'https://icees-pcd.renci.org/patient/cohort/COHORT%3A1/features' \
  -H 'accept: text/tabular'
Excerpt: