episphere / conceptGithubActions

Misalignment of Data Dictionary Columns in Aggregate JSON #11

Open FrogGirl1123 opened 1 year ago

FrogGirl1123 commented 1 year ago

See the descriptions below. The issue with harmonized variables might be fixed by adding dictionary fields to the JSON that show the association between harmonized responses and their question.

[10/11 3:59 PM] Wu, Jing (NIH/NCI) [C]

Hi Nicole and Daniel, please see this part of the dictionary data, in which the variable names are mixed up with the variable labels when the variable name is missing:

row | file | Variable Label | Variable Name
1 | 101144925 | NA | In total, how many months or years have you used hormonal IUD?
2 | 101178950 | NA | password
3 | 102445389 | NA | 1 to 9 times
4 | 103209024 | NA | Home
5 | 103397024 | NA | How old were you when a doctor or other health professional first told you that you have or had Trichomoniasis?
6 | 103409401 | NA | 4 days per week
7 | 103565678 | NA | Mild hair loss on the sides of the forehead, but not as far back as the ears, and mild loss from the center of the forehead. Also, hair thinning on the top (crown) of the head.
8 | 103566006 | NA | What kind of cigar, cigarillo, or little filtered cigar [do/did] you usually use? Select all that apply.


[10/11 4:01 PM] Wu, Jing (NIH/NCI) [C]

In the current dictionary, the variable names are not applicable for 1047 out of 4172 CIDs, which causes problems when the dictionary is linked with the QC check code.
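A quick way to reproduce a count like this is to filter the dictionary rows for empty or NA variable names. The sketch below is only an illustration: the row shape (`conceptId`, `variableLabel`, `variableName`) and the function name `findMissingVariableNames` are assumptions, not the actual layout of aggregate.json, and the last two sample rows are placeholders rather than real dictionary entries.

```javascript
// Assumed (hypothetical) row shape: { conceptId, variableLabel, variableName }.
// The real aggregate.json structure may differ.
function findMissingVariableNames(rows) {
  // Treat null, undefined, empty strings, and the literal "NA" as missing.
  const isMissing = (value) =>
    value === null ||
    value === undefined ||
    String(value).trim() === "" ||
    String(value).trim() === "NA";

  return rows
    .filter((row) => isMissing(row.variableName))
    .map((row) => row.conceptId);
}

const sampleRows = [
  // Two rows copied from the listing in this issue.
  { conceptId: "104430631", variableLabel: "No permanent teeth lost", variableName: "SrvMw_NOPERMTTHLOST_v1r0" },
  { conceptId: "104666483", variableLabel: "Other: Please describe", variableName: "SrvLAW_HomeWtr3_9_Oth_v1r0" },
  // Placeholder rows for illustration only, not real dictionary entries.
  { conceptId: "000000001", variableLabel: "Example label", variableName: null },
  { conceptId: "000000002", variableLabel: "Example label", variableName: "NA" },
];

const missing = findMissingVariableNames(sampleRows);
console.log(`${missing.length} of ${sampleRows.length} CIDs have no variable name:`, missing);
```

Run against the full dictionary, the same filter would give the list of CIDs that the QC check code cannot link by variable name.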

[9:12 AM] Wu, Jing (NIH/NCI) [C]

Also, if the JSON dictionary is ordered by CID, another issue appears: duplicate CIDs with different variable names.

row | CID | Variable Label | Variable Name
40 | 104430631 | No permanent teeth lost | SrvMw_NOPERMTTHLOST_v1r0
41 | 104430631 | No permanent teeth lost | SrvMw_DENTURES_v1r0
42 | 104666483 | Other: Please describe | SrvLAW_HomeWtr3_9_Oth_v1r0
43 | 104676242 | Types of alcoholic beverages - Wine | SrvSAS_Wine17_v1r0
44 | 104676242 | Types of alcoholic beverages - Wine | SrvSAS_Wine1824_v1r0
45 | 104676242 | Types of alcoholic beverages - Wine | SrvSAS_Wine2529_v1r0
46 | 104676242 | Types of alcoholic beverages - Wine | SrvSAS_Wine3039_v1r0
47 | 104676242 | Types of alcoholic beverages - Wine | SrvSAS_Wine4049_v1r0
48 | 104676242 | Types of alcoholic beverages - Wine | SrvSAS_Wine5059_v1r0
49 | 104676242


[9:13 AM] Wu, Jing (NIH/NCI) [C]

For the duplicate CIDs, I wonder whether these might be first-level CIDs (conceptId.1 or conceptId.2).
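To list every CID that maps to more than one variable name, the rows can be grouped by CID. This is a minimal sketch, assuming each dictionary row is an object with hypothetical `conceptId` and `variableName` fields; the sample rows are rows 40 to 48 from the listing above, and `findDuplicateCids` is an illustrative name, not a function in this repo.

```javascript
// Assumed (hypothetical) row shape: { conceptId, variableName }.
function findDuplicateCids(rows) {
  // Collect the distinct variable names seen for each CID.
  const namesByCid = new Map();
  for (const { conceptId, variableName } of rows) {
    if (!namesByCid.has(conceptId)) namesByCid.set(conceptId, new Set());
    namesByCid.get(conceptId).add(variableName);
  }

  // Keep only CIDs that are linked to more than one variable name.
  const duplicates = {};
  for (const [cid, names] of namesByCid) {
    if (names.size > 1) duplicates[cid] = [...names];
  }
  return duplicates;
}

const listedRows = [
  { conceptId: "104430631", variableName: "SrvMw_NOPERMTTHLOST_v1r0" },
  { conceptId: "104430631", variableName: "SrvMw_DENTURES_v1r0" },
  { conceptId: "104666483", variableName: "SrvLAW_HomeWtr3_9_Oth_v1r0" },
  { conceptId: "104676242", variableName: "SrvSAS_Wine17_v1r0" },
  { conceptId: "104676242", variableName: "SrvSAS_Wine1824_v1r0" },
  { conceptId: "104676242", variableName: "SrvSAS_Wine2529_v1r0" },
  { conceptId: "104676242", variableName: "SrvSAS_Wine3039_v1r0" },
  { conceptId: "104676242", variableName: "SrvSAS_Wine4049_v1r0" },
  { conceptId: "104676242", variableName: "SrvSAS_Wine5059_v1r0" },
];

const duplicates = findDuplicateCids(listedRows);
console.log(duplicates);
```

For these rows, 104430631 and 104676242 are reported as duplicates while 104666483 is not, which matches the listing.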

Gbarra9 commented 1 year ago

Notes: This will require changes to how the runCSVConversions.js script adds each concept ID to the JSONS folder on each run.

Each conceptId#.json file (e.g. 100181644.json) is read from the jsons folder by two scripts: masterParsing.js and aggregateJSONS.js. The individual conceptId#.json files are either added to or filtered out of the Transformation.json file or the aggregate.json file. Depending on the script, the data already inside each conceptId#.json file is manipulated when the Transformation.json or aggregate.json file is created.

masterParsing.js --> creates Transformation.json file given to IMS
aggregateJSONS.js --> creates aggregate.json file
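The read-and-filter flow described above can be sketched roughly as follows. This is not the actual aggregateJSONS.js code: the function name `aggregateConceptFiles`, the `keep` predicate, and the fields inside the sample files are assumptions made for illustration, and the demo writes its own temporary folder instead of touching the repo's jsons folder.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: read every <conceptId>.json file in a folder and
// collect the ones accepted by `keep` into a single object keyed by concept ID.
function aggregateConceptFiles(folder, keep = () => true) {
  const aggregate = {};
  for (const file of fs.readdirSync(folder)) {
    if (!/^\d+\.json$/.test(file)) continue; // skip anything that is not conceptId#.json
    const concept = JSON.parse(fs.readFileSync(path.join(folder, file), "utf8"));
    if (keep(concept)) aggregate[path.basename(file, ".json")] = concept;
  }
  return aggregate;
}

// Demo in a temporary folder; the file contents are placeholders, not real concepts.
const demoFolder = fs.mkdtempSync(path.join(os.tmpdir(), "jsons-"));
fs.writeFileSync(
  path.join(demoFolder, "100181644.json"),
  JSON.stringify({ conceptId: "100181644", variableName: "Example_v1r0" })
);
fs.writeFileSync(
  path.join(demoFolder, "104430631.json"),
  JSON.stringify({ conceptId: "104430631", variableName: null })
);

// Example filter: keep only concepts that have a variable name.
const aggregated = aggregateConceptFiles(demoFolder, (concept) => concept.variableName !== null);
console.log(Object.keys(aggregated));
```

Because both scripts read fields out of each parsed file in this way, any new field or restructuring inside conceptId#.json has to be handled wherever those fields are accessed.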

Changing the structure inside the conceptId#.json files may cause errors or compatibility issues with the current masterParsing.js and aggregateJSONS.js scripts. Because both scripts manipulate this data downstream, they may need fixes as well.

Related Issues: Issue #8, Issue #9, Issue #10