UNMCCC / CRIIS_Source_Extracts_for_OMOP

Cancer Clinical Data integration - Set of SQL extractions from disparate health system sources targeting the OMOP data model
MIT License

CNExT - Repeat Data Pulled in cnext_location Script #61

Closed: lvarnedoe closed this issue 2 years ago

lvarnedoe commented 2 years ago

The cnext_location script pulls a patient's address at the time of diagnosis and the patient's current address. These addresses can be the same. Per the OMOP CDM, "Each address or Location is unique and is present only once in the table." Not sure we should pull the address twice when the two addresses are the same.

markmayer commented 2 years ago

Yes, and we can remove the duplicates. Just change the UNION ALL to UNION. That should drop any rows that are identical in every selected column.
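A minimal sketch of the difference, run against an in-memory SQLite database. The table and column names (`addr_at_dx`, `addr_current`, `address_1`, and so on) are hypothetical stand-ins, not the actual CNExT sources or the real cnext_location script.

```python
import sqlite3

# Hypothetical stand-in tables; the real script reads from CNExT sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addr_at_dx   (address_1 TEXT, city TEXT, state TEXT, zip TEXT);
CREATE TABLE addr_current (address_1 TEXT, city TEXT, state TEXT, zip TEXT);
-- Same patient, same address at diagnosis and today.
INSERT INTO addr_at_dx   VALUES ('1 Main St', 'Albuquerque', 'NM', '87101');
INSERT INTO addr_current VALUES ('1 Main St', 'Albuquerque', 'NM', '87101');
""")

select_dx = "SELECT address_1, city, state, zip FROM addr_at_dx"
select_cur = "SELECT address_1, city, state, zip FROM addr_current"

# UNION ALL keeps every row from both branches.
union_all_rows = conn.execute(f"{select_dx} UNION ALL {select_cur}").fetchall()

# UNION removes rows that match in every selected column.
union_rows = conn.execute(f"{select_dx} UNION {select_cur}").fetchall()

print(len(union_all_rows), len(union_rows))
```

With identical addresses in both tables, UNION ALL returns two rows and UNION returns one.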

lvarnedoe commented 2 years ago

Doing a UNION instead of a UNION ALL reduces the number of rows returned, but there are still rows with repeated addresses. This is because UNION only removes rows that match in every column: one row might have the county populated while an otherwise identical row does not.
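The behavior described here can be reproduced with a small sketch, again using hypothetical table and column names in an in-memory SQLite database rather than the real CNExT sources. The two rows differ only in `county`, so UNION keeps both.

```python
import sqlite3

# Hypothetical stand-in tables; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addr_at_dx   (address_1 TEXT, city TEXT, zip TEXT, county TEXT);
CREATE TABLE addr_current (address_1 TEXT, city TEXT, zip TEXT, county TEXT);
-- Same street address, but only one source carries the county.
INSERT INTO addr_at_dx   VALUES ('1 Main St', 'Albuquerque', '87101', 'Bernalillo');
INSERT INTO addr_current VALUES ('1 Main St', 'Albuquerque', '87101', NULL);
""")

# UNION compares whole rows, so the missing county makes these rows distinct.
rows_with_county = conn.execute("""
    SELECT address_1, city, zip, county FROM addr_at_dx
    UNION
    SELECT address_1, city, zip, county FROM addr_current
""").fetchall()

# Leaving county out of the select list lets the rows compare as equal.
rows_without_county = conn.execute("""
    SELECT address_1, city, zip FROM addr_at_dx
    UNION
    SELECT address_1, city, zip FROM addr_current
""").fetchall()

print(len(rows_with_county), len(rows_without_county))
```

Whether such near-duplicates should be collapsed at all is a separate modeling question; the sketch only shows why UNION alone does not remove them.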

markmayer commented 2 years ago

Although it looks like duplication, it really isn't. One record is the location at DX (CNEXT TUMOR) and the other is the current location (CNEXT PATEXTENDED). Although most of the data is the same, the two records pertain to different locations in NAACCR.

lvarnedoe commented 2 years ago

addressed by Union queries in cnext_location instead of Union All #61