National-COVID-Cohort-Collaborative / Phenotype_Data_Acquisition

The repository for code and documentation produced by the N3C Phenotype and Data Acquisition workstream
60 stars 35 forks source link

BigQuery OMOP: Possible issue with phenotype cohort ==> NULL pt_age #209

Closed mgkahn closed 3 years ago

mgkahn commented 3 years ago

This issue came up in the CTSA Common Metrics scripts which is why I looked here today.

Unlike MS SQL, Postgres and Oracle, BigQUery expects the first argument in DATE_DIFF to be the later date and the second argument to be the earlier date else it returns negative integers. SQL Render knows about this difference and handles the re-write correctly.

In the phenotype query, there are 20 instances of DATE_DIFF -- I am concerned that 19 of them are wrong. For the pt_age categories, all of the dates in these DATE_DIFFs are in the wrong sequence so that these calculations are returning negative numbers. This is resulting in all values for pt_age = NULL in subsequent tables.

Can you check any recent DQ report that looks at the pt_age column in the COHORT results to verify this is happening. If you can verify and you are using SQL Render, I do not understand why these are coming out wrong. image

image

AdamLeeIT commented 3 years ago

Michael, Great catch. I have updated the code and wrapped the date_diff expression in ABS() [absolute value] so it will always return a positive integer. While we could swap the dates, it seems the ABS() will make it more error proof. Please download the new version and let us try again. Also, could you please evaluate the impact of this change on your data? I would suspect that the previous iteration (i.e. incorrect version) nearly all your pt_Age variables are null unless they are born in the same year.

mgkahn commented 3 years ago

@AdamLeeIT You are correct about NULL pt_age. Only tiny, tiny fraction in 0-4 age group with a non-NULL value. I will stop today's run, download revised code, and restart the submission pipeline.

Still doesn't explain why SQL Render didn't correctly rewrite this code.

mgkahn commented 3 years ago

@AdamLeeIT : Sending you PR with two commits so you can accept the first and think about the second (which Colorado needs for the code to work):

@AdamLeeIT : OK, just submitted PR. Scripts are running fine with no NULL pt_age values.

AdamLeeIT commented 3 years ago

I corrected the abs() in the age category, pushed it then saw your PR. I closed your PR but will review the replacing d.birth_datetime with the group.

mgkahn commented 3 years ago

Just be aware Colorado does not use optional birth-date time but does include mandatory YOB, MOB, DOB so current script will continue to fail here.

Sent from iPhone


From: AdamLeeIT @.> Sent: Thursday, July 29, 2021 6:39:56 PM To: National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition @.> Cc: Kahn, Michael @.>; Author @.> Subject: Re: [National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition] BigQuery OMOP: Possible issue with phenotype cohort ==> NULL pt_age (#209)

[External Email - Use Caution]

Closed #209https://github.com/National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition/issues/209.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/National-COVID-Cohort-Collaborative/Phenotype_Data_Acquisition/issues/209#event-5088898036, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AA557TT525FWLSB3TFU2W7TT2HYFZANCNFSM5BGQ7DVA.