OHDSI / Themis

Repository for OMOP CDM conventions as defined by THEMIS. These can be reference lists of concepts, pieces of standardized code for data generation or quality certification, and debates.
Apache License 2.0

How to populate year_of_birth when it is missing from the source #92

Closed burrowse closed 7 months ago

burrowse commented 7 months ago

How to populate year_of_birth when it is missing from the source

CDM or THEMIS convention?

Themis

Is this a general convention?

No

Summary of issues

Summary of answer

Related links

waydes commented 7 months ago

Issue # and location

NA

Issue summary

The lack of year_of_birth creates a dilemma on how to process those records. If an age group categorization is available, the approximate year of birth can be derived. I could not find guidance on how to estimate year of birth from age group categorization. The age of a patient is so important to observational research that we have the convention to exclude patients without known age. The recommendation is to eliminate those records from a study.

Discussions in the forums indicate that setting year of birth to NULL precludes finding those records in SQL queries. Setting year of birth to 0 produces incorrect and inconsistent results: when year_of_birth is 0, Postgres calculates an age of 2021 years, but SQL Server calculates an age of 122 years because it treats 0 as 1900-01-01.
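The NULL and 0 behaviour described above can be reproduced in a small sketch. It uses Python's built-in sqlite3 module as a stand-in engine, so it does not reproduce the SQL Server handling of 0 as 1900-01-01. The helper name ages_by_person, the two-column person table, and the reference year 2024 are assumptions made for illustration only.

```python
import sqlite3

def ages_by_person(rows, reference_year):
    """Return {person_id: age} for the rows matched by a typical age filter.

    rows is a list of (person_id, year_of_birth) tuples; year_of_birth may be
    None (stored as NULL) or 0 (a hypothetical "unknown" placeholder).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
    # Any comparison against NULL evaluates to NULL, so NULL rows never match.
    cur = conn.execute(
        "SELECT person_id, ? - year_of_birth AS age "
        "FROM person WHERE year_of_birth <= ?",
        (reference_year, reference_year),
    )
    result = dict(cur.fetchall())
    conn.close()
    return result

rows = [(1, 1980), (2, None), (3, 0)]
print(ages_by_person(rows, 2024))  # {1: 44, 3: 2024}
```

Person 2 silently disappears from the result, while person 3 is returned with an age equal to the reference year.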

Setting every unknown year of birth to a specific year creates problems in network studies, because the tools and algorithms used in those studies do not include control structures (if/then or switch statements) that recognize a placeholder year as meaning "unknown year of birth". Modifying the code in tools to accommodate the idiosyncrasies of individual databases creates problems and requires additional work. The same issue occurs when year of birth is set to 0 or NULL.

The lack of year of birth also raises the issue of a year of birth known to be incorrect. Examples include a year of birth after the current year, or a year of birth after the most recent year of visit or after the year in other fields that carry one.
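A minimal sketch of how the two examples above could be checked before loading a person record. The function name year_of_birth_problems, its parameters, and the wording of the returned messages are hypothetical and are not part of any OHDSI tool.

```python
from datetime import date
from typing import List, Optional

def year_of_birth_problems(
    year_of_birth: int,
    visit_years: List[int],
    current_year: Optional[int] = None,
) -> List[str]:
    """List the reasons a year of birth is known to be incorrect."""
    if current_year is None:
        current_year = date.today().year
    problems = []
    # A person cannot be born after the current year.
    if year_of_birth > current_year:
        problems.append("after current year")
    # A person cannot be born after their most recent visit.
    if visit_years and year_of_birth > max(visit_years):
        problems.append("after most recent visit year")
    return problems

print(year_of_birth_problems(2022, [2019, 2021], current_year=2024))
# ['after most recent visit year']
```

An empty list means neither check fired; it does not prove that the year of birth is correct.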

Convention type

Table

CDM table

Person

CDM field

year_of_birth

Links to issue discussion

Provenance of data.

General

The ratified convention

For data sources with date of birth, the year should be extracted. For data sources where the year of birth is not available, the approximate year of birth could be derived based on age group categorization, if available. If no year of birth is available, all the person’s data should be dropped from the CDM instance.
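The three cases in the convention can be sketched as a single decision function. The convention does not say how to estimate a year from an age group, so the midpoint rule used here is an assumption, as are the function name resolve_year_of_birth, the (low, high) age group tuple, and the reference_year parameter (the year in which the age group was recorded).

```python
from datetime import date
from typing import Optional, Tuple

def resolve_year_of_birth(
    birth_date: Optional[date],
    age_group: Optional[Tuple[int, int]],
    reference_year: int,
) -> Optional[int]:
    """Return a year of birth, or None if the person should be dropped."""
    # Case 1: a date of birth is available, so extract its year.
    if birth_date is not None:
        return birth_date.year
    # Case 2: only an age group is available. ASSUMPTION: use the midpoint
    # of the inclusive age range; the convention does not prescribe a method.
    if age_group is not None:
        low, high = age_group
        return reference_year - (low + high) // 2
    # Case 3: nothing is available; the caller drops all of the person's data.
    return None

print(resolve_year_of_birth(date(1975, 6, 1), (40, 49), 2020))  # 1975
print(resolve_year_of_birth(None, (40, 49), 2020))              # 1976
print(resolve_year_of_birth(None, None, 2020))                  # None
```

Returning None rather than 0 or a placeholder year keeps the "drop the person" decision explicit in the ETL instead of pushing a sentinel value into the CDM.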

Date of ratification/published

4/9/2024

Downstream implications

No

Link to DQD check

Yes - isRequired.

Related conventions/further information

Other helpful information, if needed. i.e. related conventions, queries to evaluate source or CDM data, or any additional information

#Tags birthdate, birthyear, year_of_birth

clairblacketer commented 7 months ago

@waydes is this one ready to review?

waydes commented 7 months ago

I need to finish it. I will do that tonight and move it over to review.

Wayde Shipman, DVM, MS

[image: ]

On Tue, Apr 9, 2024 at 6:06 PM clairblacketer @.***> wrote:

@waydes https://github.com/waydes is this one ready to review?

— Reply to this email directly, view it on GitHub https://github.com/OHDSI/Themis/issues/92#issuecomment-2046178004, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABSUF35ELAUC6Q3CUFADL4LY4RYA3AVCNFSM6AAAAABFR57GIOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBWGE3TQMBQGQ . You are receiving this because you were mentioned.Message ID: @.***>

clairblacketer commented 7 months ago

Opened PR #116