Medical-Event-Data-Standard / meds_etl

A collection of ETLs from common data formats to Medical Event Data Standard
Apache License 2.0
16 stars 3 forks source link

OMOP etl assumes "birth_datetime" column in person is present #30

Closed egillax closed 1 month ago

egillax commented 1 month ago

In this line the call to cast_to_datetime(batch, "birth_datetime") fails because the column "birth_datetime" doesn't exist in my data.

if table_name == "person":
    # Take the `birth_datetime` if its available, otherwise
    # construct it from `year_of_birth`, `month_of_birth`, `day_of_birth`
    time = pl.coalesce(
        cast_to_datetime(batch, "birth_datetime"),
        pl.datetime(
            pl.col("year_of_birth"),
            pl.coalesce(pl.col("month_of_birth"), pl.lit(1)),
            pl.coalesce(pl.col("day_of_birth"), pl.lit(1)),
            time_unit="us",
        ),
    )
egillax commented 1 month ago

Sorry I actually have the column just didn't export it.

Leaving this open in case you want to be robust to the case where the column doesn't exist. I'm not sure what the common practice is with non-required columns in the CDM.

EthanSteinberg commented 1 month ago

I think it's better to keep using the column as is because we want this code to crash when invalid OMOP data is provided. It's better to loudly fail than to silently lose data IMHO.

I'm going to close this issue, but feel free to reopen if you run into issues.