OHDSI / dbt-synthea

[Under development] A dbt ETL project to convert a Synthea synthetic data set into the OMOP CDM
https://ohdsi.github.io/dbt-synthea/
Apache License 2.0
16 stars 6 forks source link

Avoid use of `safe_cast`? #68

Closed lawrenceadams closed 1 month ago

lawrenceadams commented 1 month ago

dbt.safe_cast is used across 17 files - however, neither the postgres adapter nor duckdb support safe cast (although interestingly duckdb does support it), and so all this macro will do is use the usual ANSI style casting like so.

Given some databases we support in the future do support try_cast (for example, snowflake) we will end up with different behaviors when using different databases.

I might be missing something, but I think if there is a failed cast we would want it to properly raise an error and then debug this properly, rather than creating NULLs ?


Extends #40

lawrenceadams commented 1 month ago

Having thought about it - maybe we should just use vanilla ANSI stye casting instead of the cast/try_cast macro, although these may handle some specific type instances I am unaware of!

katy-sadowski commented 1 month ago

To be honest I don't remember why I used safe_cast instead of cast. I agree that we want an error if something is not the expected type. So I support the change in #69 !

I do believe that the reason I used the macro was that I ran into some issue where the SQL I'd written wasn't working on both duckdb and on Postgres. But I don't remember what the issue was. Normally I'm a bit more principled/organized, haha, but I'm pretty sure this was all done in my crunch time right before the abstract was due 😅

I like the idea of using the macros so we don't need to figure out what SQL will work where. However, I did already run into a case where what I wanted to do wasn't supported.

@vvcb suggested using SqlGlot to translate to vanilla ANSI, though I still wonder if we'd run into problems with that on some platforms?

lawrenceadams commented 1 month ago

Great!

Ahhh this makes sense!

Agree re: macros; do you remember what case wasn't supported though?

SqlGlot would potentially be a nice solution - but perhaps a bit too early a pre-optimization for now? (IMHO) Unless the cross-database macros don't give us what is needed?

katy-sadowski commented 1 month ago

Agree re: macros; do you remember what case wasn't supported though?

There were 2 things I ran into (for some stuff that didn't end up making it into the project) - they didn't have a LEFT macro (SUBSTR is an alternative, but there's no macro for it). Also, I ran into some trouble nesting macros within each other.

SqlGlot would potentially be a nice solution - but perhaps a bit too early a pre-optimization for now? (IMHO) Unless the cross-database macros don't give us what is needed?

Agree. With the above stuff I was trying to fix some date formats but found another way. So I think we carry on for now and worry about the broader picture of cross-DB support later.

lawrenceadams commented 1 month ago

There were 2 things I ran into (for some stuff that didn't end up making it into the project) - they didn't have a LEFT macro (SUBSTR is an alternative, but there's no macro for it). Also, I ran into some trouble nesting macros within each other.

Ahhh fair enough - I've had this issue in the past and just made my own macros with adapter specific dispatch (and then realized dbt had its own neater one 🙃 ). Good to be aware of, will keep my eyes open for them.

Agree. With the above stuff I was trying to fix some date formats but found another way. So I think we carry on for now and worry about the broader picture of cross-DB support later.

Sweet!