Open katy-sadowski opened 5 months ago
I've done a bit of playing to discover issues we may run into when not using duckdb/postgres. Here are some of my findings:
System | Finding |
---|---|
SQLite | Does not support multiple schemas (treats as separate physical databases) Differences in how views are required caused issues |
Snowflake | Unable to get very far as START is a protected keyword (used in a lot of synthea tables) and so issues thrown when using seed . → will probably need a custom data loader to get around this. More work needed. |
SQL Server | Found fiddly to get setup locally. Not yet successful - from memory due to keyword utilization but may be wrong. |
Probably all easily overcome with a few hours - but not as straightforward as I would have thought! DuckDb is almost too friendly in comparison! 😆
Getting dbt to work across database engines is not worth the hassle and you will quickly hit a tech debt ceiling.
Cross-database Synthea-OMOP transformation is already mostly solved here - https://github.com/lsc-sde/sqlmesh_synthea
@lawrenceadams , please take a look at https://sqlmesh.com/ and https://sqlglot.com/sqlglot.html
We are using SQLMesh for all new OMOP projects for this reason and many others. We build locally on DuckDb, and deploy to on-prem SQL Server and Azure Databricks because we are 'migrating' and are likely to be migrating for the next decade :-D. So, important for us to be able to manage a single codebase for multiple database engines and for several people to work on the same data warehouse/ lakehouse at the same time.
Yeah came to the same conclusion! Was an interesting bit of discovery!
Yep well aware of SQLMesh/Glot, is a nice tool!
And, if we want to continue to use dbt with some cross-platform support, then targeting the engines that are as close to being ANSI-compliant as possible and refactoring any existing SQL to ANSI-SQL will be a good start.
One way to do this would be to take all the compiled code that dbt generates currently and transpiling it to ANSI using SQLGlot and looking at the diffs.
One way to do this would be to take all the compiled code that dbt generates currently and transpiling it to ANSI using SQLGlot and looking at the diffs.
I like the idea, would be a cool part of this exploration, and a great learning opportunity!
@lawrenceadams regarding Snowflake and the START
keyword, would this help: https://docs.getdbt.com/reference/resource-configs/quote_columns
@lawrenceadams regarding Snowflake and the
START
keyword, would this help: https://docs.getdbt.com/reference/resource-configs/quote_columns
Excellent shout - this resolved that issue! I'll have more of a play now that's unblocked!
Cross-database macros are offered by dbt to compile sql depending on the target DBMS: https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros
The goal of this issue is to assess what SQL statements in the project will not translate across all DBMS, and replace them with a cross-DB macro (if possible).
OHDSI supported platforms: https://ohdsi.github.io/Hades/supportedPlatforms.html