coderxio / sagerx

Open drug data pipelines curated by pharmacists.
https://coderx.io/sagerx
Other
41 stars 11 forks source link

Convert Postgres stored functions to dbt macros #271

Closed jrlegrand closed 1 month ago

jrlegrand commented 3 months ago

Problem Statement

If we ever want to migrate to a hosted database, it might be easier to use dbt macros instead of stored functions. We created these stored functions before we knew about dbt.

Stored functions location: https://github.com/coderxio/sagerx/blob/main/postgres/ndc_functions.sql

  1. ndc_format
  2. ndc_to_11
  3. ndc_convert

Criteria for Success

Run all dbt models that depend on these stored functions and they should work the same way as dbt macros.

Additional Information

The main downside of this is that users can't use these functions when writing SQL queries in Postgres. I haven't generally had the need for this myself though.

A "pro" of this approach is that we could write dbt tests to check if NDCs are in NDC11 format by using these macros.

jrlegrand commented 1 month ago

Oh snap - there's a way to use a macro to create a stored function in the database on dbt-run-start.

Source: https://medium.com/google-cloud/loading-and-transforming-data-into-bigquery-using-dbt-65307ad401cd

Begin quote from site above:

Notice that I am calling a user-defined function ch04.cleanup_numeric (target.schema is the output dataset, so ch04 here). This is defined in macros/cleanup_numeric.sql:

{% macro cleanup_numeric_macro() %}
CREATE OR REPLACE FUNCTION {{target.schema}}.cleanup_numeric(x STRING) AS
(
  IF ( x != 'NULL' AND x != 'PrivacySuppressed',
       CAST(x as FLOAT64),
       NULL )
);
{% endmacro %}

All the macros have to be executed before table creation starts. We do that by adding to dbt_project.yml the following line, to ensure that the above mentioned macro runs:

on-run-start:
    - "{{cleanup_numeric_macro()}}"