databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
216 stars 118 forks source link

test flow of data using time travel #256

Open VDFaller opened 1 year ago

VDFaller commented 1 year ago

No idea if other people would use this but I've needed this for a while.

Problem

Data is only allowed to change in one direction following some flow. Like Step 1 -> Step 2 -> Step 3 and we want to fail if data changes from Step 3 to Step 2. Delta's time travel lets us actually test for this.

Proposal

This might not be the best method (or the right repo) but here's what I've come up with

{% test workflow_diode(model, column_name, step_order, id_column) %}

{% set sql_statement %}
    select version from (describe history {{model}}) limit 1
{% endset %}

{% set latest_version = dbt_utils.get_single_value(sql_statement) %}
SELECT * 
FROM (
  SELECT cur_version.{{id_column}}, 
    CASE 
      {%- for s in step_order %}
      WHEN cur_version.{{column_name}} = '{{s}}' THEN {{loop.index}} 
      {% endfor -%}
    END AS cur_flow, 
    CASE 
      {%- for s in step_order %}
      WHEN prev_version.{{column_name}} = '{{s}}' THEN {{loop.index}}
      {% endfor -%}    
    END AS prev_flow
  FROM {{model}}@v{{latest_version}} AS cur_version
  INNER JOIN {{model}}@v{{((latest_version|int)-1)}} AS prev_version
   ON cur_version.{{id_column}} = prev_version.{{id_column}}
)
WHERE cur_flow < prev_flow
{% endtest %}

Where my schema looks like

sources:
  - name: test_flow
    tables: 
      - name: test_flow
        columns:
          - name: step
            tests: 
              - workflow_diode:
                  step_order: ['Step 1', 'Step 2', 'Step 3', 'Step 4']
                  id_column: id
github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

henrikamnas commented 10 months ago

Nice!

benc-db commented 8 months ago

I'm unclear on exactly what you're trying to assert in the test here?

VDFaller commented 8 months ago

@benc-db That something never goes from Step3 to step2. If I have a workflow with a specific order of tasks I just don't want things to go in reverse. My actual use case is to test product lifecycles as we're supposed to only go one way.