MaterializeInc / materialize

The data warehouse for operational workloads.
https://materialize.com

dbt-materialize: support sinks in `--empty` flag #27827

Open morsapaes opened 4 weeks ago

morsapaes commented 4 weeks ago

Feature request

The underlying logic for dbt's --empty flag (aka "schema dry-runs") resolves references as subqueries that return no results. This does not work for sink models: CREATE SINK expects the name of an existing object in its FROM clause rather than a subquery, so Materialize isn't able to parse statements like:

CREATE SINK "ci_dbt_empty_runs"."footprint"."example_sink"
  IN CLUSTER "ci_dbt_empty_runs"
  FROM (select * from "ci_dbt_empty_runs"."footprint"."example_materialied_view" where false limit 0)
  INTO KAFKA CONNECTION "dev"."connections"."example_connection" (TOPIC 'ci_dbt_empty_runs__footprint__example__v1')
  KEY ("OrderId", "ShipmentId", "ProjectLookupCode", "Milestone") NOT ENFORCED
  FORMAT JSON
  ENVELOPE UPSERT;
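
To make the failure mode concrete, here is a minimal Python sketch of how a reference ends up in the sink's FROM clause with and without --empty. The helpers `render_ref` and `sink_from_clause` are hypothetical stand-ins written for illustration, not dbt's actual functions; only the shape of the subquery is taken from the statement above.

```python
def render_ref(relation: str, empty: bool) -> str:
    """Render a model reference (hypothetical stand-in for dbt's ref resolution)."""
    if empty:
        # Under --empty, the reference becomes a subquery that returns no rows.
        return f"(select * from {relation} where false limit 0)"
    return relation


def sink_from_clause(upstream: str, empty: bool) -> str:
    """Build only the FROM clause of a CREATE SINK statement."""
    return f"FROM {render_ref(upstream, empty)}"


mv = '"ci_dbt_empty_runs"."footprint"."example_materialied_view"'
print(sink_from_clause(mv, empty=False))  # plain object name
print(sink_from_clause(mv, empty=True))   # subquery, as in the failing statement
```

The first form is a plain object name; the second is the subquery that appears in the failing statement.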

We should agree on a default behavior, and potentially let users configure which behavior they want:

  1. Create the topic as would otherwise happen in a non-dry run. The sink will just reference an empty materialized view, so no data will be produced to the topic.
  2. Ignore sinks when the --empty flag is specified, and throw a warning.
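
As a rough sketch of how the two options could sit behind a user-facing setting: the `SinkEmptyBehavior` enum and the `should_create_sink` helper below are hypothetical names invented for this example, not existing dbt-materialize configuration.

```python
import warnings
from enum import Enum


class SinkEmptyBehavior(Enum):
    """Hypothetical setting covering the two options listed above."""
    CREATE = "create"  # option 1: create the sink as in a non-dry run
    SKIP = "skip"      # option 2: ignore the sink and warn


def should_create_sink(empty: bool, behavior: SinkEmptyBehavior, name: str) -> bool:
    """Decide whether a sink model should be created for this run."""
    if not empty or behavior is SinkEmptyBehavior.CREATE:
        return True
    # Option 2: skip the sink, but tell the user why it was not created.
    warnings.warn(f"Skipping sink '{name}' because --empty was specified", stacklevel=2)
    return False
```

Whichever value becomes the default, a normal run (no --empty) always creates the sink in this sketch.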

Reported on Slack.

morsapaes commented 4 weeks ago

For completeness: when I looked into this, my conclusion was that special-casing this for sinks would require control over how ephemeral models are created in RuntimeRefResolver, which is dbt core code. There, only the limit configuration is set for the model, and we'd need type to also be set so we can then override render_limited in dbt-adapters.
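
A simplified sketch of what such an override could look like if type were available. `SketchRelation` is a stand-in dataclass rather than the real relation class from dbt-adapters, and `PASSTHROUGH_TYPES` is a hypothetical set; which relation types should bypass the wrapper is exactly the open question here.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: the types that should skip the subquery wrapper are undecided.
PASSTHROUGH_TYPES = frozenset({"materialized_view"})


@dataclass(frozen=True)
class SketchRelation:
    """Stand-in for an adapter relation; not the dbt-adapters class."""
    name: str
    type: Optional[str] = None
    limit: Optional[int] = None

    def render(self) -> str:
        return self.name

    def render_limited(self) -> str:
        rendered = self.render()
        # With type unset (the current situation), this check can never match,
        # so the wrapper below is always applied when a limit is set.
        if self.limit is None or self.type in PASSTHROUGH_TYPES:
            return rendered
        if self.limit == 0:
            return f"(select * from {rendered} where false limit 0)"
        return f"(select * from {rendered} limit {self.limit})"
```

With `type=None` and `limit=0` the sketch yields the empty subquery; with a type in `PASSTHROUGH_TYPES` it falls back to the plain name, which is the form CREATE SINK can parse.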

Tentatively tagging @graciegoheen and @MichelleArk, who might have a better idea of our options here to work around the core behavior.