duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0
895 stars 83 forks source link

Ability to conditionally set/nullify `external_location`? #116

Closed aaronsteers closed 1 year ago

aaronsteers commented 1 year ago

I'm working on a PR where we may want to have the option to either import raw data or load from seed files via the external_location annotation.

Is it possible to return a value in external_location that nullifies the expression?

Something like the psuedocode:

external_location: "{{ if {some value set} then {a real external location} else null }}"

Where in this case null would be intepreted as no instruction.

The alternative is to ask users to comment and uncomment the value, e.g.:

    # Uncomment these lines if you want to use the pregenerated seed files:
    # meta:
    #   external_location: "read_csv_auto('./jaffle-data/{name}.csv', header=1)"

https://github.com/MeltanoLabs/jaffle-shop-template/pull/1/files#diff-6c181d0ccc3c5e37e2feaacdd8872e09073b138835156160dcb5ee7c159f8c63L3-R9

Can you say if null today would cause an error, or just a no-op relative to the external_location operator?

jwills commented 1 year ago

Yeah, my understanding (which I would be grateful if you would confirm for me) is that you can treat everything in one of dbt's .yml files as if it was jinja-fied as well, including meta tags on sources. So in your example, something like:

meta:
  external_location: "{{ \"read_csv_auto('./jaffle-data/{name}.csv', header=1)\" if some_value is not none }}"

...should return an empty string as the value of the external_location property if some_value is None. If the value of the external_location is an empty string, then this condition will evaluate to false, and we would fall back to using the standard dbt strategy for rendering the source table as a relation in the model.

(Note the fact that any field in a sources.yml file may be treated as a jinja template is exactly why we use f-string style formatting for the name and identifier in external sources.)

aaronsteers commented 1 year ago

Great, thanks @jwills !

I'll close for now and reopen if I run into any issue.

Also, will post back here when I see confirmation either way. 👍

aaronsteers commented 1 year ago

@jwills - Circling back.

These both work:

external_location: null  # Works!
external_location: ''  # Works!

However, when calculating with jinja, I could't get null working.

I ultimately went with returning an empty string.

sources:
  - name: ecom
    schema: "{{ env_var('JAFFLE_RAW_SCHEMA', 'jaffle_raw') }}"
    description: E-commerce data
    meta:
      # If `$JAFFLE_RAW_SCHEMA` is specified, use the provided raw data. Otherwise, use the csv seed data from the repo.
      external_location: >-
        {{ '' if env_var('JAFFLE_RAW_SCHEMA', '') else 'read_csv_auto("./jaffle-data/{name}.csv", header=1)' }}

This seems to work perfectly. 🙌 Thanks again!

(Nitty gritty details: I used the >- yaml operator which (1) let's me put the string on the next line for readability, (2) reduces string escaping I need to do, and (3) doesn't add any preceeding or post-fixed newlines.)

jwills commented 1 year ago

@aaronsteers that is good to here, I'll add a section about this situation (which I expect to be more common in the future) when I update the docs this week.

Wanted to clarify one thing tho: does this not work?

sources:
  - name: ecom
    schema: "{{ env_var('JAFFLE_RAW_SCHEMA', 'jaffle_raw') }}"
    description: E-commerce data
    meta:
      # If `$JAFFLE_RAW_SCHEMA` is specified, use the provided raw data. Otherwise, use the csv seed data from the repo.
      external_location: >-
        {{ 'read_csv_auto("./jaffle-data/{name}.csv", header=1)' if not env_var('JAFFLE_RAW_SCHEMA', '') }}
jwills commented 1 year ago

(totally understand the way you structured the check to maximize clarity, I just want to be sure if my understanding re the {{ value if cond }} construct is correct)

aaronsteers commented 1 year ago

Wanted to clarify one thing tho: does this not work?

sources:
  - name: ecom
    schema: "{{ env_var('JAFFLE_RAW_SCHEMA', 'jaffle_raw') }}"
    description: E-commerce data
    meta:
      # If `$JAFFLE_RAW_SCHEMA` is specified, use the provided raw data. Otherwise, use the csv seed data from the repo.
      external_location: >-
        {{ 'read_csv_auto("./jaffle-data/{name}.csv", header=1)' if not env_var('JAFFLE_RAW_SCHEMA', '')

Unfortunately not: TypeError: can not serialize 'Undefined' object

Any attempt to return None or null seems to give me this same error.

Details ```console Extension executing `dbt run`... 18:09:42 Running with dbt=1.4.5 18:09:42 Unable to do partial parsing because saved manifest not found. Starting full parse. 18:09:44 Encountered an error: can not serialize 'Undefined' object 18:09:44 Traceback (most recent call last): File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/main.py", line 136, in main results, succeeded = handle_and_check(args) File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/main.py", line 206, in handle_and_check task, res = run_from_args(parsed) File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/main.py", line 253, in run_from_args results = task.run() File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/task/runnable.py", line 454, in run self._runtime_initialize() File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/task/runnable.py", line 165, in _runtime_initialize super()._runtime_initialize() File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/task/runnable.py", line 94, in _runtime_initialize self.load_manifest() File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/task/runnable.py", line 81, in load_manifest self.manifest = ManifestLoader.get_full_manifest(self.config) File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/parser/manifest.py", line 206, in get_full_manifest manifest = loader.load() File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/parser/manifest.py", line 401, in load self.write_manifest_for_partial_parse() File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/dbt/parser/manifest.py", line 550, in write_manifest_for_partial_parse manifest_msgpack = self.manifest.to_msgpack() File "", line 56, in to_msgpack File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/mashumaro/mixins/msgpack.py", line 28, in default_encoder return msgpack.packb(data, use_bin_type=True) File "/Users/aj/Source/jaffle-shop-template/.meltano/utilities/dbt-duckdb/venv/lib/python3.10/site-packages/msgpack/__init__.py", line 38, in packb return Packer(**kwargs).pack(o) File "msgpack/_packer.pyx", line 294, in msgpack._cmsgpack.Packer.pack File "msgpack/_packer.pyx", line 300, in msgpack._cmsgpack.Packer.pack File "msgpack/_packer.pyx", line 297, in msgpack._cmsgpack.Packer.pack File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack [Previous line repeated 1 more time] File "msgpack/_packer.pyx", line 291, in msgpack._cmsgpack.Packer._pack TypeError: can not serialize 'Undefined' object ```
aaronsteers commented 1 year ago

This appears to be an issue with how dbt handles nulls when returned from jinja.

aaronsteers commented 1 year ago

You can repro with:

sources:
  - name: ecom
    schema: "{{ env_var('JAFFLE_RAW_SCHEMA', 'jaffle_raw') }}"
    meta:
      external_location: "{{ 'not-used' if false }}"
jwills commented 1 year ago

Ah, fascinating-- thank you sir!