duckdb / dbt-duckdb

dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
Apache License 2.0

Feature: dbt-docs: show external table URI when materialized in S3 buckets with dbt-duckdb #466

Open 01100100 opened 1 month ago

01100100 commented 1 month ago

I made an issue in the dbt-docs repo, but maybe it's better suited here.

Describe the feature

I would like dbt-docs to display the S3 URI for externally materialized tables in the "Relation" field, similar to how relations are shown for other adapters.

For example, given a model models/user.sql with the following profile and model configuration, the data will be written to https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this URI to be visible in the docs, ideally within the "relation" section, for quick reference.

Example Configuration

Profile (profiles.yml):

factory:
  target: dev
  outputs:
    dev:
      threads: 4
      type: duckdb
      extensions: ['httpfs']
      path: dbt.duckdb
      secrets:
        - type: s3
          region: "{{ env_var('AWS_REGION') }}"
          key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
          secret: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
          endpoint: "{{ env_var('AWS_ENDPOINT_URL_S3') | replace('https://', '') }}"
      external_root: s3://bucket-xxx/modelled
      default:

Environment variable:

export AWS_ENDPOINT_URL_S3=fly.storage.tigris.dev

Model configuration (dbt_project.yml):

models:
  factory:
    +materialized: external
    user:
      +format: json

In this case, the model models/user.sql will write the external table to https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this path to be included in the docs.
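To make the mapping concrete, here is a rough sketch of how the s3:// location and the endpoint from the configuration above could be combined into the HTTPS URL I'd like to see in the docs. The helper name s3_uri_to_https is hypothetical (it is not part of dbt-duckdb or dbt-docs), and it assumes path-style addressing (endpoint/bucket/key), which is what the URL in this issue uses.

```python
from urllib.parse import urlparse


def s3_uri_to_https(s3_uri: str, endpoint: str) -> str:
    """Hypothetical helper: map s3://bucket/key to https://endpoint/bucket/key.

    Assumes path-style addressing; not an existing dbt-duckdb function.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got: {s3_uri}")
    # Mirror the profile's Jinja filter: strip any leading scheme from the endpoint.
    host = endpoint.replace("https://", "").rstrip("/")
    # parsed.netloc is the bucket, parsed.path is the key with a leading slash.
    return f"https://{host}/{parsed.netloc}{parsed.path}"


print(s3_uri_to_https("s3://bucket-xxx/modelled/user.json", "fly.storage.tigris.dev"))
```

Something like this could run at docs-generation time, but where exactly it would hook in is an open question.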

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.

This feature is specific to the dbt-duckdb adapter and applies when writing to external files.

The external location path is set in this macro:

If the location argument is specified, it must be a filename (or S3 bucket/path), and dbt-duckdb will attempt to infer the format argument from the file extension of the location if the format argument is unspecified (this functionality was added in version 1.4.1).

If the location argument is not specified, then the external file will be named after the model.sql (or model.py) file that defined it with an extension that matches the format argument (parquet, csv, or json). By default, the external files are created relative to the current working directory, but you can change the default directory (or S3 bucket/prefix) by specifying the external_root setting in your DuckDB profile.
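As I read the two paragraphs above, the rules could be sketched roughly as follows. This is only an illustration, not the adapter's actual macro: the function name resolve_external is hypothetical, and falling back to parquet when no format can be inferred is my assumption.

```python
import posixpath

KNOWN_FORMATS = ("parquet", "csv", "json")


def resolve_external(model_name, external_root=".", location=None, format=None):
    """Hypothetical sketch of the described rules; returns (location, format)."""
    if location is not None:
        if format is None:
            # Infer the format from the file extension of the location.
            ext = posixpath.splitext(location)[1].lstrip(".").lower()
            # Assumption: fall back to parquet if the extension is not recognised.
            format = ext if ext in KNOWN_FORMATS else "parquet"
        return location, format

    # No location given: name the file after the model, under external_root.
    format = format or "parquet"  # assumption: parquet as the default format
    return f"{external_root.rstrip('/')}/{model_name}.{format}", format


# The model from this issue: no location, format json, external_root on S3.
print(resolve_external("user", "s3://bucket-xxx/modelled", format="json"))
```

The first element of the returned pair is the value I would like to surface in the "Relation" field.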

Who will this benefit?

This feature will be valuable for:

Additionally, this could pave the way for a more interactive exploration of model data directly within the dbt docs by linking to the external data location. :thinking: CLOUD NATIVE DATA FORMATS + WASM INMEMORY DATABASE :zap:

Are you interested in contributing this feature?

Yes :man_beard:

jwills commented 1 month ago

Ah, makes sense. I'm not super-familiar with how the dbt docs get generated, but if there is something I can do to help enable this info to appear in the docs, I am happy to take a look at PRs etc. etc.

Thank you!