dbt-labs / dbt-docs

Auto-generated data documentation site for dbt projects
Apache License 2.0

Feature: Show external table URI when materialized in S3 buckets with dbt-duckdb #532

Open 01100100 opened 3 days ago

01100100 commented 3 days ago

Describe the feature

I would like dbt-docs to display the S3 URI for externally materialized tables in the "Relation" field, similar to how relations are shown for other adapters.

For example, given a model models/user.sql with the following profile and model configuration, the data will be written to https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this URI to be visible in the docs, ideally within the "relation" section, for quick reference.

Example Configuration

factory:
  target: dev
  outputs:
    dev:
      threads: 4
      type: duckdb
      extensions: ['httpfs']
      path: dbt.duckdb
      secrets:
        - type: s3
          region: "{{ env_var('AWS_REGION') }}"
          key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
          secret: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}" 
          endpoint: "{{ env_var('AWS_ENDPOINT_URL_S3') | replace('https://', '') }}"
      external_root: s3://bucket-xxx/modelled
      default:

export AWS_ENDPOINT_URL_S3=fly.storage.tigris.dev

models:
  factory:
    +materialized: external
    user:
      +format: json

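In this case, the model models/user.sql will write the external table to s3://bucket-xxx/modelled/user.json (the external_root, then the model name, then the format as the file extension), which is reachable at https://fly.storage.tigris.dev/bucket-xxx/modelled/user.json. I would like this path to be included in the docs.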
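To make the mapping between the S3 URI and the HTTPS URL above concrete, here is a minimal Python sketch. The helper name s3_uri_to_https is hypothetical (it is not part of dbt-docs or dbt-duckdb), and it assumes the endpoint serves buckets path-style, as the URL in this issue suggests.

```python
from urllib.parse import urlparse


def s3_uri_to_https(uri: str, endpoint: str) -> str:
    """Hypothetical helper: map s3://bucket/key to a path-style HTTPS URL.

    Assumes the endpoint exposes buckets as https://<endpoint>/<bucket>/<key>.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Accept the endpoint with or without a scheme, as in the profile above.
    host = endpoint.split("://", 1)[-1].rstrip("/")
    # parsed.netloc is the bucket; parsed.path is the key with a leading slash.
    return f"https://{host}/{parsed.netloc}{parsed.path}"


print(s3_uri_to_https("s3://bucket-xxx/modelled/user.json", "fly.storage.tigris.dev"))
```

Something along these lines could let the docs show either form, the raw s3:// URI or a clickable link, depending on what the "Relation" field is meant to convey.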

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.

This feature is specific to the dbt-duckdb adapter and applies when writing to external files.

The external location path is set in this macro:

If the location argument is specified, it must be a filename (or S3 bucket/path), and dbt-duckdb will attempt to infer the format argument from the file extension of the location if the format argument is unspecified (this functionality was added in version 1.4.1).

If the location argument is not specified, then the external file will be named after the model.sql (or model.py) file that defined it with an extension that matches the format argument (parquet, csv, or json). By default, the external files are created relative to the current working directory, but you can change the default directory (or S3 bucket/prefix) by specifying the external_root setting in your DuckDB profile.
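The rules quoted above can be sketched roughly as follows. This is an illustration only, not the dbt-duckdb macro: the function name resolve_external_location and its signature are hypothetical, and falling back to parquet when no format is given is an assumption on my part.

```python
import posixpath

KNOWN_FORMATS = ("parquet", "csv", "json")


def resolve_external_location(model_name, fmt=None, location=None, external_root="."):
    """Hypothetical sketch of how the external path might be derived.

    Returns a (location, format) tuple.
    """
    if location is not None:
        # Explicit location: infer the format from the extension if unset.
        if fmt is None:
            ext = posixpath.splitext(location)[1].lstrip(".").lower()
            fmt = ext if ext in KNOWN_FORMATS else None
        return location, fmt
    # No location: name the file after the model, under external_root.
    fmt = fmt or "parquet"  # assumed default, not confirmed by this issue
    return f"{external_root.rstrip('/')}/{model_name}.{fmt}", fmt


print(resolve_external_location("user", fmt="json", external_root="s3://bucket-xxx/modelled"))
```

With the configuration from this issue, the sketch yields s3://bucket-xxx/modelled/user.json, which is the value I would like dbt-docs to surface.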

Who will this benefit?

This feature will be valuable for:

Additionally, this could pave the way for a more interactive exploration of model data directly within the dbt docs by linking to the external data location. :thinking: CLOUD NATIVE DATA FORMATS + WASM INMEMORY DATABASE :zap:

Are you interested in contributing this feature?

Yes :man_beard:

01100100 commented 2 days ago

@jtcohen6 Tagging you here as maintainer. :man_health_worker:

I think I did something wrong because I got notified that this is failing: https://github.com/dbt-labs/dbt-docs/actions/runs/11401080110