dbt-labs / dbt-codegen

Macros that generate dbt code
https://hub.getdbt.com/dbt-labs/codegen/latest/
Apache License 2.0
459 stars 99 forks source link

dbt-codegen

Macros that generate dbt code, and log it to the command line.

Contents

Installation instructions

New to dbt packages? Read more about them here.

  1. Include this package in your packages.yml file — check here for the latest version number:
packages:
  - package: dbt-labs/codegen
    version: X.X.X ## update to latest version here
  1. Run dbt deps to install the package.

Macros

generate_source (source)

This macro generates lightweight YAML for a Source, which you can then paste into a schema file.

Arguments

Outputting to a file

If you use the dbt run-operation approach it is possible to output directly to a file by piping the output to a new file and using the --quiet CLI flag:

dbt --quiet run-operation generate_source --args '{"table_names": ["orders"]}' > models/staging/jaffle_shop/_sources.yml

Usage:

  1. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_source('raw_jaffle_shop') }}

or for multiple arguments

{{ codegen.generate_source(schema_name= 'jaffle_shop', database_name= 'raw') }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_source --args 'schema_name: raw_jaffle_shop'

or

# for multiple arguments, use the dict syntax
$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "database_name": "raw", "table_names":["table_1", "table_2"]}'

or if you want to include column names and data types:

$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "generate_columns": true}'

or if you want to include column names without data types (the behavior dbt-codegen <= v0.9.0):

$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "generate_columns": true, "include_data_types": false}'
  1. The YAML for the source will be logged to the command line
version: 2

sources:
  - name: raw_jaffle_shop
    database: raw
    schema: raw_jaffle_shop
    tables:
      - name: customers
        description: ""
      - name: orders
        description: ""
      - name: payments
        description: ""
  1. Paste the output in to a schema .yml file, and refactor as required.

generate_base_model (source)

This macro generates the SQL for a base model, which you can then paste into a model.

Arguments:

Usage:

  1. Create a source for the table you wish to create a base model on top of.
  2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_base_model(
    source_name='raw_jaffle_shop',
    table_name='customers',
    materialized='table'
) }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_base_model --args '{"source_name": "raw_jaffle_shop", "table_name": "customers"}'
  1. The SQL for a base model will be logged to the command line
with source as (

    select * from {{ source('raw_jaffle_shop', 'customers') }}

),

renamed as (

    select
        id,
        first_name,
        last_name,
        email,
        _elt_updated_at

    from source

)

select * from renamed
  1. Paste the output in to a model, and refactor as required.

create_base_models (source)

This macro generates a series of terminal commands (appended with the && to allow for subsequent execution) that execute the base_model_creation bash script. This bash script will write the output of the generate_base_model macro into a new model file in your local dbt project.

Note: This macro is not compatible with the dbt Cloud IDE.

Arguments:

Usage:

  1. Create a source for the table you wish to create a base model on top of.
  2. Copy the macro into a statement tab into your local IDE, and run your code
dbt run-operation codegen.create_base_models --args '{source_name: my-source, tables: ["this-table","that-table"]}'

base_model_creation (source)

This bash script when executed from your local IDE will create model files in your dbt project instance that contain the outputs of the generate_base_model macro.

Note: This macro is not compatible with the dbt Cloud IDE.

Arguments:

Usage:

  1. Create a source for the table you wish to create a base model on top of.
  2. Copy the macro into a statement tab into your local IDE, and run your code
source dbt_packages/codegen/bash_scripts/base_model_creation.sh "source_name" ["this-table","that-table"]

generate_model_yaml (source)

This macro generates the YAML for a list of model(s), which you can then paste into a schema.yml file.

Arguments:

Usage:

  1. Create a model.
  2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_model_yaml(
    model_names=['customers']
) }}

You can use the helper function codegen.get_models and specify a directory and/or prefix to get a list of all matching models, to be passed into model_names list.

{% set models_to_generate = codegen.get_models(directory='marts', prefix='fct_') %}
{{ codegen.generate_model_yaml(
    model_names = models_to_generate
) }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_model_yaml --args '{"model_names": ["customers"]}'
  1. The YAML for a base model(s) will be logged to the command line
version: 2

models:
  - name: customers
    description: ""
    columns:
      - name: customer_id
        data_type: integer
        description: ""
      - name: customer_name
        data_type: text
        description: ""
  1. Paste the output in to a schema.yml file, and refactor as required.

generate_model_import_ctes (source)

This macro generates the SQL for a given model with all references pulled up into import CTEs, which you can then paste back into the model.

Arguments:

Usage:

  1. Create a model with your original SQL query
  2. Copy the macro into a statement tab in the dbt Cloud IDE, or into an analysis file, and compile your code
{{ codegen.generate_model_import_ctes(
    model_name = 'my_dbt_model'
) }}

Alternatively, call the macro as an operation:

$ dbt run-operation generate_model_import_ctes --args '{"model_name": "my_dbt_model"}'
  1. The new SQL - with all references pulled up into import CTEs - will be logged to the command line
with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

payments as (

    select * from {{ ref('stg_payments') }}

),

customer_orders as (

    select
        customer_id,
        min(order_date) as first_order,
        max(order_date) as most_recent_order,
        count(order_id) as number_of_orders
    from orders
    group by customer_id

),

customer_payments as (

    select
        orders.customer_id,
        sum(amount) as total_amount
    from payments
    left join orders on
         payments.order_id = orders.order_id
    group by orders.customer_id

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value
    from customers
    left join customer_orders
        on customers.customer_id = customer_orders.customer_id
    left join customer_payments
        on  customers.customer_id = customer_payments.customer_id

)

select * from final
  1. Replace the contents of the model's current SQL file with the compiled or logged code

Contributing

To contirbute code to this package, please follow the steps outlined in the integration_tests directory's README file.