databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
226 stars 119 forks source link

Passing multiple columns to partition_by for Python models causes a [PARSE_SYNTAX_ERROR] #752

Closed mcowart123 closed 3 months ago

mcowart123 commented 3 months ago

Describe the bug

I'm trying to use a python model with multiple partition_by columns and either it's not supported or I cannot figure out the syntax to pass the columns as. The documentation only shows examples of single columns.

Expected behavior

Multiple partition columns should be allowed.

Steps to reproduce

This works (single unquoted column name passed to partition_by):

def model(dbt, session):

    # setting configuration
    dbt.config(
        materialized = "incremental",
        unique_key = "['_md_source_group', 'period_key', 'market_key', 'product_key']",
        incremental_strategy = "merge",
        partition_by = "period_key")

    session.sql("USE CATALOG dev__etl__nielsen")
    df = session.sql(query(get_dest_newest_timestamp()))

    return df

Any variation of the following (no quotes, no brackets, etc) fail:

def model(dbt, session):

    # setting configuration
    dbt.config(
        materialized = "incremental",
        unique_key = "['_md_source_group', 'period_key', 'market_key', 'product_key']",
        incremental_strategy = "merge",
        partition_by = "['_md_source_group', 'period_key']")

    session.sql("USE CATALOG dev__etl__nielsen")
    df = session.sql(query(get_dest_newest_timestamp()))

    return df

Output of dbt run

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:230, in capture_sql_exception.<locals>.deco(*a, **kw)
      226 converted = convert_exception(e.java_exception)
      227 if not isinstance(converted, UnknownException):
      228     # Hide where the exception came from that shows a non-Pythonic
      229     # JVM exception message.
  --> 230     raise converted from None
      231 else:
      232     raise

  ParseException: 
  [PARSE_SYNTAX_ERROR] Syntax error at or near '['. SQLSTATE: 42601 (line 1, pos 0)

  == SQL ==
  ['_md_source_group', 'period_key']
  ^^^

System information

The output of dbt --version:

Core:
  - installed: 1.8.4
  - latest:    1.8.4 - Up to date!

Plugins:
  - databricks: 1.8.4 - Up to date!
  - spark:      1.8.0 - Up to date!

The operating system you're using: MacOS (jobs running on Databricks)

The output of python --version: 3.10.13

mcowart123 commented 3 months ago

Never mind; I figured out the syntax.

 # setting configuration
    dbt.config(
        materialized = "incremental",
        unique_key = ['_md_source_group', 'period_key', 'market_key', 'product_key'],
        incremental_strategy = "merge",
        partition_by = ['_md_source_group', 'period_key'])

The documentation is confusing here though:

There's a limit to how complex you can get with the dbt.config() method. It accepts only literal values (strings, booleans, and numeric types) and dynamic configuration. Passing another function or a more complex data structure is not possible. The reason is that dbt statically analyzes the arguments to config() while parsing your model without executing your Python code. If you need to set a more complex configuration, we recommend you define it using the config property in a YAML file.