dbt-labs / spark-utils

Utility functions for dbt projects running on Spark
https://hub.getdbt.com/fishtown-analytics/spark_utils/latest/
Apache License 2.0

Adapt get_url_parameter to work with SparkSQL #11

Open foundinblank opened 3 years ago

foundinblank commented 3 years ago

Describe the bug

The get_url_parameter() macro breaks on SparkSQL (Databricks). I've written a replacement macro that works on SparkSQL, and I'm wondering whether I could contribute that fix.

Steps to reproduce

This was triggered when setting up Google Ads which uses get_url_parameter() macros: https://github.com/fivetran/dbt_google_ads_source/blob/master/models/stg_google_ads__final_url_performance.sql#L30-L34.

Expected results

I expected no errors to be thrown and UTM parameters to be parsed out per the model definition.

Actual results

Model fails to build with the error message:

Runtime Error in model stg_google_ads__final_url_performance (models/stg_google_ads__final_url_performance.sql)
  Database Error
    Error running query: java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 2
    \utm_content=
      ^
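A likely cause (my reading, not stated explicitly in the issue): Spark's `split()` treats its delimiter as a Java regular expression, and the stray leading backslash turns the delimiter into the pattern `\utm_content=`, where `\u` begins a Java Unicode escape (`\uXXXX`) that the following characters cannot complete. Python's `re` module enforces the same escape rule, which makes for a quick way to reproduce the failure mode outside Spark:

```python
import re

# Assumption for illustration: the delimiter that reaches Spark's split()
# is "\utm_content=". As a Java regex, "\u" must start a \uXXXX Unicode
# escape, so the pattern is rejected. Python's re module behaves the same
# way, so we can demonstrate the failure mode here:
bad_pattern = r"\utm_content="

try:
    re.compile(bad_pattern)
    pattern_rejected = False
except re.error:
    # e.g. "incomplete escape \u at position 0"
    pattern_rejected = True

# Escaping the delimiter so it is treated as a literal string compiles fine:
safe_pattern = re.escape("utm_content=")
compiled_ok = re.compile(safe_pattern) is not None
```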

It passes when using this local macro as a replacement (stored in our /macros folder), which overrides the dbt_utils macro:

{# SparkSQL-compatible version of dbt_utils.get_url_parameter #}

{%- macro default__get_url_parameter(field, url_parameter) -%}

{%- set formatted_url_parameter = "'" + url_parameter + "='" -%}

nullif(split(split(parse_url({{ field }}, 'QUERY'), {{ formatted_url_parameter }})[1],'&')[0], '')

{%- endmacro -%}
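For readers less familiar with the SparkSQL functions involved, here is a rough Python sketch (not part of the project; the URL and parameter names are invented) of what the expression above computes: `parse_url(field, 'QUERY')` extracts the query string, the outer split isolates everything after `<parameter>=`, and the inner split trims at the next `&`. One difference to note: Spark's `split()` takes a regex delimiter, while Python's `str.split()` takes a literal string.

```python
from typing import Optional
from urllib.parse import urlparse

def get_url_parameter(url: str, parameter: str) -> Optional[str]:
    """Mimic the SparkSQL expression in the macro above (sketch only)."""
    query = urlparse(url).query           # parse_url({{ field }}, 'QUERY')
    parts = query.split(parameter + "=")  # split(..., '<parameter>=')
    if len(parts) < 2:                    # index [1] would be out of range
        return None
    value = parts[1].split("&")[0]        # split(..., '&')[0]
    return value if value else None       # nullif(..., '')

# Invented example URL:
url = "https://example.com/landing?utm_source=google&utm_content=ad_1"
```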

System information

packages.yml

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.4
  - package: fishtown-analytics/spark_utils
    version: 0.1.0
  - package: fishtown-analytics/dbt_external_tables
    version: 0.6.2
  - package: fivetran/google_ads
    version: 0.2.0
  - git: "https://github.com/netlify/segment.git"
    revision: master

Which database are you using dbt with?

Spark (Databricks)

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.19.1

Up to date!

Plugins:
  - spark: 0.19.1

Are you interested in contributing the fix?

I'm happy to contribute my macro which works on SparkSQL. If there's a way for dbt_utils to know which database or adapter it's running on, it could pass the appropriate macro?
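For what it's worth, dbt does have a mechanism for this: cross-database macros resolve adapter-specific implementations by name prefix, so a macro named `spark__get_url_parameter` (rather than `default__get_url_parameter`) is picked up automatically when running on the Spark adapter. A hedged sketch of how the macro above could be renamed for dispatch (the exact dispatch configuration varies by dbt and dbt_utils version):

```jinja
{# Sketch: adapter-prefixed name so dbt's dispatch selects it on Spark #}
{%- macro spark__get_url_parameter(field, url_parameter) -%}

{%- set formatted_url_parameter = "'" + url_parameter + "='" -%}

nullif(split(split(parse_url({{ field }}, 'QUERY'), {{ formatted_url_parameter }})[1], '&')[0], '')

{%- endmacro -%}
```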

clrcrl commented 3 years ago

Just transferred this to the spark-utils repo, since I think we'll want to contribute the fix here rather than in dbt_utils!

jtcohen6 commented 3 years ago

@clrcrl Thanks for transferring!

@foundinblank I think this could be fixed by the improvements to spark__split_part in spark-utils v0.2.0 (just released last week). Could you try upgrading your version of spark-utils, and see if that works any better?