dbt-labs / spark-utils

Utility functions for dbt projects running on Spark
https://hub.getdbt.com/fishtown-analytics/spark_utils/latest/
Apache License 2.0
30 stars 15 forks source link

More possible macros to add #5

Open emilieschario opened 3 years ago

emilieschario commented 3 years ago

for cross db compatibility and making packages easier to use:

emilieschario commented 3 years ago
last_value(priority ignore nulls) over (partition by ticket_id order by date_day asc rows between unbounded preceding and current row) as priority

becomes

last_value(priority, true) over (partition by ticket_id order by date_day asc rows between unbounded preceding and current row) as priority

ignore nulls is what we're handling here. good for all/most window functions to do this too. 😄

emilieschario commented 3 years ago

The Zendesk package (Fivetran) has

{% macro json_extract(string, string_path) -%}

{{ adapter.dispatch('json_extract', packages = _get_utils_namespaces()) (string, string_path) }}

{%- endmacro %}

{% macro default__json_extract(string, string_path) %}

  json_extract_path_text({{string}}, {{ "'" ~ string_path ~ "'" }} )

{% endmacro %}

{% macro bigquery__json_extract(string, string_path) %}

  json_extract({{string}}, {{ "'$." ~ string_path ~ "'" }} )

{% endmacro %}

I think the Spark SQL version of this would be https://spark.apache.org/docs/2.3.0/api/sql/#get_json_object

jtcohen6 commented 3 years ago

I had a chance to catch up with @fivetran-kristin last week. Fivetran has done a cool thing and moved most of their SQL syntax macros into a dbt_fivetran_utils package.

Kristin is thinking about whether to add spark__ implementations of those macros into that package directly, or into this package, nested under macros/dbt_fivetran_utils/.

emilieschario commented 3 years ago

@fivetran-kristin do you have more thoughts you want to share here? Would love to coordinate efforts.

kristin-bagnall commented 3 years ago

@emilieschario and @jtcohen6 - I think the best solution here is to nest all spark implementations of our macros into this package.

I've outlined more details on Fivetran's approach to Spark compatibility here. @jtcohen6 I've shared with you. @emilieschario I don't have your email so wasn't able to share. Can you either request access or provide your email address?

Thanks!

emilieschario commented 3 years ago

Just requested access so that I don't have to post my email on the inter webs 😄

jtcohen6 commented 3 years ago

@fivetran-kristin I like it! Thanks for developing a solid plan here.

I added a comment in support of fivetran_utils_dispatch_list. It might be overkill for each package to allow itself to be shimmed in slightly different ways, but I prefer being more flexible/explicit now in the event it proves necessary. If, in 6-12 months time, it turns out to be entirely superfluous, we can just standardize all packages to take a single variable adapter_dispatch_list (with backward support for the old variable names), no harm done.

emilieschario commented 3 years ago