Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
511 stars 131 forks source link

[FEATURE] Support Athena Adapter #251

Open GabrielEisenbergOlympus opened 1 month ago

GabrielEisenbergOlympus commented 1 month ago

Is your feature request related to a problem? Please describe. I am using the dbt-athena adapter which was recently taken over by dbt-labs and was announced as being GA. AutomateDV complains that it does not have the three following macros:

  1. athena__cast_date
  2. athena__cast_datetime
  3. athena__get_escape_characters

Describe the solution you'd like Add these macros to the project. They seem relatively easy to add. As I am new to the project, I might be missing the nuances and complexities of adding a new adapter.

Describe alternatives you've considered None

Additional context The Athena escape character is a double quote.

The casting can be done along the lines of:

date(date_parse('2022-10-20', '%Y-%m-%d'))
date_parse('2022-10-20 05:00:01', '%Y-%m-%d %h:%i:%s')
from_iso8601_timestamp('2022-10-20T05:00:01.000Z')

I'm waiting on approval to contribute back but if someone can do this sooner than later, please feel free! 😄

GabrielEisenbergOlympus commented 1 month ago

As an attempt I did the following. Obviously, feel free to modify/correct the code.

At macros/internal/metadata_processing/get_escape_characters.sql, I added:

{%- macro athena__get_escape_characters() %}
    {%- do return (('"', '"')) -%}
{%- endmacro %}

At macros/supporting/casting/cast_date.sql, I added:

{%- macro athena__cast_date(column_str, as_string=false, alias=none) -%}
    {%- if not as_string -%}
        date_parse({{ column_str }}, '%Y-%m-%d')
    {%- else -%}
        date_parse('{{ column_str }}', '%Y-%m-%d')
    {%- endif -%}
    {%- if alias %} AS {{ alias }} {%- endif %}
{%- endmacro -%}

At macros/supporting/casting/cast_datetime.sql, I added:

{%- macro athena__cast_datetime(column_str, as_string=false, alias=none, date_type=none) -%}
    from_iso8601_timestamp({{ column_str }})
    {%- if alias %} AS {{ alias }} {%- endif %}
{%- endmacro -%}

At macros/supporting/data_types/type_binary.sql, I added:

{%- macro athena__type_binary(for_dbt_compare=false) -%}
    varbinary
{%- endmacro -%}

At "macros/supporting/hash_components/select_hash_alg.sql", I added:

{% macro athena__hash_alg_md5() -%}

    {% do return("md5(to_utf8([HASH_STRING_PLACEHOLDER]))") %}

{% endmacro %}

{% macro athena__hash_alg_sha1() -%}

    {% do return("sha1(to_utf8([HASH_STRING_PLACEHOLDER]))") %}

{% endmacro %}

{% macro athena__hash_alg_sha256() -%}

    {% do return("sha2(to_utf8([HASH_STRING_PLACEHOLDER]), 256)") %}

{% endmacro %}

At macros/supporting/hash_components/standard_column_wrapper.sql, I added:

{%- macro athena__standard_column_wrapper(hash_content_casing) -%}

    {%- if hash_content_casing == 'upper' -%}
        {%- set standardise -%}
            COALESCE(UPPER(TRIM(CAST([EXPRESSION] AS {{ automate_dv.type_string() }}))), '')
        {%- endset -%}
    {%- else -%}
        {%- set standardise -%}
            COALESCE(TRIM(CAST([EXPRESSION] AS {{ automate_dv.type_string() }})), '')
        {%- endset -%}
    {%- endif -%}

    {% do return(standardise) -%}

{%- endmacro -%}

At macros/staging/null_columns.sql, I added:

{%- macro athena__null_column_sql(col_name, default_value) -%}

    {{ col_name }} AS {{ col_name ~ "_ORIGINAL" }},
    COALESCE({{ col_name }}, '{{ default_value }}') AS {{ col_name }}

{%- endmacro -%}

At macros/supporting/hash_components/null_expression.sql, I added:

{%- macro athena__null_expression(standardise, column_str, null_placeholder_string) -%}

    {%- set column_expression -%}
        COALESCE({{ standardise | replace('[EXPRESSION]', column_str) }}, '{{ null_placeholder_string }}')
    {%- endset -%}

    {% do return(column_expression) %}

{%- endmacro -%}