fivetran / dbt_marketo

Fivetran's Marketo dbt package
https://fivetran.github.io/dbt_marketo/
Apache License 2.0

FEATURE - partition_by does not work for Databricks #21

Closed tfayyaz closed 1 year ago

tfayyaz commented 2 years ago

Is there an existing issue for this?

Describe the issue

Trying to run models that have partition_by in the config fails on Databricks as the format is different for Databricks.

Currently, the config is:

{{ 
    config(
        materialized='incremental',
        partition_by = {'field': 'date_day', 'data_type': 'date'},
        unique_key='lead_day_id'
        ) 
}}
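The `{'field': ..., 'data_type': ...}` dict is the BigQuery form of `partition_by`; on Databricks (via the dbt-spark adapter), `partition_by` instead takes a plain list of column names. A sketch of what the equivalent config would look like for Databricks:

```jinja
{{
    config(
        materialized='incremental',
        partition_by=['date_day'],
        unique_key='lead_day_id'
    )
}}
```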

And if you run the model using:

% dbt run --select marketo.intermediate.marketo__change_data_details 

You get the error below

Relevant error log or model output

15:31:16  Running with dbt=1.1.0
15:31:17  Found 62 models, 69 tests, 0 snapshots, 0 analyses, 563 macros, 1 operation, 0 seed files, 26 sources, 0 exposures, 0 metrics
15:31:17  
15:31:33  
15:31:33  Running 1 on-run-start hook
15:31:33  1 of 1 START hook: marketo.on-run-start.0 ...................................... [RUN]
15:31:33  1 of 1 OK hook: marketo.on-run-start.0 ......................................... [OK in 0.00s]
15:31:33  
15:31:33  Concurrency: 2 threads (target='dev')
15:31:33  
15:31:33  1 of 1 START incremental model tfayyaz_dbt_marketing_marketo.marketo__change_data_details  [RUN]
15:31:40  1 of 1 ERROR creating incremental model tfayyaz_dbt_marketing_marketo.marketo__change_data_details  [ERROR in 7.54s]
15:31:42  
15:31:42  Finished running 1 incremental model, 1 hook in 25.15s.
15:31:42  
15:31:42  Completed with 1 error and 0 warnings:
15:31:42  
15:31:42  Runtime Error in model marketo__change_data_details (models/intermediate/marketo__change_data_details.sql)
15:31:42    Query execution failed.
15:31:42    Error message: org.apache.spark.sql.AnalysisException: Couldn't find column field in:
15:31:42    root
15:31:42     |-- lead_id: long (nullable = true)
15:31:42     |-- date_day: date (nullable = true)
15:31:42     |-- lead_day_id: string (nullable = false)
15:31:42  
15:31:42  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
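The `Couldn't find column field` message suggests the adapter treated the BigQuery-style dict literally and emitted its key `field` as a partition column name. Roughly, the compiled DDL would look like the following (an illustrative sketch, not the exact compiled output):

```sql
-- approximate compiled DDL (illustrative only)
create table tfayyaz_dbt_marketing_marketo.marketo__change_data_details
using delta
partitioned by (field)  -- 'field' is not a column in the model, hence the AnalysisException
as
select lead_id, date_day, lead_day_id
from ...
```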

Expected behavior

It should work on Databricks by default, perhaps by adding a conditional to the config like the one below, which fixes this:

{{
    config(
        materialized='incremental',
        partition_by=['date_day'] if target.type == 'databricks' else {'field': 'date_day', 'data_type': 'date'},
        unique_key='lead_day_id'
    )
}}

or by adding docs telling Databricks users to add the following to their dbt_project.yml file:

models:
  marketo:
    intermediate:
      marketo__change_data_details:
        +partition_by: ['date_day']

dbt Project configurations

name: 'marketing_lakehouse'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'marketing_lakehouse'

# These configurations specify where dbt should look for different types of files.
# The model-paths config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`

models:
  +file_format: delta

seeds:
  +file_format: delta

snapshots:
  +file_format: delta

dispatch:

vars:
  marketo_source:
    marketo_database: marketing_marketo
    marketo_schema: marketing_marketo
  salesforce_source:
    salesforce_database: marketing_sfdc_fivetran
    salesforce_schema: marketing_sfdc_fivetran

models:
  marketo:
    intermediate:
      marketo__change_data_details:
        +partition_by: ['date_day']

Package versions

packages:

What database are you using dbt with?

databricks

dbt Version

1.1.0

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @tfayyaz thanks for opening this issue!

It looks like the issue is tied to the fact that the package compatibility does not include Databricks yet. In addition to the partition_by needing to be adjusted, I imagine there are a few other components of this package that will need to be adjusted in order for the package to be compatible with Databricks.
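One common pattern for making a package config adapter-aware is to branch on `target.type` inside a macro. The macro below is a hypothetical sketch of that approach — the name `marketo_partition_by` and the branching logic are assumptions for illustration, not the package's actual implementation:

```jinja
{# Hypothetical sketch: return an adapter-appropriate partition_by value. #}
{% macro marketo_partition_by() %}
    {% if target.type in ('databricks', 'spark') %}
        {# Spark/Databricks expects a list of column names #}
        {{ return(['date_day']) }}
    {% else %}
        {# BigQuery expects a dict with field and data_type #}
        {{ return({'field': 'date_day', 'data_type': 'date'}) }}
    {% endif %}
{% endmacro %}
```

The model config could then call `partition_by=marketo.marketo_partition_by()` once, keeping the per-adapter logic in a single place.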

I noticed you mentioned you would be open to creating a PR for this change. If so, I would recommend taking a look at some of our other packages and see how we allowed for Databricks compatibility. See how we did it in the below PRs for dbt_microsoft_ads:

In the meantime, I will leave this issue open to see if others would like to see Databricks compatibility supported in the package. If more people show interest, we can prioritize this feature request! Likewise, if you decide to open a PR, I would be happy to help in any way I can 😄

tfayyaz commented 2 years ago

@fivetran-joemarkiewicz thanks for getting back on this. I have identified a few other issues, so I will try to combine them all and open a PR in a few weeks with the changes and tests.

fivetran-joemarkiewicz commented 2 years ago

@tfayyaz that sounds great! Please let me know if you need any assistance when working on the PR!

fivetran-joemarkiewicz commented 1 year ago

Transitioning this issue from a bug to a feature.

Our team will tackle converting this dbt package for Databricks compatibility in our next sprint.

fivetran-joemarkiewicz commented 1 year ago

Closing this issue, as the above PR adds Databricks compatibility and the original issue should be addressed.