fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0

Bug - fivetran_log__connector_status.sql failing with large sets of logs #56

Open fivetran-juliengoulley opened 2 years ago

fivetran-juliengoulley commented 2 years ago

Is there an existing issue for this?

Describe the issue

I have been running the fivetran_log dbt package on large sets of logs and hit a memory limit (see the error log below).


I have made some changes to my local version of fivetran_log__connector_status.sql model in order to get around the issue (see attached file) but could you investigate the issue?

fivetran_log__connector_status.txt

Relevant error log or model output

13:41:24  Database Error in model fivetran_log__connector_status (models/fivetran_log__connector_status.sql)
13:41:24    Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 128% of limit.
13:41:24    Top memory consumer(s):
13:41:24      sort operations used for analytic OVER() clauses: 99%
13:41:24      other/unattributed: 1%
13:41:24
13:41:24    compiled SQL at target/run/fivetran_log/models/fivetran_log__connector_status.sql
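The error attributes nearly all of the memory usage to the sorts BigQuery performs for analytic `OVER()` clauses. A common mitigation is to filter out rows the window functions never need (here, transformation-related log events) before the windowed CTE, so far less data has to be sorted. The sketch below illustrates the idea only; the relation and column names are hypothetical and this is not the package's actual model code:

```sql
with filtered_log as (

    -- hypothetical pre-filter: drop event types the connector-status logic
    -- never uses, so the analytic sort below sees far fewer rows
    select *
    from {{ ref('stg_fivetran_log__log') }}
    where event_subtype not in ('transformation run started', 'transformation run succeeded')

),

ordered_log as (

    select
        connector_id,
        event_subtype,
        created_at,
        -- the OVER() clause forces a per-connector sort; its memory cost
        -- scales with the number of rows entering this CTE
        row_number() over (partition by connector_id order by created_at desc) as recency
    from filtered_log

)

select *
from ordered_log
where recency = 1
```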

Expected behavior

Successful execution of the transformation

dbt Project configurations

```yaml
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'dbt_wildwest_fivetranlog'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'dbt_wildwest_fivetranlog'

# These configurations specify where dbt should look for different types of files.
# The model-paths config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by dbt clean

# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# using the {{ config(...) }} macro.
models:
  fivetran_log:
    # Config indicated by + and applies to all files under models/example/
    # +post-hook: "delete FROM {{ var('connector') }} WHERE connector_name = 'julien_lambda_auth0';"

vars:
  fivetran_log_database: fivetran-wild-west
  fivetran_log_schema: julien_fivetran_log_live

on-run-end:
  # - SQL Statement
```

Package versions

```yaml
packages:
```

What database are you using dbt with?

bigquery

dbt Version

Core:

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-jamie commented 1 year ago

hey @fivetran-juliengoulley -- so by removing transformations from the logs first you were able to bypass this issue? if so, i think it makes sense to adopt that logic in this model. otherwise, we may want to look into making this model incremental (though full refreshes would be an issue 🤔 )
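For reference, an incremental approach in dbt generally looks like the sketch below. This is illustrative only, not the package's code, and the `unique_key` and column names are hypothetical; as noted above, a `--full-refresh` would still rebuild everything and could hit the same memory ceiling:

```sql
{{
    config(
        materialized = 'incremental',
        unique_key = 'log_id'  -- hypothetical key
    )
}}

select *
from {{ ref('stg_fivetran_log__log') }}

{% if is_incremental() %}
  -- on incremental runs, only process log rows newer than what is already
  -- in the target table, shrinking the data the analytic sorts touch
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```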

fivetran-juliengoulley commented 1 year ago

Hey @fivetran-jamie, here is a link to the PR

fivetran-joemarkiewicz commented 1 year ago

Hi @fivetran-juliengoulley, thanks so much for opening the above PR.

We have reviewed the PR and believe your approach does indeed solve the issue at hand. However, it is a bit curious to us that no other customers have raised this issue with our team. Since your proposal makes significant changes and also implements incremental strategies, we are hesitant to move forward with it, as it would add complexity that could cause confusion within our most popular package.

Therefore, we have decided to keep this issue and the accompanying PR open for the time being rather than integrate the changes now. That said, we highly encourage users of the package to comment on this issue if you are seeing a similar error; this will help us prioritize these changes if they are impacting customers across the board.

Thanks again and interested to see if others have a similar issue.