fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0
BUG - fivetran_log__audit_table should be partitioned #27

Closed CraigWilson-ZOE closed 2 years ago

CraigWilson-ZOE commented 2 years ago

Are you a current Fivetran customer?

Craig Wilson, ZOE, Data Engineer

Describe the bug

We are monitoring how long each model in our dbt pipeline takes to run, and the fivetran_log__audit_table model is one of the longest-running models we have, with an average execution time of 490 seconds. Looking at the code, I believe this model would benefit from being partitioned and processing only the latest day rather than all data.
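To illustrate the kind of change being proposed, here is a minimal sketch of an incremental, partitioned dbt model. It is not the package's actual code. The model name, the upstream `stg_sync_events` reference, and the `sync_start`, `sync_start_day`, and `unique_table_sync_key` columns are hypothetical. The `partition_by` dictionary assumes a BigQuery warehouse, which this issue does not state.

```sql
{# models/audit_table_incremental_sketch.sql: hypothetical model, for illustration only #}
{# partition_by below uses the BigQuery adapter's syntax (assumed warehouse) #}
{{
    config(
        materialized='incremental',
        unique_key='unique_table_sync_key',
        partition_by={'field': 'sync_start_day', 'data_type': 'date'}
    )
}}

select
    *,
    cast(sync_start as date) as sync_start_day
from {{ ref('stg_sync_events') }}

{% if is_incremental() %}
-- On incremental runs, reprocess only the most recent day already loaded, plus anything newer.
where cast(sync_start as date) >= (select max(sync_start_day) from {{ this }})
{% endif %}
```

With a setup like this, the first run (or any run with `--full-refresh`) still scans all data. Later runs would only touch the latest partition, which should keep the run time roughly constant as the log grows.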

Steps to reproduce

  1. Run the dbt fivetran_log package.
  2. In the screenshot you can see the average run time per day for the package, which is steadily increasing. Given this trend, I would expect it to keep increasing indefinitely. (Screenshot 2021-12-13 at 13 35 20)

Expected behavior

I would expect the execution time to stay roughly constant and to be lower than it currently is.

Project variables configuration

Only copying the configuration for the relevant section, for security reasons.

    # Fivetran log package configuration
    fivetran_log:
      fivetran_log_database: xxxxxx    # hidden for security
      fivetran_log_schema: fivetran_log
      fivetran_log_using_transformations: false # this will disable all transformation + trigger_table logic
      fivetran_log_using_triggers: false # this will disable only trigger_table logic

Package Version

    packages:
      # includes dbt_utils, thus no need to separately import it
      - package: calogica/dbt_date
        version: [">=0.4.0", "<0.5.0"]
      - package: fivetran/mixpanel
        version: [">=0.4.0", "<0.5.0"]
      - package: calogica/dbt_expectations
        version: [">=0.4.0", "<0.5.0"]
      - package: dbt-labs/codegen
        version: 0.4.0
      - package: data-mie/dbt_profiler
        version: 0.1.4
      - package: fivetran/stripe_source
        version: 0.4.3
      - package: fivetran/stripe
        version: 0.5.0
      - package: fivetran/fivetran_log
        version: [">=0.4.0", "<0.5.0"]

Warehouse

Additional context

N/A

Screenshots

Attached higher up

Please indicate the level of urgency

This isn't super urgent, but the model is taking more and more time to run and is increasing our costs, as we are processing more rows each day.

Are you interested in contributing to this package?

fivetran-jamie commented 2 years ago

hey @CraigWilson-ZOE -- we've added some incremental + partitioning logic to the audit table model in the feature/audit-incrementality working branch. would you mind testing the branch out to see how the runtime is affected? the first run will probably be a full-refresh and won't make a difference, but hopefully we see a big difference with the incremental runs

    # packages.yml
    packages:
      - git: https://github.com/fivetran/dbt_fivetran_log.git
        revision: feature/audit-incrementality

CraigWilson-ZOE commented 2 years ago

Hi Jamie,

Sorry for the delay, just back from the holidays.

I will try this out and get back to you, thanks.

fivetran-jamie commented 2 years ago

no worries -- if you have a chance to test it out soon, we were aiming to release the fix before our sprint ends this week 🙂

CraigWilson-ZOE commented 2 years ago

Hey Jamie,

Just managed to try this. We had a few issues upgrading to v1.0.1.

The package ran OK and I can see a smaller number of records processed for the audit_table model.

Thanks