fivetran / dbt_amplitude

Fivetran's Amplitude dbt package
https://fivetran.github.io/dbt_amplitude/
Apache License 2.0
3 stars 3 forks source link

[Bug] Inaccurate Integration Tests and Snowflake Incremental Bug #8

Closed fivetran-joemarkiewicz closed 1 year ago

fivetran-joemarkiewicz commented 1 year ago

Is there an existing issue for this?

Describe the issue

Currently there are two inconsistencies within the integration tests that do not effectively test the package and one bug that has been uncovered:

1. Models materialized as tables

There currently exists a models config that materializes all the models as table.

https://github.com/fivetran/dbt_amplitude/blob/15d9099d82c6f83281248789274f27623e21ab6b/integration_tests/dbt_project.yml#L12-L14

This produces inaccurate integration test results as we are not testing the incremental strategies within the models. We will need to update the integration tests to remove this config.

2. Incremental models not tested in pipeline

Our current run_models.sh file (which runs tests against our seed data using BuildKite) is only running the package models on fresh data. Instead, we need to update this to run first on a fresh set, and then second on a the same set of data but using the incremental strategy. The change needed here will be to add a second dbt run --target "$db" and dbt test --target "$db" the following code snippet in order to effectively test these models.

https://github.com/fivetran/dbt_amplitude/blob/15d9099d82c6f83281248789274f27623e21ab6b/.buildkite/scripts/run_models.sh#L20-L21

3. The Snowflake bug that was uncovered

Customers using this package on a snowflake warehouse are experiencing failures when attempting to run the package incrementally due to a subquery containing a correlated aggregate function that can only appear in a having or a select clause.

This causes failures for incremental runs and needs to be adjusted within the package to account for the correlated subquery restriction for Snowflake users.

Relevant error log or model output

## The consequence of not having the proper testing resulted in the following error.

22:56:08  Finished running 2 view models, 3 table models, 4 incremental models in 0 hours 0 minutes and 8.04 seconds (8.04s).
22:56:08  
22:56:08  Completed with 1 error and 0 warnings:
22:56:08  
22:56:08  Database Error in model amplitude__sessions (models/amplitude__sessions.sql)
22:56:08    002036 (42601): SQL compilation error:
22:56:08    Subquery containing correlated aggregate function [MAX(TRUNCTIMESTAMPTODay(STG_AMPLITUDE__EVENT.EVENT_TIME))] can only appear in having or select clause

Expected behavior

The integration tests are effectively testing the model logic for all forms of the package. Especially --full-refresh and incremental.

dbt Project configurations

N/A

Package versions

v0.2.0

What database are you using dbt with?

snowflake

dbt Version

v1.3.1

Additional Context

This has limitation has uncovered a bug in the package for snowflake users when trying to run incrementally. You can see more details in the following dbt slack thread.

Are you willing to open a PR to help address this issue?

fivetran-reneeli commented 1 year ago

Issues addressed in latest release!