fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0

[Bug] duplicate rows in fivetran_log__schema_changelog #49

Closed CraigWilson-ZOE closed 2 years ago

CraigWilson-ZOE commented 2 years ago

### Is there an existing issue for this?

### Describe the issue

Running the latest version of the fivetran_log package, I am getting the following test failure:

```
dbt_utils_unique_combination_of_columns_fivetran_log__schema_changelog_connector_id__destination_id__created_at
```
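For context, this failure comes from a `dbt_utils.unique_combination_of_columns` test on the model. Judging from the test name, its configuration is roughly along these lines (a reconstruction for illustration, not copied from the package source):

```yaml
models:
  - name: fivetran_log__schema_changelog
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - connector_id
            - destination_id
            - created_at
```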

### Relevant error log or model output

The query result shows many records for the same schema change, or different schema changes happening at the same time, e.g. this JSON output from BigQuery:

```json
{  "connector_id": "dashing_overbite",  "created_at": "2022-03-31T18:34:14.412Z",  "message_data": "{\"type\":\"ADD_COLUMN\",\"table\":\"people\",\"properties\":{\"columnName\":\"_2022_q_1_survey_adherence\",\"dataType\":\"STRING\"}}"}
{  "connector_id": "dashing_overbite",  "created_at": "2022-03-31T18:34:14.412Z",  "message_data": "{\"type\":\"ADD_COLUMN\",\"table\":\"people\",\"properties\":{\"columnName\":\"check_in_start\",\"dataType\":\"TIMESTAMP\"}}"}
{  "connector_id": "dashing_overbite",  "created_at": "2022-03-31T18:34:14.412Z",  "message_data": "{\"type\":\"ADD_COLUMN\",\"table\":\"people\",\"properties\":{\"columnName\":\"_2022_q_1_survey_weight_improvement\",\"dataType\":\"STRING\"}}"}
{  "connector_id": "dashing_overbite",  "created_at": "2022-03-31T18:34:14.412Z",  "message_data": "{\"type\":\"ADD_COLUMN\",\"table\":\"people\",\"properties\":{\"columnName\":\"_2022_q_1_survey_no_improvement\",\"dataType\":\"STRING\"}}"}
```

As you can see above, each of these is a different column added at exactly the same time.

In another example it shows the exact same column added twice at the same time:
```json
{  "connector_id": "doing_dispelled",  "created_at": "2022-04-08T09:42:33.300Z",  "message_data": "{\"type\":\"ADD_COLUMN\",\"table\":\"ad_history\",\"properties\":{\"columnName\":\"placement_specific_instagram_advertising_policies\",\"dataType\":\"STRING\"}}"}
{  "connector_id": "doing_dispelled",  "created_at": "2022-04-08T09:42:33.300Z",  "message_data": "{\"type\":\"ADD_COLUMN\",\"table\":\"ad_history\",\"properties\":{\"columnName\":\"placement_specific_facebook_advertising_policies\",\"dataType\":\"STRING\"}}"}
```
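To make the failure mode concrete, here is a small Python sketch of the grain check that `unique_combination_of_columns` effectively performs, applied to the two `doing_dispelled` rows above. The `destination_id` value and the `duplicate_grains` helper are hypothetical, purely for illustration:

```python
from collections import Counter

# The two "doing_dispelled" rows from the example above. destination_id is a
# hypothetical placeholder; it is not shown in the original JSON output.
rows = [
    {
        "connector_id": "doing_dispelled",
        "destination_id": "dest_1",
        "created_at": "2022-04-08T09:42:33.300Z",
        "message_data": '{"type":"ADD_COLUMN","table":"ad_history","properties":{"columnName":"placement_specific_instagram_advertising_policies","dataType":"STRING"}}',
    },
    {
        "connector_id": "doing_dispelled",
        "destination_id": "dest_1",
        "created_at": "2022-04-08T09:42:33.300Z",
        "message_data": '{"type":"ADD_COLUMN","table":"ad_history","properties":{"columnName":"placement_specific_facebook_advertising_policies","dataType":"STRING"}}',
    },
]

def duplicate_grains(rows, columns):
    """Return each combination of `columns` that occurs more than once."""
    counts = Counter(tuple(row[col] for col in columns) for row in rows)
    return [grain for grain, n in counts.items() if n > 1]

# The original test grain collides: both rows share the same timestamp.
print(duplicate_grains(rows, ["connector_id", "destination_id", "created_at"]))
# → [('doing_dispelled', 'dest_1', '2022-04-08T09:42:33.300Z')]

# Widening the grain with message_data distinguishes the two schema changes.
print(duplicate_grains(rows, ["connector_id", "destination_id", "created_at", "message_data"]))
# → []
```

Two distinct schema changes with identical timestamps are indistinguishable at the original grain, so the test reports them as duplicates even though the underlying events differ.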

### Expected behavior

Test doesn't fail

### dbt Project configurations

Only including the specifics for the Fivetran configuration:
```yaml
name: dbt_elt
version: '1.0.0'
config-version: 2

vars:
    # Fivetran log package configuration
    fivetran_log:
      fivetran_log_database: my_gcp_project
      fivetran_log_schema: fivetran_log
      fivetran_log_using_transformations: false # this will disable all transformation + trigger_table logic
      fivetran_log_using_triggers: false # this will disable only trigger_table logic
```

### Package versions

```yaml
packages:
  - package: fivetran/stripe_source
    version: [">=0.5.0", "<0.6.0"]
  - package: fivetran/stripe
    version: [">=0.6.0", "<0.7.0"]
  - package: fivetran/fivetran_log
    version: [">=0.5.0", "<0.6.0"]
  - package: fivetran/mailchimp_source
    version: [">=0.3.0", "<0.4.0"]
  - package: fivetran/mailchimp
    version: [">=0.6.0", "<0.7.0"]
```

### What database are you using dbt with?

bigquery

### dbt Version

1.0.1

### Additional Context

N/A

### Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @CraigWilson-ZOE thanks so much for opening this Bug report.

I am actively looking into this now and can see what you are referring to: the package is catching these duplicate records. However, upon closer inspection I wonder whether these are in fact duplicates, or whether the package's test is too strict.

I can see in the JSON examples you provided that each of these events has a different message_data field (including for the doing_dispelled connector). It may be that the connector has been optimized to sync multiple schema changes at the same time. I will check with our product team to get a better understanding of the connector's behavior.

I hope to respond back here shortly with an update!

fivetran-joemarkiewicz commented 2 years ago

Hi @CraigWilson-ZOE!

I was able to connect with our connector team, and they confirmed that the connector was updated to handle batch processes more efficiently, so events can now come through at the same time. This is why you are seeing the schema test failure on your end.

I have just pushed an update to a working branch that modifies the test to use the unique message_data field to identify unique events, rather than relying on the timestamp. When you have a chance, could you test the following package version and let me know if the test succeeds?

```yaml
packages:
  - git: https://github.com/fivetran/dbt_fivetran_log.git
    revision: testfix/adjusted-schema-test
    warn-unpinned: false
```

Let me know!

CraigWilson-ZOE commented 2 years ago

Hey Joe,

Thanks for getting back to me so quickly.

I have tested this out, but I am still getting an error. My logs (from a local run) are below; let me know if you need anything else.

```
❯ dbt deps
09:44:50 Running with dbt=1.0.1
09:44:58 Installing calogica/dbt_date
09:44:58 Installed from version 0.5.7
09:44:58 Up to date!
09:44:58 Installing fivetran/mixpanel
09:44:58 Installed from version 0.6.0
09:44:58 Up to date!
09:44:58 Installing calogica/dbt_expectations
09:44:59 Installed from version 0.5.5
09:44:59 Up to date!
09:44:59 Installing dbt-labs/codegen
09:44:59 Installed from version 0.5.0
09:44:59 Updated version available: 0.6.0
09:44:59 Installing data-mie/dbt_profiler
09:45:00 Installed from version 0.3.0
09:45:00 Updated version available: 0.3.2
09:45:00 Installing fivetran/stripe_source
09:45:00 Installed from version 0.5.1
09:45:00 Up to date!
09:45:00 Installing fivetran/stripe
09:45:00 Installed from version 0.6.1
09:45:00 Up to date!
09:45:00 Installing https://github.com/fivetran/dbt_fivetran_log.git
09:45:01 Installed from revision testfix/adjusted-schema-test
09:45:01 Installing fivetran/mailchimp_source
09:45:02 Installed from version 0.3.0
09:45:02 Up to date!
09:45:02 Installing fivetran/mailchimp
09:45:02 Installed from version 0.6.0
09:45:02 Up to date!
09:45:02 Installing dbt-labs/dbt_utils
09:45:02 Installed from version 0.8.4
09:45:02 Up to date!
09:45:02 Installing fivetran/fivetran_utils
09:45:03 Installed from version 0.3.7
09:45:03 Up to date!
09:45:03 Installing dbt-labs/spark_utils
09:45:03 Installed from version 0.3.0
09:45:03 Up to date!
09:45:03
09:45:03 Updates available for packages: ['dbt-labs/codegen', 'data-mie/dbt_profiler']
Update your versions in packages.yml, then run dbt deps
```

```
❯ dbt test --select fivetran_log
09:46:19 Running with dbt=1.0.1
09:46:19 Partial parse save file not found. Starting full parse.
09:46:35 Found 451 models, 637 tests, 0 snapshots, 0 analyses, 791 macros, 2 operations, 26 seed files, 388 sources, 0 exposures, 0 metrics
09:46:35
09:46:38
09:46:38 Running 1 on-run-start hook
09:46:39 1 of 1 START hook: dbt_bq_elt.on-run-start.0.................................... [RUN]
09:46:39 1 of 1 OK hook: dbt_bq_elt.on-run-start.0....................................... [OK in 0.37s]
09:46:39
09:46:39 Concurrency: 16 threads (target='development')
09:46:39
09:46:39 4 of 19 START test dbt_utils_unique_combination_of_columns_fivetran_log__mar_table_history_connector_id__destination_id__table_name__measured_month [RUN]
09:46:39 2 of 19 START test dbt_utils_unique_combination_of_columns_fivetran_log__connector_status_connector_id__destination_id [RUN]
09:46:39 3 of 19 START test dbt_utils_unique_combination_of_columns_fivetran_log__credit_mar_destination_history_destination_id__measured_month [RUN]
09:46:39 1 of 19 START test dbt_utils_unique_combination_of_columns_fivetran_log__connector_daily_events_connector_id__destination_id__date_day [RUN]
09:46:39 5 of 19 START test dbt_utils_unique_combination_of_columns_fivetran_log__schema_changelog_connector_id__destination_id__message_data [RUN]
09:46:39 6 of 19 START test dbt_utils_unique_combination_of_columns_stg_fivetran_log__account_membership_account_id__user_id [RUN]
09:46:39 7 of 19 START test dbt_utils_unique_combination_of_columns_stg_fivetran_log__active_volume_active_volume_id__destination_id [RUN]
09:46:39 8 of 19 START test dbt_utils_unique_combination_of_columns_stg_fivetran_log__connector_connector_name__destination_id [RUN]
09:46:39 10 of 19 START test dbt_utils_unique_combination_of_columns_stg_fivetran_log__destination_membership_destination_id__user_id [RUN]
09:46:39 11 of 19 START test dbt_utils_unique_combination_of_columns_stg_fivetran_log__log_log_id__created_at [RUN]
09:46:39 12 of 19 START test not_null_fivetran_log__audit_table_unique_table_sync_key.... [RUN]
09:46:39 13 of 19 START test not_null_stg_fivetran_log__account_account_id............... [RUN]
09:46:39 14 of 19 START test not_null_stg_fivetran_log__destination_destination_id....... [RUN]
09:46:39 15 of 19 START test not_null_stg_fivetran_log__user_user_id..................... [RUN]
09:46:39 16 of 19 START test unique_fivetran_log__audit_table_unique_table_sync_key...... [RUN]
09:46:39 9 of 19 START test dbt_utils_unique_combination_of_columns_stg_fivetran_log__credits_used_measured_month__destination_id [RUN]
09:46:40 2 of 19 PASS dbt_utils_unique_combination_of_columns_fivetran_log__connector_status_connector_id__destination_id [PASS in 1.20s]
09:46:40 17 of 19 START test unique_stg_fivetran_log__account_account_id................. [RUN]
09:46:40 4 of 19 PASS dbt_utils_unique_combination_of_columns_fivetran_log__mar_table_history_connector_id__destination_id__table_name__measured_month [PASS in 1.21s]
09:46:40 18 of 19 START test unique_stg_fivetran_log__destination_destination_id......... [RUN]
09:46:40 6 of 19 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_log__account_membership_account_id__user_id [PASS in 1.21s]
09:46:40 19 of 19 START test unique_stg_fivetran_log__user_user_id....................... [RUN]
09:46:40 13 of 19 PASS not_null_stg_fivetran_log__account_account_id..................... [PASS in 1.20s]
09:46:40 5 of 19 FAIL 85 dbt_utils_unique_combination_of_columns_fivetran_log__schema_changelog_connector_id__destination_id__message_data [FAIL 85 in 1.25s]
09:46:40 1 of 19 PASS dbt_utils_unique_combination_of_columns_fivetran_log__connector_daily_events_connector_id__destination_id__date_day [PASS in 1.28s]
09:46:40 10 of 19 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_log__destination_membership_destination_id__user_id [PASS in 1.25s]
09:46:40 7 of 19 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_log__active_volume_active_volume_id__destination_id [PASS in 1.30s]
09:46:40 9 of 19 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_log__credits_used_measured_month__destination_id [PASS in 1.28s]
09:46:40 3 of 19 PASS dbt_utils_unique_combination_of_columns_fivetran_log__credit_mar_destination_history_destination_id__measured_month [PASS in 1.36s]
09:46:40 8 of 19 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_log__connector_connector_name__destination_id [PASS in 1.33s]
09:46:40 12 of 19 PASS not_null_fivetran_log__audit_table_unique_table_sync_key.......... [PASS in 1.37s]
09:46:40 11 of 19 PASS dbt_utils_unique_combination_of_columns_stg_fivetran_log__log_log_id__created_at [PASS in 1.37s]
09:46:40 15 of 19 PASS not_null_stg_fivetran_log__user_user_id........................... [PASS in 1.37s]
09:46:41 16 of 19 PASS unique_fivetran_log__audit_table_unique_table_sync_key............ [PASS in 1.48s]
09:46:41 14 of 19 PASS not_null_stg_fivetran_log__destination_destination_id............. [PASS in 1.71s]
09:46:41 18 of 19 PASS unique_stg_fivetran_log__destination_destination_id............... [PASS in 0.74s]
09:46:41 19 of 19 PASS unique_stg_fivetran_log__user_user_id............................. [PASS in 0.74s]
09:46:41 17 of 19 PASS unique_stg_fivetran_log__account_account_id....................... [PASS in 0.91s]
09:46:41
09:46:43 Finished running 19 tests, 2 hooks in 8.36s.
09:46:44
09:46:44 Completed with 1 error and 0 warnings:
09:46:44
09:46:44 Failure in test dbt_utils_unique_combination_of_columns_fivetran_log__schema_changelog_connector_id__destination_id__message_data (models/fivetran_log.yml)
09:46:44   Got 85 results, configured to fail if != 0
09:46:44
09:46:44   compiled SQL at target/compiled/fivetran_log/models/fivetran_log.yml/dbt_utils_unique_combination_o_8e425c10ff34f7bd75936306a47ba8ae.sql
09:46:44
09:46:44 Done. PASS=18 WARN=0 ERROR=1 SKIP=0 TOTAL=19
```

Thanks,

Craig


fivetran-joemarkiewicz commented 2 years ago

Hi @CraigWilson-ZOE I am sorry to see you are still experiencing this issue.

I looked through my changes and saw that I had removed created_at from the uniqueness test. That explains why your test is failing: the same schema change can happen to a connector/destination more than once over time. I just pushed an update that includes created_at in the test in addition to message_data.
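Put together, the adjusted test should look roughly like this (a reconstruction for illustration, based on the columns described above, not copied from the package source):

```yaml
models:
  - name: fivetran_log__schema_changelog
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - connector_id
            - destination_id
            - created_at
            - message_data
```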

Would you be able to run `dbt deps` and try again?

CraigWilson-ZOE commented 2 years ago

Hi Joe,

I checked this just now and…

```
14:00:08 Finished running 19 tests, 2 hooks in 29.75s.
14:00:09
14:00:09 Completed successfully
14:00:09
14:00:09 Done. PASS=19 WARN=0 ERROR=0 SKIP=0 TOTAL=19
```

So it looks like it all passes now.

Thanks,

Craig


simon-stepper commented 2 years ago

Hi @fivetran-joemarkiewicz,

We've just encountered the same error. My workaround was to add schema_name and table_name to the unique_combination_of_columns test. I think the outcome is the same, and your version resolves the error for us as well. We would love a quick release of this.

JoeMark17 commented 2 years ago

@NoToWarAlways Thanks for letting me know you're experiencing this as well.

I took a similar approach in the patch branch by adding the `message_data` field to the test.

Unfortunately, I'm on PTO for the rest of this week and won't be able to have these changes reviewed and a release cut until, most likely, the middle of next week. In the meantime, I suggest either using the fix branch or excluding this test from your dbt runs until the new release goes live.

I hope this helps

fivetran-joemarkiewicz commented 2 years ago

Hi All,

The above PR is open for review, and I hope we will be able to get a release out early this week. In the meantime, feel free to test the changes on the following branch.

```yaml
packages:
  - git: https://github.com/fivetran/dbt_fivetran_log.git
    revision: bugfix/transformation-log-and-changelog-test
    warn-unpinned: false
```

This branch fixes a schema test within the fivetran_log__schema_changelog model, and also filters transformation logs out of the fivetran_log__connector_status model.

Let me know if you have any questions and I hope we will be able to cut a release with these updates soon!

fivetran-joemarkiewicz commented 2 years ago

Hi All!

I sincerely appreciate your patience while we developed the adjustments to account for the bug reported here. I have just cut the latest release of the package, dbt_fivetran_log v0.5.4, which applies the fix for this issue.

The v0.5.4 release should be live on the dbt hub at the top of the hour. I will be closing this issue, as the latest release should resolve it. Please feel free to comment here, or open a new issue, if you have any questions about the latest release.

Thanks for all your help in identifying and resolving this issue! 🎉