fivetran / dbt_fivetran_log

Data models for Fivetran's internal log connector built using dbt.
https://fivetran.github.io/dbt_fivetran_log/
Apache License 2.0

[Bug] Broken connector reported as "connected" #39

Closed carlioth closed 2 years ago

carlioth commented 2 years ago

Is there an existing issue for this?

Describe the issue

In the Fivetran UI we can see that we have a connector (Salesforce sandbox) that is broken: Fivetran-ui-broken

We also have our own System status dashboard, which relies on the data coming out of the models in this dbt package. The issue is that the dashboard shows this connector as healthy, described as "connected":

row-model-connected

I quickly started to dig into what might be going wrong, and I may have found something. If I run:

    select *
    from "DB"."SCHEMA"."stg_fivetran_log__log"
    -- parentheses keep the connector_id filter applied to SEVERE events too
    where (event_type = 'SEVERE'
        or event_subtype like 'sync%')
        and connector_id = 'crusade_plausible'
    order by CREATED_AT desc

I'm getting a sync_start, then a SEVERE log, and then a sync_end log.

rows-from-stg-fivetran-log-log
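
A note on reading a filter that mixes `or` and `and`, like the one above: in SQL, `and` binds more tightly than `or`, so `a or b and c` is evaluated as `a or (b and c)`. The sketch below illustrates this with Python's built-in `sqlite3` module and a made-up four-row table. The table name `log` and the connector ids `conn_a` and `conn_b` are hypothetical; this is not Snowflake or the real staging model, only an illustration of the precedence rule.

```python
import sqlite3

# In-memory table with made-up events for two hypothetical connectors.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table log (connector_id text, event_type text, event_subtype text)"
)
conn.executemany(
    "insert into log values (?, ?, ?)",
    [
        ("conn_a", "INFO", "sync_start"),
        ("conn_a", "SEVERE", None),
        ("conn_b", "SEVERE", None),
        ("conn_b", "INFO", "sync_end"),
    ],
)

# Without parentheses: SEVERE rows from every connector pass the filter,
# because the connector_id condition only attaches to the sync% branch.
ungrouped = conn.execute(
    """
    select connector_id, event_type from log
    where event_type = 'SEVERE'
        or event_subtype like 'sync%'
        and connector_id = 'conn_a'
    """
).fetchall()

# With parentheses: the connector_id condition applies to both branches.
grouped = conn.execute(
    """
    select connector_id, event_type from log
    where (event_type = 'SEVERE'
        or event_subtype like 'sync%')
        and connector_id = 'conn_a'
    """
).fetchall()

ungrouped_connectors = {row[0] for row in ungrouped}
grouped_connectors = {row[0] for row in grouped}
print(sorted(ungrouped_connectors), sorted(grouped_connectors))
```

Only the grouped form restricts the result to a single connector, which matters when checking the event order for one specific connector.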

When these events are later categorised, this might be where the issue arises?

https://github.com/fivetran/dbt_fivetran_log/blob/691b82a06eb22a940b628dbac428b2f16b1ae9d8/models/fivetran_log__connector_status.sql#L83-L101

For us this is a high priority issue since we rely on the data from our System status dashboard where we monitor all our Fivetran connectors.

Relevant error log or model output

No response

Expected behavior

We expect the model fivetran_log__connector_status to report the connector as broken if it's broken in the Fivetran UI.

dbt Project configurations

    models:
      +persist_docs:
        relation: true
        columns: true
      data_eng:
        sources:
          +materialized: table
      fivetran_log:
        +schema: # leave these blank to use the target_schema
        staging:
          +schema: # leave these blank to use the target_schema

    vars:
      fivetran_log:
        fivetran_log_database: db
        fivetran_log_schema: schema

Package versions

packages:

What database are you using dbt with?

snowflake

dbt Version

dbt version 1.0.0

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @carlioth, thanks so much for opening this issue and providing such detailed notes on your investigation.

Taking a look at what you provided above, I agree that this connector should be showing as broken and not as connected. Based on the staging query you have above, I would have thought that the line below would have captured the SEVERE event and logged an error time of 2022-02-25 06:26:08.978. https://github.com/fivetran/dbt_fivetran_log/blob/691b82a06eb22a940b628dbac428b2f16b1ae9d8/models/fivetran_log__connector_status.sql#L67

However, after looking further into this I can see that the logic is in fact working but not in the way we would like. We have made the assumption that a broken connector would not have a sync_end event. This in fact seems to not be the case as I can see the SEVERE record and then a subsequent sync_end event. This sync_end event is then negating the last_error_at field due to the below line. https://github.com/fivetran/dbt_fivetran_log/blob/691b82a06eb22a940b628dbac428b2f16b1ae9d8/models/fivetran_log__connector_status.sql#L98

For your data, since last_error_at is technically earlier than last_sync_completed_at, the connector is not recorded as broken.
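
To make the comparison concrete, here is a minimal Python sketch of the rule described above, next to one possible alternative. It is not the package's SQL and not the fix that was eventually shipped. The function names are hypothetical, the sync_start and sync_end timestamps are invented for illustration (only the error time comes from this thread), and other statuses such as paused are left out.

```python
from datetime import datetime
from typing import Optional


def status_by_sync_end(
    last_error_at: Optional[datetime],
    last_sync_completed_at: Optional[datetime],
) -> str:
    """Rule as described in this thread: an error only counts if it is
    newer than the last sync_end event."""
    if last_error_at is not None and (
        last_sync_completed_at is None or last_error_at > last_sync_completed_at
    ):
        return "broken"
    return "connected"


def status_by_sync_start(
    last_error_at: Optional[datetime],
    last_sync_started_at: Optional[datetime],
) -> str:
    """Hypothetical alternative (an assumption, not the package's logic):
    an error counts if it happened at or after the latest sync_start."""
    if (
        last_error_at is not None
        and last_sync_started_at is not None
        and last_error_at >= last_sync_started_at
    ):
        return "broken"
    return "connected"


# Error time taken from the thread; the other two timestamps are made up
# to reproduce the sync_start -> SEVERE -> sync_end ordering.
sync_start = datetime(2022, 2, 25, 6, 26, 0)
error_at = datetime(2022, 2, 25, 6, 26, 8, 978000)
sync_end = datetime(2022, 2, 25, 6, 26, 10)

old_status = status_by_sync_end(error_at, sync_end)
alt_status = status_by_sync_start(error_at, sync_start)
print(old_status, alt_status)
```

With this ordering the first rule reports the connector as connected, because the sync_end is newer than the error, while the alternative reports it as broken. Whether comparing against sync_start is the right rule depends on how the log connector emits events, which is the open question in this thread.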

Before we take any next steps, I would like to understand better why this had a sync_end event following a failure. I will follow up with our engineering team to get a better understanding of this. Further, would you be able to share the entire contents of the JSON object that includes the SEVERE warning you provided in the screenshot above?

carlioth commented 2 years ago

Hi @fivetran-joemarkiewicz, thanks for the quick response and the good details.

The full log for the SEVERE says: {"reason":"java.lang.Exception: Authentication failure. Reconnect the connector with the latest username and password","taskType":"reconnect","status":"FAILURE_WITH_TASK"}

For full transparency: in my query above I've removed all the logs with WARNING as the event type, which are also included in the model fivetran_log__connector_status. I removed those events because we are currently drowning in them. We are getting approximately 60 of these warnings per second: {"type":"table_excluded_by_system","message":"salesforce_icrm_prod.<TABLE> has been Excluded by system. Reason : Not queryable"}

carlioth commented 2 years ago

Update: We are now seeing the same behaviour for another connector. This time the error is: {"reason":"com.fivetran.core.PrimaryKeyContainsNull: Null primary key found while syncing table *****

Looking at the logs, we are getting the same sequence: first sync_start, then the SEVERE log, and then sync_end.

fivetran-joemarkiewicz commented 2 years ago

Thanks for these detailed updates! I am still looking into this, but hopefully will come to a conclusion soon. I will keep you updated on my end!

carlioth commented 2 years ago

Any updates on this, @fivetran-joemarkiewicz?

fivetran-joemarkiewicz commented 2 years ago

Hi @carlioth, I apologize, but I do not have a strong update at the moment.

The last movement on my end was working with the product manager for the Fivetran Log connector, who believes sync_end events should exist for all connectors (even broken ones). If this is the case, then my team and I will want to update the broken status logic in this package accordingly.

However, the PM was not 100% sure and, as of Friday, was following up with the engineering team to confirm this. I hope to have an update this week!

andersrundberg commented 2 years ago

Hi @fivetran-joemarkiewicz, any updates on this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @andersrundberg, thanks for reaching out! I have not yet been able to connect with our engineering team to verify this. However, based on the connector's December 2021 release notes, I am reasonably certain that we will want to update the logic in our dbt package to account for sync_end events on failed connectors, since connectors should now register this log regardless of success or failure.

That being said, we will want to update this on our end within the dbt package to reflect the current state of the log connector. I will make an update this week in a working branch and share it here for you to test out before we roll out any updates in the next release.

Thank you so much for your patience!

andersrundberg commented 2 years ago

@fivetran-joemarkiewicz cool, please tag @carlioth when there is any news.

fivetran-joemarkiewicz commented 2 years ago

Hi @andersrundberg and @carlioth

Thank you again for your patience and for helping us identify this issue within the package. I am currently working on a fix and believe I am on the right track. When you have availability, would you be able to test the below version of the package within your dbt project and see if it identifies the broken/paused/working connectors properly?

packages:
    - git: https://github.com/fivetran/dbt_fivetran_log.git
      revision: bugfix/connector-status
      warn-unpinned: false

Let me know if you have any questions, and whether or not the issue is resolved in this branch of the package.

Thanks!

carlioth commented 2 years ago

Hi @fivetran-joemarkiewicz, I've now tested this and I'm getting the status of broken for the broken connectors. Working as expected 👍🏻 Do you have an ETA for when you will be able to release this fix?

fivetran-joemarkiewicz commented 2 years ago

That's great to hear! I still want to make a few minor changes, but I will be opening this up for a PR review today and hope to have the fix released tomorrow!

Edit: We typically do release freezes on Friday afternoons. Because of that, a more realistic timeline would be a Monday release.

fivetran-joemarkiewicz commented 2 years ago

Hi All,

I just wanted to share that the PR with the fix has been merged and the new v0.5.3 release has been cut! You should see the latest version of the package live on the dbt hub at the top of the hour.

Feel free to create a new issue if you encounter any questions while using the Fivetran Log package. Thanks again for your help in raising and resolving this issue.