MeltanoLabs / tap-dbt

Singer Tap for dbt API v2 built with the Meltano SDK
https://pypi.org/p/tap-dbt
Apache License 2.0

Replication key field `finished_at` of `runs` stream can sometimes be null #213

Open edgarrmondragon opened 1 year ago

edgarrmondragon commented 1 year ago

From a Slack conversation:

2. Incremental replication where the replication key is sometimes null: Again in tap-dbt, the runs stream is set to replicate incrementally using finished_at as the replication key. However this field is sometimes NULL for our runs. Are there any workarounds for this, aside from tweaking our local code to replicate the runs table with full table replication?

I need to dig into the API docs to see what's going on and maybe come up with a workaround (other than overriding the replication method in the Singer catalog).

Help from other users of this tap is more than welcome!
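
One possible direction, sketched here only as an illustration and not as the tap's current behaviour: drop records whose `finished_at` is null before they reach the bookmark logic, so that a null never becomes the replication key value. The helper names (`finished_runs`, `latest_bookmark`) and the sample records are hypothetical, and the sketch assumes all timestamps share one string format so that plain string comparison orders them correctly. It also assumes that an unfinished run will be returned again by a later sync once it has a `finished_at`, which has not been verified against the API.

```python
from typing import Iterable, Iterator, Optional


def finished_runs(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the runs whose replication key (finished_at) is populated."""
    for record in records:
        if record.get("finished_at") is not None:
            yield record


def latest_bookmark(
    records: Iterable[dict], previous: Optional[str] = None
) -> Optional[str]:
    """Return the largest finished_at seen, never moving below `previous`.

    Assumption: timestamps are strings in one consistent format, so
    lexicographic comparison matches chronological order.
    """
    bookmark = previous
    for record in finished_runs(records):
        value = record["finished_at"]
        if bookmark is None or value > bookmark:
            bookmark = value
    return bookmark


# Hypothetical sample data: run 2 is still in progress.
sample = [
    {"id": 1, "finished_at": "2023-05-01 10:00:00+00:00"},
    {"id": 2, "finished_at": None},
    {"id": 3, "finished_at": "2023-05-02 09:30:00+00:00"},
]

emitted_ids = [run["id"] for run in finished_runs(sample)]
bookmark = latest_bookmark(sample)
```

The trade-off is that in-progress runs are not emitted at all until they finish, which may or may not be acceptable depending on how the data is used downstream.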


mjsqu commented 1 year ago

The finished_at property was chosen for the runs stream because it is one of the keys that API requests can be ordered by. Unfortunately, the ordering keys do not appear to be documented, but one can try hitting the following endpoints:

At our site the first returns a message that provides the required order_by keys:

{
    "status": {
        "code": 400,
        "is_success": false,
        "user_message": "The request was invalid. Please double check the provided data and try again.",
        "developer_message": ""
    },
    "data": {
        "reason": "Invalid order_by value. Use one of [id, created_at, finished_at, -id, -created_at, -finished_at] instead."
    }
}
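
Since the keys are only discoverable through this error message, a small helper can pull them out of the response body. This is a minimal sketch: the function name `allowed_order_by` is hypothetical, the parsing relies on the exact wording of the `reason` string shown above (which may differ between API versions), and it treats a leading `-` as marking descending order, which is an assumption based on the list of values.

```python
import re
from typing import List


def allowed_order_by(response_body: dict) -> List[str]:
    """Extract the order_by keys listed in a 400 response's reason text.

    Returns an empty list when the reason does not follow the
    "Use one of [...]" wording seen in the example response.
    """
    reason = response_body.get("data", {}).get("reason", "")
    match = re.search(r"Use one of \[([^\]]*)\]", reason)
    if not match:
        return []
    return [key.strip() for key in match.group(1).split(",") if key.strip()]


# The error body quoted in the comment above.
body = {
    "status": {
        "code": 400,
        "is_success": False,
        "user_message": "The request was invalid. Please double check the provided data and try again.",
        "developer_message": "",
    },
    "data": {
        "reason": "Invalid order_by value. Use one of [id, created_at, finished_at, -id, -created_at, -finished_at] instead."
    },
}

keys = allowed_order_by(body)
# Assumption: a "-" prefix requests descending order.
ascending = [key for key in keys if not key.startswith("-")]
descending = [key for key in keys if key.startswith("-")]
```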

Ascending or descending:

Helpful links:

mjsqu commented 1 year ago

The problem with using created_at is that the following scenario may occur:

I think that makes sense, but please feel free to check my logic.

I was motivated to create an incremental replication method for the runs endpoint because we have a lot of job runs at our site. However, if you have lower volumes of runs, a full_table style replication may be preferable. Is it possible to select that style of replication and override the incremental method?
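
For anyone who can accept full-table replication, the catalog override mentioned in the opening comment could look roughly like the sketch below. It edits the stream-level metadata entry (the one with an empty breadcrumb) of a Singer catalog and sets `replication-method` to `FULL_TABLE`. The function name `force_full_table` and the sample catalog are hypothetical, and whether the tap honours this override for the `runs` stream has not been confirmed in this thread.

```python
import copy


def force_full_table(catalog: dict, stream_id: str) -> dict:
    """Return a copy of a Singer catalog with one stream set to FULL_TABLE."""
    updated = copy.deepcopy(catalog)
    for stream in updated.get("streams", []):
        if stream.get("tap_stream_id") != stream_id:
            continue
        for entry in stream.get("metadata", []):
            # The stream-level metadata entry has an empty breadcrumb.
            if entry.get("breadcrumb") == []:
                entry.setdefault("metadata", {})["replication-method"] = "FULL_TABLE"
    return updated


# Hypothetical, heavily trimmed catalog for illustration only.
catalog = {
    "streams": [
        {
            "tap_stream_id": "runs",
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "replication-method": "INCREMENTAL",
                        "replication-key": "finished_at",
                    },
                }
            ],
        }
    ]
}

patched = force_full_table(catalog, "runs")
method = patched["streams"][0]["metadata"][0]["metadata"]["replication-method"]
```

The function works on a deep copy so the original catalog stays untouched, which makes it easy to diff the two before writing anything back to disk.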

mjsqu commented 1 year ago

Just noted the Slack comment said:

Are there any workarounds for this, aside from tweaking our local code to replicate the runs table with full table replication?

That invalidates the final paragraph of my previous comment.