edgarrmondragon opened this issue 1 year ago
The `finished_at` property was chosen for the `runs` stream because it is one of the keys that API requests can be ordered by. Unfortunately, it looks like the ordering keys are not documented - but one can try hitting the following endpoints:

At our site the first returns a message that provides the required `order_by` keys:
```json
{
  "status": {
    "code": 400,
    "is_success": false,
    "user_message": "The request was invalid. Please double check the provided data and try again.",
    "developer_message": ""
  },
  "data": {
    "reason": "Invalid order_by value. Use one of [id, created_at, finished_at, -id, -created_at, -finished_at] instead."
  }
}
```
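For reference, this is roughly how that message can be reproduced - the account id and token below are placeholders, and the base URL may differ for single-tenant instances:

```python
import requests

ACCOUNT_ID = 12345           # placeholder
TOKEN = "dbt-cloud-api-token"  # placeholder

# Request the runs endpoint with an order_by value the API does not accept;
# the 400 response body shown above then lists the accepted keys.
response = requests.get(
    f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/runs/",
    headers={"Authorization": f"Token {TOKEN}"},
    params={"order_by": "modified_at", "limit": 1},
    timeout=30,
)
print(response.status_code)
print(response.json())
```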
Ascending or descending order appears to be selected with the `-` prefix (e.g. `finished_at` vs `-finished_at`).
Helpful links:
The problem with using `created_at` is that the following scenario may occur:

1. Run `id=1234` is created at 10am.
2. Run `id=1235` is created at 10:05am.
3. A sync runs and the tap bookmarks the latest `created_at` value as 10:05am.
4. Run `id=1234` finishes, and the dbt Cloud record is updated with finishing status, `finished_at`, etc.
5. The next incremental sync requests runs in `created_at` order and stops when it reaches 10:05am - creating and outputting a final RECORD message containing `id=1235`.
6. `id=1234` is not extracted because the `created_at` value for that run is 10am, before the bookmark value.

I think that makes sense, but please feel free to check my logic.
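A minimal sketch of that scenario (the timestamps and bookmark logic are simplified here, not the tap's actual code):

```python
from datetime import datetime

# State after the first sync: the bookmark is the max created_at seen so far,
# i.e. 10:05am from run id=1235.
bookmark = datetime(2023, 1, 1, 10, 5)

# Runs as they exist in dbt Cloud at the time of the second sync:
# run 1234 has since finished and been updated.
runs = [
    {"id": 1234, "created_at": datetime(2023, 1, 1, 10, 0),
     "finished_at": datetime(2023, 1, 1, 10, 30), "status": "success"},
    {"id": 1235, "created_at": datetime(2023, 1, 1, 10, 5),
     "finished_at": None, "status": "running"},
]

# Incremental sync keyed on created_at: only runs at or after the bookmark
# are emitted, so run 1235 is emitted again but the now-finished run 1234
# is never re-extracted.
extracted = [r for r in runs if r["created_at"] >= bookmark]
print([r["id"] for r in extracted])  # [1235] - the update to 1234 is missed
```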
I was motivated to create an incremental replication method for the `runs` endpoint because we have a lot of job runs at our site; however, if you have lower volumes of runs, a `full_table`-style replication may be preferable. Is it possible to select that style of replication and override the incremental method?
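In case it helps, forcing full-table replication through the Singer catalog would look roughly like the following top-level metadata entry for the stream - the schema is elided here, and whether the tap honours the override depends on how the stream is implemented:

```json
{
  "streams": [
    {
      "tap_stream_id": "runs",
      "stream": "runs",
      "schema": {"type": "object"},
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "selected": true,
            "replication-method": "FULL_TABLE"
          }
        }
      ]
    }
  ]
}
```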
Just noted the Slack comment said:

> Are there any workarounds for this, aside from tweaking our local code to replicate the runs table with full table replication?

Which invalidates the final paragraph of my previous comment.
From a Slack conversation:
> I need to dig into the API docs to see what's going on and maybe come up with a workaround (other than overriding the replication method in the Singer catalog).
Help from other users of this tap is more than welcome!