MeltanoLabs / tap-github

A Singer tap for extracting data from GitHub. Powered by the Meltano SDK for Singer Taps: https://sdk.meltano.com
Apache License 2.0

Workflow streams incorrectly claim to support incremental loading #216

Open JohannesRudolph opened 1 year ago

JohannesRudolph commented 1 year ago

Most GitHubRestStream descendants in the tap support incremental loading by combining the updated_at replication key with the GitHub API's since parameter, e.g. repository issues: https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues

However, none of the GitHub APIs used for the workflow, workflow_runs and workflow_run_jobs streams support that parameter, see e.g. https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#list-repository-workflows
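To make the difference concrete, here is a minimal sketch of how the since filter can only be applied where the endpoint accepts it. This is not the tap's actual code: SUPPORTS_SINCE and build_params are hypothetical names made up for illustration, and only the two endpoints linked above are listed.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical lookup table (not part of tap-github): whether an endpoint
# accepts the `since` query parameter, per the GitHub REST docs linked above.
SUPPORTS_SINCE = {
    "/repos/{owner}/{repo}/issues": True,
    "/repos/{owner}/{repo}/actions/workflows": False,
}


def build_params(path: str, bookmark: Optional[datetime]) -> dict:
    """Build query parameters, adding `since` only where it has an effect."""
    params: dict = {"per_page": 100}
    if bookmark is not None and SUPPORTS_SINCE.get(path, False):
        # GitHub expects an ISO 8601 timestamp in UTC.
        params["since"] = bookmark.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    return params


bookmark = datetime(2024, 1, 1, tzinfo=timezone.utc)
issues_params = build_params("/repos/{owner}/{repo}/issues", bookmark)
workflows_params = build_params("/repos/{owner}/{repo}/actions/workflows", bookmark)
```

With a stored bookmark, the issues request is filtered server-side, while the workflows request has nothing to filter on and returns every record again.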

Nonetheless, the tap sets replication keys accordingly and creates huge state files (esp. for workflow_run_jobs) where every run_id seems to get its own partition.

In my pipelines this results in append-only behavior, where I should probably be doing full loads instead.
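A toy sketch of why that matters, assuming the API returns the full record set on every run (as it would without a working since filter). The load_append and load_replace helpers are hypothetical stand-ins for a target's load strategies, not real Meltano or target APIs.

```python
def load_append(table: list, batch: list) -> list:
    """Append-only load: every synced record is added to the table."""
    return table + batch


def load_replace(table: list, batch: list) -> list:
    """Full load: the table is replaced by the latest sync."""
    return list(batch)


# The same two workflow records come back on every run.
batch = [{"id": 1, "state": "active"}, {"id": 2, "state": "active"}]

appended: list = []
replaced: list = []
for _ in range(2):  # two consecutive pipeline runs
    appended = load_append(appended, batch)
    replaced = load_replace(replaced, batch)
```

After two runs the append-only table holds each record twice, while the replaced table still holds one row per record.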

A possible solution might be to use use_fake_since_parameter, but I haven't checked this yet and would appreciate it if one of the experts on this tap could offer some insight.

ranpa commented 2 weeks ago

Hey @JohannesRudolph! How are you?

Were you able to sort this out? My team has just faced this issue this week and we were wondering whether we would need to try fixing it ourselves or try another solution.

I would really appreciate any update on this.

Thank you!

JohannesRudolph commented 2 weeks ago

Not really. My workaround was to put it into a target that always replaces all data, since the stream is not incremental.

Best regards, Johannes