airbytehq / airbyte

Source github: don't minify the `users` field in any streams #4752

Closed (sherifnada closed 3 years ago)

sherifnada commented 3 years ago

Tell us about the problem you're trying to solve

It would be nice to include the rest of the user information returned by the GitHub API, beyond just the user_id, on all streams but particularly pull requests and commits. The API endpoint docs are at https://docs.github.com/en/rest/reference/pulls and here is an example of the raw data returned by the connector:

{
  "url": "https://api.github.com/repos/airbytehq/airbyte/pulls/4646",
  "id": 686552355,
  "node_id": "MDExOlB1bGxSZXF1ZXN0Njg2NTUyMzU1",
  "html_url": "https://github.com/airbytehq/airbyte/pull/4646",
  "diff_url": "https://github.com/airbytehq/airbyte/pull/4646.diff",
  "patch_url": "https://github.com/airbytehq/airbyte/pull/4646.patch",
  "issue_url": "https://api.github.com/repos/airbytehq/airbyte/issues/4646",
  "number": 4646,
  "state": "closed",
  "locked": false,
  "title": "0.27.1 Connector Patch Notes",
  "body": "## Main Changes\r\n- Adds Connector changelog for the 0.27.1 patch\r\n\r\n## Misc Changes\r\n- Fixes naming for SurveyMonkey and CockroachDB (capitalization matters!)\r\n- Reorganized CockroachDB in the integrations list to be in alphabetical order... very important.",
  "created_at": "2021-07-09T07:14:39Z",
  "updated_at": "2021-07-09T07:15:29Z",
  "closed_at": "2021-07-09T07:15:28Z",
  "merged_at": "2021-07-09T07:15:28Z",
  "merge_commit_sha": "15971e89b1fb623a006a98489d8aa48cb2de2956",
  "draft": false,
  "commits_url": "https://api.github.com/repos/airbytehq/airbyte/pulls/4646/commits",
  "review_comments_url": "https://api.github.com/repos/airbytehq/airbyte/pulls/4646/comments",
  "review_comment_url": "https://api.github.com/repos/airbytehq/airbyte/pulls/comments{/number}",
  "comments_url": "https://api.github.com/repos/airbytehq/airbyte/issues/4646/comments",
  "statuses_url": "https://api.github.com/repos/airbytehq/airbyte/statuses/0becef350a3da69215644fe38fb2bcd17a32d738",
  "head": {
    "label": "airbytehq:abhi/indras-net",
    "ref": "abhi/indras-net",
    "sha": "0becef350a3da69215644fe38fb2bcd17a32d738",
    "user_id": 59758427,
    "repo_id": 283046497
  },
  "base": {
    "label": "airbytehq:master",
    "ref": "master",
    "sha": "db223a4d068b793d0cb054b7fc671b9dc108bfe0",
    "user_id": 59758427,
    "repo_id": 283046497
  },
  "_links": {
    "self": { "href": "https://api.github.com/repos/airbytehq/airbyte/pulls/4646" },
    "html": { "href": "https://github.com/airbytehq/airbyte/pull/4646" },
    "issue": { "href": "https://api.github.com/repos/airbytehq/airbyte/issues/4646" },
    "comments": { "href": "https://api.github.com/repos/airbytehq/airbyte/issues/4646/comments" },
    "review_comments": { "href": "https://api.github.com/repos/airbytehq/airbyte/pulls/4646/comments" },
    "review_comment": { "href": "https://api.github.com/repos/airbytehq/airbyte/pulls/comments{/number}" },
    "commits": { "href": "https://api.github.com/repos/airbytehq/airbyte/pulls/4646/commits" },
    "statuses": { "href": "https://api.github.com/repos/airbytehq/airbyte/statuses/0becef350a3da69215644fe38fb2bcd17a32d738" }
  },
  "author_association": "CONTRIBUTOR",
  "auto_merge": null,
  "active_lock_reason": null,
  "user_id": 33042053,
  "milestone": null,
  "assignee": null,
  "labels": [ 2235194062 ],
  "assignees": [ ],
  "requested_reviewers": [ ],
  "requested_teams": [ ],
  "_ab_github_repository": "airbytehq/airbyte"
}

Describe the solution you’d like

Include the full user object
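
To make the request concrete, here is a rough sketch of the difference between a minified record and one that keeps the full user object. It is not the connector's actual code: the function names `minify_user` and `keep_user`, the `login` value, and the trimmed-down input record are all made up for illustration. Only the `user_id` value comes from the example record above.

```python
from typing import Any, Dict, Optional


def minify_user(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: replace the nested `user` object with a bare `user_id`.

    This mirrors the shape of the example record in this issue; it is an
    assumption about the behaviour, not the connector's real implementation.
    """
    out = dict(record)
    user: Optional[Dict[str, Any]] = out.pop("user", None)
    out["user_id"] = user["id"] if user else None
    return out


def keep_user(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pass the record through with `user` left intact."""
    return dict(record)


# Illustrative input: a heavily trimmed pull request payload.
# The `login` value is a placeholder, not a real account.
raw_pull_request = {
    "number": 4646,
    "user": {"login": "example-user", "id": 33042053, "type": "User"},
}

minified = minify_user(raw_pull_request)  # only user_id survives
full = keep_user(raw_pull_request)        # login, type, etc. are still available
```

With the minified form, anything beyond the numeric id (login, type, and so on) has to be looked up elsewhere; with the full form it travels with each record.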

Describe the alternative you’ve considered or used

Zirochkaa commented 3 years ago

Should the full user object be included for all streams, or just for the pull_requests stream? The same question applies to the similar fields author, actor, creator, committer, assignee, assignees, and requested_reviewers, which are all the same object.

garden-of-delete commented 3 years ago

I originated this issue. Ideally, we would include all the data for the user object and similar objects in every stream that provides them. The reason is that you may have a contributor who opens issues but not PRs, or the other way around. It would be nice to be able to get info for that user beyond the user_id.

(Also huge thanks for giving the GitHub native connector so much love recently)
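
As a sketch of the use case described above, this is roughly what a consumer could do once every stream carries full user objects: build one user lookup from several streams, so someone who only opens issues still shows up. The field names are the ones listed earlier in the thread; the `collect_users` helper and the sample records (ids, logins) are hypothetical and not taken from the connector.

```python
from typing import Any, Dict, Iterable

# User-like fields named in this thread; which stream exposes which field
# is an assumption here, not something verified against the connector.
USER_FIELDS = ("user", "author", "actor", "creator", "committer", "assignee")
USER_LIST_FIELDS = ("assignees", "requested_reviewers")


def collect_users(records: Iterable[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Hypothetical helper: gather distinct user objects, keyed by user id."""
    users: Dict[int, Dict[str, Any]] = {}
    for record in records:
        candidates = [record.get(field) for field in USER_FIELDS]
        for field in USER_LIST_FIELDS:
            candidates.extend(record.get(field) or [])
        for user in candidates:
            # Skip missing fields and anything that is not a full object.
            if isinstance(user, dict) and "id" in user:
                users.setdefault(user["id"], user)
    return users


# Illustrative records with placeholder ids and logins.
issues = [{"number": 1, "user": {"id": 101, "login": "issue-only-user"}}]
pull_requests = [
    {
        "number": 2,
        "user": {"id": 202, "login": "pr-only-user"},
        "requested_reviewers": [{"id": 101, "login": "issue-only-user"}],
    }
]

users = collect_users(issues + pull_requests)
```

With minified records this lookup cannot be built from the stream data alone, since only the numeric ids are present.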