airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Github Connector: schema incorrect for auto_merge #9775

Closed bleonard closed 1 year ago

bleonard commented 2 years ago


Current Behavior

Normalization fails on the GitHub connector. The schema appears to declare auto_merge as a boolean, but the data shows it is a JSON object.


2022-01-25 00:05:45 normalization > 2022-01-25 00:05:33.050103 (MainThread): Database Error in model pull_requests (models/generated/airbyte_incremental/public/pull_requests.sql)
2022-01-25 00:05:45 normalization > 2022-01-25 00:05:33.050436 (MainThread):   invalid input syntax for type boolean: "{"enabled_by": {"id": 303226, "url": "", "type": "User", "login": "evantahler", "node_id": "MDQ6VXNlcjMwMzIyNg==", "html_url": "", "gists_url": "{/gist_id}", "repos_url": "", "avatar_url": "", "events_url": "{/privacy}", "site_admin": false, "gravatar_id": "", "starred_url": "{/owner}{/repo}", "followers_url": "", "following_url": "{/other_user}", "organizations_url": "", "subscriptions_url": "", "received_events_url": ""}, "commit_title": "Remove reEnqueueIfGuidParams (#1372)", "merge_method": "squash", "commit_message": ""}"
2022-01-25 00:05:45 normalization > 2022-01-25 00:05:33.050853 (MainThread):   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/public/pull_requests.sql
2022-01-25 00:05:45 normalization > 2022-01-25 00:05:33.051878 (MainThread): 
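
The mismatch behind this error can be illustrated with a minimal Python sketch. The `JSON_TYPES` table and the `matches_json_type` helper are hypothetical and are not part of the connector or of normalization; they only show why an object value cannot satisfy a field declared as boolean. The `auto_merge` value is abbreviated from the error message above.

```python
import json

# Hypothetical mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "boolean": bool,
    "object": dict,
    "string": str,
    "null": type(None),
}


def matches_json_type(value, declared_types):
    """Return True if value matches any of the declared JSON Schema type names."""
    return any(isinstance(value, JSON_TYPES[name]) for name in declared_types)


# Abbreviated auto_merge value from the error message (enabled_by omitted).
auto_merge = json.loads(
    '{"commit_title": "Remove reEnqueueIfGuidParams (#1372)", '
    '"merge_method": "squash", "commit_message": ""}'
)

# Assumed declaration that leads to the failure: boolean or null.
print(matches_json_type(auto_merge, ["null", "boolean"]))  # False

# Declaring the field as object or null accepts the same value.
print(matches_json_type(auto_merge, ["null", "object"]))  # True
```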

Expected Behavior

No error.

Likely the schema should be something like this:

  "auto_merge": {
    "enabled_by": User,
    "commit_title": string,
    "merge_method": string,
    "commit_message": string
  }
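
As a rough illustration, the shape proposed above could be written as a JSON Schema fragment, shown here as a Python dict. This is a sketch under assumptions, not the connector's actual schema and not necessarily what the eventual fix used: the nullable types, the abbreviated `enabled_by` properties, and the `conforms` helper are all hypothetical. The field names come from the row reported in this issue.

```python
# Hypothetical JSON Schema fragment for auto_merge (assumed nullable types).
AUTO_MERGE_SCHEMA = {
    "type": ["null", "object"],
    "properties": {
        "enabled_by": {
            "type": ["null", "object"],
            # Abbreviated user object; the real one has many more fields.
            "properties": {
                "id": {"type": ["null", "integer"]},
                "login": {"type": ["null", "string"]},
                "type": {"type": ["null", "string"]},
            },
        },
        "commit_title": {"type": ["null", "string"]},
        "merge_method": {"type": ["null", "string"]},
        "commit_message": {"type": ["null", "string"]},
    },
}

PY_TYPES = {
    "null": type(None),
    "object": dict,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def conforms(value, schema):
    """Check only 'type' and 'properties', recursively (not a full validator)."""
    type_ok = any(
        isinstance(value, PY_TYPES[name])
        # bool is a subclass of int in Python, so reject it for "integer".
        and not (name == "integer" and isinstance(value, bool))
        for name in schema["type"]
    )
    if not type_ok:
        return False
    if isinstance(value, dict):
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub_schema):
                return False
    return True


# Values abbreviated from the reported row.
sample = {
    "enabled_by": {"id": 303226, "type": "User", "login": "evantahler"},
    "commit_title": "Remove reEnqueueIfGuidParams (#1372)",
    "merge_method": "squash",
    "commit_message": "",
}

print(conforms(sample, AUTO_MERGE_SCHEMA))  # True
print(conforms(None, AUTO_MERGE_SCHEMA))    # True: PR without auto-merge
print(conforms(True, AUTO_MERGE_SCHEMA))    # False: a boolean no longer fits
```

Allowing `null` alongside `object` keeps pull requests that never had auto-merge enabled valid under the same schema.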


These are the actual contents of the row (dbc7b774-0b8b-4934-9459-3b4b0238bd9c) in _airbyte_raw_pull_requests:

LOG

```json
{"id": 581183204, "url": "", "base": {"ref": "master", "sha": "bcb0f51e81dd5fc310c11e4eede7b1b6929a8629", "repo": {"id": 250316892, "url": "", "fork": false, "name": "grouparoo", "size": 52962, "forks": 53, "owner": {"id": 55261675, "url": "", "type": "Organization", "login": "grouparoo", "node_id": "MDEyOk9yZ2FuaXphdGlvbjU1MjYxNjc1", "html_url": "", "gists_url": "{/gist_id}", "repos_url": "", "avatar_url": "", "events_url": "{/privacy}", "site_admin": false, "gravatar_id": "", "starred_url": "{/owner}{/repo}", "followers_url": "", "following_url": "{/other_user}", "organizations_url": "", "subscriptions_url": "", "received_events_url": ""}, "topics": ["apis", "communication", "docker", "email", "events", "grouparoo", "hacktoberfest", "integration-framework", "low-code", "marketing", "marketing-analytics", "marketing-automation", "marketing-operations", "marketing-tools", "nodejs", "push-notifications", "reverse-etl", "self-hosted", "typescript"], "git_url": "git://", "license": {"key": "other", "url": null, "name": "Other", "node_id": "MDc6TGljZW5zZTA=", "spdx_id": "NOASSERTION"}, "node_id":
"MDEwOlJlcG9zaXRvcnkyNTAzMTY4OTI=", "private": false, "ssh_url": "", "svn_url": "", "archived": false, "disabled": false, "has_wiki": false, "homepage": "", "html_url": "", "keys_url": "{/key_id}", "language": "JavaScript", "tags_url": "", "watchers": 545, "blobs_url": "{/sha}", "clone_url": "", "forks_url": "", "full_name": "grouparoo/grouparoo", "has_pages": true, "hooks_url": "", "pulls_url": "{/number}", "pushed_at": "2022-01-24T23:11:43Z", "teams_url": "", "trees_url": "{/sha}", "created_at": "2020-03-26T16:49:01Z", "events_url": "", "has_issues": true, "issues_url": "{/number}", "labels_url": "{/name}", "merges_url": "", "mirror_url": null, "updated_at": "2022-01-24T20:52:22Z", "visibility": "public", "archive_url": "{archive_format}{/ref}", "commits_url": "{/sha}", "compare_url": "{base}...{head}", "description": "🦘 The Grouparoo Monorepo - open source customer data sync framework", "forks_count": 53, "is_template": false, "open_issues": 62, "branches_url": "{/branch}", "comments_url": "{/number}", "contents_url": "{+path}", "git_refs_url": "{/sha}", "git_tags_url": "{/sha}", "has_projects": false, "releases_url": "{/id}", "statuses_url": "{sha}", "allow_forking": true, "assignees_url": "{/user}", "downloads_url": "", "has_downloads": true, "languages_url": "", "default_branch": "main", "milestones_url": "{/number}", "stargazers_url": "", "watchers_count": 545, "deployments_url": "", "git_commits_url": "{/sha}", "subscribers_url": "", "contributors_url": "", "issue_events_url": "{/number}", "stargazers_count": 545, "subscription_url": "", "collaborators_url": "{/collaborator}", "issue_comment_url": "{/number}", "notifications_url": "{?since,all,participating}", "open_issues_count": 62}, "user": {"id": 55261675, "url": "", "type": "Organization", "login": "grouparoo", "node_id": "MDEyOk9yZ2FuaXphdGlvbjU1MjYxNjc1", "html_url": "", "gists_url": "{/gist_id}", "repos_url": "", "avatar_url": "", "events_url": "{/privacy}", "site_admin": false, "gravatar_id": "",
"starred_url": "{/owner}{/repo}", "followers_url": "", "following_url": "{/other_user}", "organizations_url": "", "subscriptions_url": "", "received_events_url": ""}, "label": "grouparoo:master", "repo_id": null}, "body": "`reEnqueueIfGuidParams` was a utility to aid folks between the `v0.1` and `v0.2` migration so they don't loose any resque tasks that used to use `guid` as we moved to `id`. However, now that our Run system can retry/resume runs, this utility is less important.", "head": {"ref": "reEnqueueIfGuidParams", "sha": "697be95d27d50abda11e67dbffd86b202f07feb0", "user": {"id": 55261675, "url": "", "type": "Organization", "login": "grouparoo", "node_id": "MDEyOk9yZ2FuaXphdGlvbjU1MjYxNjc1", "html_url": "", "gists_url": "{/gist_id}", "repos_url": "", "avatar_url": "", "events_url": "{/privacy}", "site_admin": false, "gravatar_id": "", "starred_url": "{/owner}{/repo}", "followers_url": "", "following_url": "{/other_user}", "organizations_url": "", "subscriptions_url": "", "received_events_url": ""}, "label": "grouparoo:reEnqueueIfGuidParams", "repo_id": 250316892}, "user": {"id": 303226, "url": "", "type": "User", "login": "evantahler", "node_id": "MDQ6VXNlcjMwMzIyNg==", "html_url": "", "gists_url": "{/gist_id}", "repos_url": "", "avatar_url": "", "events_url": "{/privacy}", "site_admin": false, "gravatar_id": "", "starred_url": "{/owner}{/repo}", "followers_url": "", "following_url": "{/other_user}", "organizations_url": "", "subscriptions_url": "", "received_events_url": ""}, "draft": false, "state": "closed", "title": "Remove reEnqueueIfGuidParams", "_links": {"html": {"href": ""}, "self": {"href": ""}, "issue": {"href": ""}, "commits": {"href": ""}, "comments": {"href": ""}, "statuses": {"href": ""}, "review_comment": {"href": "{/number}"}, "review_comments": {"href": ""}}, "labels": [{"id": 2019522130, "url": "", "name": "internal", "color": "f9d3a4", "default": false, "node_id": "MDU6TGFiZWwyMDE5NTIyMTMw", "description": "chores and housekeeping"}],
"locked": false, "number": 1372, "node_id": "MDExOlB1bGxSZXF1ZXN0NTgxMTgzMjA0", "assignee": null, "diff_url": "", "html_url": "", "assignees": [], "closed_at": "2021-02-26T23:10:45Z", "issue_url": "", "merged_at": "2021-02-26T23:10:45Z", "milestone": null, "patch_url": "", "auto_merge": {"enabled_by": {"id": 303226, "url": "", "type": "User", "login": "evantahler", "node_id": "MDQ6VXNlcjMwMzIyNg==", "html_url": "", "gists_url": "{/gist_id}", "repos_url": "", "avatar_url": "", "events_url": "{/privacy}", "site_admin": false, "gravatar_id": "", "starred_url": "{/owner}{/repo}", "followers_url": "", "following_url": "{/other_user}", "organizations_url": "", "subscriptions_url": "", "received_events_url": ""}, "commit_title": "Remove reEnqueueIfGuidParams (#1372)", "merge_method": "squash", "commit_message": ""}, "created_at": "2021-02-26T23:02:06Z", "repository": "grouparoo/grouparoo", "updated_at": "2021-02-26T23:10:46Z", "commits_url": "", "comments_url": "", "statuses_url": "", "requested_teams": [], "merge_commit_sha": "85097fd5bcf3219f1faef9563b1d55e0fea5a298", "active_lock_reason": null, "author_association": "MEMBER", "review_comment_url": "{/number}", "requested_reviewers": [], "review_comments_url": ""}
```

Steps to Reproduce

  1. Set up a sync from a repo that has previously merged a PR with auto-merge enabled
  2. Sync!
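
To confirm that a repo will reproduce the problem, one option is to scan raw pull request records for an `auto_merge` value that is a JSON object. The sketch below assumes the raw records are available as JSON strings; the `rows_with_auto_merge` helper and the two sample rows are hypothetical (the first is heavily abbreviated from the row reported in this issue, the second is made up to show a PR without auto-merge).

```python
import json


def rows_with_auto_merge(raw_rows):
    """Yield (number, auto_merge) for records whose auto_merge is a JSON object."""
    for raw in raw_rows:
        record = json.loads(raw)
        auto_merge = record.get("auto_merge")
        if isinstance(auto_merge, dict):
            yield record.get("number"), auto_merge


# Hypothetical sample rows for illustration only.
raw_rows = [
    '{"number": 1372, "auto_merge": {"merge_method": "squash"}}',
    '{"number": 1, "auto_merge": null}',
]

for number, auto_merge in rows_with_auto_merge(raw_rows):
    print(number, auto_merge["merge_method"])  # 1372 squash
```

Any record yielded here would fail a cast of `auto_merge` to boolean, so a repo with at least one such pull request should trigger the error.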

Are you willing to submit a PR?

Generally, but thought I'd report first and see.

marcosmarxm commented 1 year ago

Solved by #9802