apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

[Bug][Github] Error inserting raw rows into _raw_github_api_pull_request_commits (500) #6344

Closed mkaufmaner closed 3 weeks ago

mkaufmaner commented 11 months ago

What happened

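The GitHub plugin pipeline task for a GitHub repository fails in the collectApiPullRequestCommits subtask with the error: error inserting raw rows into _raw_github_api_pull_request_commits (500).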

Full Logs (Sanitized): task-129911-2-1-github.log.zip

Simplified Task JSON:

{
    "id": 125907,
    "createdAt": "2023-10-25T19:15:17.77Z",
    "updatedAt": "2023-10-25T19:19:15.662Z",
    "plugin": "github",
    "subtasks": [
        "collectApiPullRequests",
        "extractApiPullRequests",
        "collectApiComments",
        "extractApiComments",
        "collectApiPullRequestCommits",
        "extractApiPullRequestCommits",
        "collectApiPullRequestReviews",
        "extractApiPullRequestReviews",
        "collectApiPrReviewCommentsMeta",
        "extractApiPrReviewComments",
        "collectAccounts",
        "extractAccounts",
        "collectAccountOrg",
        "ExtractAccountOrg",
        "enrichPullRequestIssues",
        "convertRepo",
        "convertPullRequestCommits",
        "convertPullRequests",
        "convertPullRequestReviews",
        "convertPullRequestLabels",
        "convertPullRequestIssues",
        "convertPullRequestComments",
        "convertAccounts"
    ],
    "options": "{\"connectionId\":5,\"githubId\":509,\"name\":\"xxxx/xxxx-xxxx-xxx-xxxxxx-xxx\",\"timeAfter\":\"2023-04-24T00:00:00-04:00\"}",
    "status": "TASK_FAILED",
    "message": "subtask collectApiPullRequestCommits ended unexpectedly\nWraps: (2)\n  | combined messages: \n  | {\n  | \terror inserting raw rows into _raw_github_api_pull_request_commits (500)\n  | \t=====================\n [...] \terror inserting raw rows into _raw_github_api_pull_request_commits (500)\n  | }\nError types: (1) *hintdetail.withDetail (2) *errors.errorString",
    "errorName": "subtask collectApiPullRequestCommits ended unexpectedly\ncaused by: error inserting raw rows into _raw_github_api_pull_request_commits (500), [...] error inserting raw rows into _raw_github_api_pull_request_commits (500)",
    "progress": 0.17391305,
    "progressDetail": null,
    "failedSubTask": "collectApiPullRequestCommits",
    "pipelineId": 57,
    "pipelineRow": 6,
    "pipelineCol": 14,
    "beganAt": "2023-10-25T19:15:18.344Z",
    "finishedAt": "2023-10-25T19:19:15.655Z",
    "spentSeconds": 237
}

Full Task JSON (1.2MB): devlake-github-500.json

Screenshot: devlake-github-500-errors

What do you expect to happen

I expected this task to be successful.

How to reproduce

Good question. My guess is that it reproduces with a repository that has a large number of PRs and commits.

Anything else

Possibly related to https://github.com/apache/incubator-devlake/issues/6320

Version

v0.18.0

d4x1 commented 11 months ago

Are the error messages shown when you "hover to view the reason" the same as those in devlake-github-500.json?

I can't find any useful information in the JSON file (not your fault; the error messages are too messy). @mkaufmaner

mkaufmaner commented 11 months ago

Are the error messages shown when you "hover to view the reason" the same as those in devlake-github-500.json?

I can't find any useful information in the JSON file (not your fault; the error messages are too messy).

@mkaufmaner

Yes, they are.

I am trying to get the logs, but they were wiped when the pod restarted with the updated nginx configuration. I am rerunning the task now to capture them, which will probably take a few minutes.

mkaufmaner commented 11 months ago

@d4x1 I updated the main bug report with the full log for the failed GitHub task. Uncompressed, the log is 32.7MB. See https://github.com/apache/incubator-devlake/files/13178290/task-129911-2-1-github.log.zip

mkaufmaner commented 11 months ago

@d4x1 Upon further investigation, this appears to be caused by a resource limitation on our GHE server: the requests are timing out. I wish we could get GraphQL working. That may become possible after we upgrade our GHE to the latest stable version, 3.10, but that is TBD.

Side note: is there a client-side request timeout I should be worried about? I looked through the source code and documentation but couldn't find anything.

klesh commented 11 months ago

The default timeout for GitHub GraphQL is 30s, but you can change it by setting an environment variable named API_TIMEOUT. Keep in mind that the unit suffix is required: use s for seconds, m for minutes, and h for hours.

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 5 months ago

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter a similar problem in the future.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.

klesh commented 3 weeks ago

Closed due to inactivity.