apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0
2.57k stars, 514 forks

[Bug][Bitbucket Server] Activities collector fails when PR is corrupt #7601

Closed sstojak1 closed 3 months ago

sstojak1 commented 3 months ago

What happened

The Bitbucket Server data collection task fails because one pull request is corrupted. DevLake throws the following error:

Retry exceeded 3 times calling rest/api/1.0/projects/{projectKey}/repos/{repoName}/pull-requests/{pullRequestId}/activities. The last error was: Http DoAsync error calling [method:GET path:rest/api/1.0/projects/{projectKey}/repos/{repoName}/pull-requests/{pullRequestId}/activities query:map[limit:[100] state:[all]]]. Response: {"errors":[{"context":null,"message":"'git update-ref --stdin -z --no-deref' exited with code 128 saying: fatal: cannot update ref 'stash-refs/pull-requests/{pullRequestId}/from': trying to write ref 'stash-refs/pull-requests/{pullRequestId}/from' with nonexistent object {commitSHA}","exceptionName":"com.atlassian.bitbucket.scm.CommandFailedException"}]} (500)

What do you expect to happen

I think it would make sense for the data collection to continue even if one pull request is corrupted, since we don't want to lose all of the Bitbucket metrics for the repository.

How to reproduce

We have this kind of a state in our env. Not sure how to reproduce.

Anything else

No response

Version

v1.0.0-beta11

sstojak1 commented 3 months ago

This is similar to #7258.

dosubot[bot] commented 3 months ago

To ensure that the Bitbucket Server data collection task in DevLake continues even if one pull request is corrupted, you can configure the Sync Policy to "Skip Failed Tasks." This setting allows the data collection process to skip over any failed tasks, preventing the entire pipeline from failing and ensuring that you do not lose all Bitbucket metrics for the repository.

Here is the relevant section from the document:

### Step 2.3 - Set the Sync Policy
There are three settings for Sync Policy:
- Data Time Range: You can select the time range of the data you wish to collect. The default is set to the past six months.
- Sync Frequency: You can choose how often you would like to sync your data in this step by selecting a sync frequency option or entering a cron expression to specify your preferred schedule.
- Skip Failed Tasks: Sometimes a few tasks may fail in a long pipeline; you can choose to skip them to avoid spending more time running the pipeline all over again.

Additionally, the ConvertPullRequests function in the pr_convertor.go file is designed to handle errors gracefully. The api.NewDataConverter method in the ConvertPullRequests function is set up to manage errors effectively, allowing the data collection process to continue even if some pull requests are not found or are corrupted.

By enabling the "Skip Failed Tasks" option and leveraging the error handling in the ConvertPullRequests function, you can ensure that the data collection process continues even if some tasks fail [1][2].

d4x1 commented 3 months ago

@sstojak1 There is an option in the project's sync config; you can give it a try.

klesh commented 3 months ago

I believe it is fixed already, will be available in the coming rc version which should be ready in a couple of days.

sstojak1 commented 3 months ago

@d4x1 This option is already on for all our projects. Here a single task is failing because one PR is corrupt in Bitbucket. As a result, the other information for the repository won't be collected.

@klesh Are you referring to #7577? #7577 concerns PR commits and handles 40X error statuses. This error has a 500 status and concerns PR activities. If that's OK, we could handle it the way @abeizn did for commits, but match on the 500 status plus the com.atlassian.bitbucket.scm.CommandFailedException exception name in the response body. What do you think?
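To make the proposal concrete, here is a minimal sketch of the check being suggested: treat a response as a "corrupt PR" only when the status is 500 and the body carries the exception name from the error above. The type `bitbucketErrorBody` and the helper `isCorruptPullRequest` are hypothetical names for illustration; this is not DevLake's collector API, and how the result would be wired into the activities collector is left open.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// bitbucketErrorBody mirrors only the fields of the error payload quoted in
// this issue. It is an illustrative type, not one taken from DevLake.
type bitbucketErrorBody struct {
	Errors []struct {
		Message       string `json:"message"`
		ExceptionName string `json:"exceptionName"`
	} `json:"errors"`
}

// commandFailedException is the exception name reported in the 500 response.
const commandFailedException = "com.atlassian.bitbucket.scm.CommandFailedException"

// isCorruptPullRequest is a hypothetical helper. It returns true only when
// the status is 500 AND the body is a Bitbucket error payload naming
// CommandFailedException. Any other 500 (server down, HTML error page,
// unparseable body) returns false so it would still fail the task.
func isCorruptPullRequest(statusCode int, body []byte) bool {
	if statusCode != http.StatusInternalServerError {
		return false
	}
	var parsed bitbucketErrorBody
	if err := json.Unmarshal(body, &parsed); err != nil {
		return false
	}
	for _, e := range parsed.Errors {
		if e.ExceptionName == commandFailedException {
			return true
		}
	}
	return false
}

func main() {
	// Shortened version of the payload from the report; the message text is abbreviated.
	corrupt := []byte(`{"errors":[{"context":null,"message":"fatal: cannot update ref","exceptionName":"com.atlassian.bitbucket.scm.CommandFailedException"}]}`)

	fmt.Println(isCorruptPullRequest(500, corrupt))                        // true
	fmt.Println(isCorruptPullRequest(500, []byte("<html>Unavailable</html>"))) // false
	fmt.Println(isCorruptPullRequest(404, corrupt))                        // false
}
```

Matching on the exception name rather than the free-text message keeps the check narrow, but it still relies on the server's response body being a stable signal, which is an assumption.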

klesh commented 3 months ago

Ahh, 500 errors? I am not sure. A 500 is an Internal Server Error, which might suggest that the server is corrupted or down; in that case, it is hard to say whether it is appropriate to skip the PR. It makes more sense to fix the 500 errors on the Bitbucket server rather than ignore them on the DevLake end.

sstojak1 commented 3 months ago

You're correct. Deciding whether to skip something based on the message content will be challenging. Resolving the ticket...