apache / kafka

Mirror of Apache Kafka
Apache License 2.0
28.39k stars 13.84k forks source link

MINOR Fix build scan status on PRs #17155

Closed mumrah closed 1 week ago

mumrah commented 1 week ago

Set statuses on apache/kafka repo, not the head repo of the PR

mumrah commented 1 week ago

This was pretty confusing. In order to set statuses, you refer to a SHA. The commit we're adding a status to is in another repository, but we are actually able to set a status for it in apache/kafka. 🤷‍♂️


I tested this locally using my personal token which has similar permissions to the github token used in Actions.

Here's a commit I tested:

Original commit: https://github.com/chirag-wadhwa5/kafka/commit/70ced5109a5a73bd2dc66d11dff528d384f638c7 PR: https://github.com/apache/kafka/pull/16916

I copied the gh api from the failed CI Complete and replaced "chirag-wadhwa5/kafka" with "apache/kafka".

gh api --method POST -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" \
/repos/apache/kafka/statuses/70ced5109a5a73bd2dc66d11dff528d384f638c7 \
-f "state=success" -f "target_url=https://ge.apache.org/s/tt4dyzrxp3dm4" \
-f "description=The build scan was successfully published" \
-f "context=Gradle Build Scan / Java 17"

and it worked!

image

So, all along we just needed to send the API request to apache/kafka instead of the forked repo.

chia7712 commented 1 week ago

https://ge.apache.org/s/tt4dyzrxp3dm4

It shows "2 task failures" and that is caused by the cached error (produced by flaky). It is a bit weird to see that "false" failures, but not sure whether it is worth having fix.

mumrah commented 1 week ago

It shows "2 task failures" and that is caused by the cached error (produced by flaky)

That is unexpected 🤔. I guess it might have to do with how I uploaded them?

Here's a similar trunk build with flaky failures, but it is not showing as failed https://ge.apache.org/s/cd4s4wd6knaum

chia7712 commented 1 week ago

That is unexpected 🤔. I guess it might have to do with how I uploaded them? Here's a similar trunk build with flaky failures, but it is not showing as failed https://ge.apache.org/s/cd4s4wd6knaum

Let's merge this PR first and keep eyes on it :smiley:

mumrah commented 1 week ago

@chia7712 I think I found the issue. Here's an example:

green PR: https://github.com/apache/kafka/pull/17138 green CI: https://github.com/apache/kafka/actions/runs/10795610027/job/29944644706?pr=17138 red Build Scan: https://ge.apache.org/s/b4qg5njxvfnpe

In order to prevent Gradle from caching flaky test results, we force the exit code to be 1 if running in github actions. See https://github.com/apache/kafka/blob/trunk/build.gradle#L526-L531.

We ignore this exit code in the GH workflow and rely on "Parse JUnit Tests" to fail or succeed depending on the test failures.

The result is that the build scan will show a task failure if any tests failed (even if they succeeded on retry).

PR with zero test failures:

PR with only flaky tests:

PR with some failed tests:

I think this is probably as good as we can do for now. Maybe someday we can remove the test retrying.

chia7712 commented 1 week ago

@mumrah thanks for the nice explanation!

I think this is probably as good as we can do for now. Maybe someday we can remove the test retrying.

yes, removing the "retry" is for us own good since it enforces us to "face" the problem. Also, there are only few flaky after migrating to github action so "retry" is unnecessary to me ...