actions / runner-images

GitHub Actions runner images
MIT License

NDK versions are being removed without grace period #10599

Open thomaseizinger opened 1 week ago

thomaseizinger commented 1 week ago

Description

We build an Android app against the NDK version installed on the GitHub runners. During updates to the runner images, the NDK appears to get bumped and the previous default is no longer available, causing builds to fail.

We cannot control which image version gets used, so our CI is effectively blocked.

Platforms affected

Runner images affected

Image version and build link

https://github.com/firezone/firezone/actions/runs/10804517190/job/29970095787?pr=6564

Is it regression?

Yes, the latest non-prerelease doesn't have the issue.

Expected behavior

The previous default NDK version should not be removed without a grace period.

Actual behavior

The default NDK version changes and the old one is no longer available.

Repro steps

  1. Create an Android app and build against the current NDK version.
  2. GitHub decides to randomly(?) use "pre-release" images for some CI runs, failing the pipeline.
hemanthmanga commented 1 week ago

Hi @thomaseizinger, thank you for bringing this issue to our attention. We are looking into it and will update you after investigating.

kishorekumar-anchala commented 1 week ago

Hi @thomaseizinger ,

We created an announcement last month about the removal of old NDK versions; please see the announcement. Thank you!

thomaseizinger commented 1 week ago

That is not the issue. We were already building against NDK 27.0.12077973.

The problem is that we can only ever build against one NDK version and the latest image update removed NDK version 27.0.12077973 and instead installed 27.1.12297006.
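Since the build targets exactly one NDK version, a CI job can at least fail fast with a readable message when the image no longer ships it. A minimal sketch, assuming the Android SDK's side-by-side layout `<sdk root>/ndk/<version>`; `REQUIRED_NDK`, `installed_ndks`, and `check_ndk` are hypothetical names, not taken from the firezone repository:

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical pin: the single NDK version the Gradle build is configured for.
REQUIRED_NDK = "27.0.12077973"


def installed_ndks(sdk_root: Path) -> List[str]:
    """List the side-by-side NDK versions found under <sdk_root>/ndk."""
    ndk_dir = sdk_root / "ndk"
    if not ndk_dir.is_dir():
        return []
    return sorted(p.name for p in ndk_dir.iterdir() if p.is_dir())


def check_ndk(sdk_root: Path, required: str = REQUIRED_NDK) -> Optional[str]:
    """Return None if `required` is installed, otherwise a readable error."""
    available = installed_ndks(sdk_root)
    if required in available:
        return None
    found = ", ".join(available) or "none"
    return f"NDK {required} not found under {sdk_root / 'ndk'}; installed: {found}"
```

A workflow step might call `check_ndk(Path(os.environ["ANDROID_HOME"]))` (assuming `ANDROID_HOME` points at the SDK root on the runner) and exit non-zero with the returned message, so a swapped NDK shows up as one clear line listing what the image actually ships.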

This update appears to have been rolled out incrementally because we saw some CI builds failing and some passing.

This is the fix we had to make: https://github.com/firezone/firezone/pull/6662. But initially, this PR also didn't pass CI reliably because the update was not yet rolled out to all runners.

We don't have any ability to specify which runner image version we get, which essentially leaves us in a broken state: we can only rerun failed CI builds in the hope that we get an old image version with the previous NDK, and eventually merge the PR where we use the newer NDK.

Instead of removing the old NDK, can you first add the new NDK to an image release? That would give us time to migrate to the new version. Subsequently, you can then remove the previous version without breaking CI.
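The difference between replacing a version and adding the new one first reduces to set intersection: while old and new images coexist in the fleet, only versions present on both are safe to build against. A simplified sketch; treating an image as a plain set of NDK versions, and the names `usable_during_rollout` and `rollout_is_safe`, are assumptions for illustration, not how runner-images defines its toolset:

```python
from collections.abc import Sequence

OLD = "27.0.12077973"
NEW = "27.1.12297006"


def usable_during_rollout(old: frozenset, new: frozenset) -> frozenset:
    """Versions present on both images, i.e. safe while runners are mixed."""
    return old & new


def rollout_is_safe(releases: Sequence[frozenset]) -> bool:
    """True if every image transition leaves at least one version usable everywhere."""
    return all(usable_during_rollout(a, b) for a, b in zip(releases, releases[1:]))


# What happened here: the new image replaces the old NDK in a single release.
replace = [frozenset({OLD}), frozenset({NEW})]
# The proposal: ship both versions first, drop the old one in a later release.
staged = [frozenset({OLD}), frozenset({OLD, NEW}), frozenset({NEW})]

print(rollout_is_safe(replace))  # False
print(rollout_is_safe(staged))   # True
```

Under the staged sequence, a project stays on `OLD` through the first transition, switches to `NEW` once every runner has the middle image, and is never left without a usable version.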

kishorekumar-anchala commented 2 days ago

Hi @thomaseizinger,

> That is not the issue. We were already building against NDK 27.0.12077973.
>
> The problem is that we can only ever build against one NDK version and the latest image update removed NDK version 27.0.12077973 and instead installed 27.1.12297006.

Yes, on every rollout the new version is fetched automatically if it exists.

> Instead of removing the old NDK, can you first add the new NDK to an image release? That would give us time to migrate to the new version. Subsequently, you can then remove the previous version without breaking CI.

Yes, before deleting any major versions we will raise an announcement, as mentioned above.

For minor and hotfix versions, the script automatically fetches the available versions; if a new version is available, it will come with the next rollout.

Thank you! We hope your build succeeded with the latest release; kindly confirm.

thomaseizinger commented 2 days ago

> Instead of removing the old NDK, can you first add the new NDK to an image release? That would give us time to migrate to the new version. Subsequently, you can then remove the previous version without breaking CI.

> Yes, before deleting any major versions we will raise an announcement, as mentioned above.

It is nice that you make an announcement that CI will be broken for a couple of days. It would be better if you wouldn't break CI for a couple of days.

The issue is that you can only build an Android app against a single, specific NDK version. Your rollout seems to be incremental, which makes sense. But it means that during the rollout, it is a lottery whether we get a runner with the new or the old NDK version, so we end up in the following scenario:

  1. main references version 27.0.12077973. Some CI runs will pass because they run on machines that haven't upgraded yet.
  2. As the rollout progresses, more and more CI runs will fail because the NDK version doesn't exist.
  3. At some point in the rollout, we have to force-merge a PR that changes the NDK version to 27.1.12297006.
  4. Now, CI runs that are given the new NDK version will pass.
  5. Still, there will be CI runs on runners that are still using the old version and those will fail.
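The "lottery" in steps 1 to 5 can be put in rough numbers. A back-of-the-envelope sketch, assuming (hypothetically) that each job is assigned a runner independently and that one CI run contains three NDK-dependent jobs; neither figure comes from the firezone pipeline, and `pass_probability` is an illustrative name:

```python
def pass_probability(rollout_fraction: float, pinned_to_new: bool, jobs: int = 1) -> float:
    """Chance that all `jobs` NDK-dependent jobs of one CI run get a compatible runner.

    Simplified model (assumption): each job independently lands on an updated
    runner with probability `rollout_fraction`, and each image ships only one
    of the two NDK versions.
    """
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    per_job = rollout_fraction if pinned_to_new else 1.0 - rollout_fraction
    return per_job ** jobs


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    old = pass_probability(p, pinned_to_new=False, jobs=3)
    new = pass_probability(p, pinned_to_new=True, jobs=3)
    print(f"rollout {p:4.0%}: pinned to old {old:6.1%}, pinned to new {new:6.1%}")
```

Under these assumptions, halfway through the rollout either pin passes only 12.5% of runs, which matches the experience of not being able to merge PRs reliably until the rollout finishes.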

This is a huge service disruption because we cannot merge PRs reliably during this period, see:

[screenshot]

> For minor and hotfix versions, the script automatically fetches the available versions; if a new version is available, it will come with the next rollout.

Instead of replacing the version, can you fetch two versions: the one previously installed and whatever is the latest at the time? That way, the rollout doesn't break CI, and we can update to the new NDK version after the rollout has completed.

kishorekumar-anchala commented 1 day ago

> The issue is that you can only build an Android App against a single, specific NDK version. Your rollout seems to be incremental, which makes sense. But it means that during the rollout, it is a lottery, whether we are getting a version with the new or the old NDK version, so we are in the following scenario:

During the rollout, the existing version will not be changed. The new NDK version becomes available once the image rollout has completed to both GitHub runners and hosted agents.

> Instead of replacing the version, can you fetch two versions? The one previously installed and whatever is the latest at the time? That way, the rollout doesn't break CI and we can update to the new NDK version after the rollout is completed.

Currently, we do not have plans to fetch both the previously installed and the latest NDK versions. We can confirm that the Ubuntu images have the latest NDK version. I hope your CI runs succeeded.

thomaseizinger commented 1 day ago

> The issue is that you can only build an Android App against a single, specific NDK version. Your rollout seems to be incremental, which makes sense. But it means that during the rollout, it is a lottery, whether we are getting a version with the new or the old NDK version, so we are in the following scenario:

> During the rollout, the existing version will not be changed. The new NDK version becomes available once the image rollout has completed to both GitHub runners and hosted agents.

That is not true in our experience. We have seen several CI builds that use a "pre-release" image which will have a newer NDK version and thus fail.

> Instead of replacing the version, can you fetch two versions? The one previously installed and whatever is the latest at the time? That way, the rollout doesn't break CI and we can update to the new NDK version after the rollout is completed.

> Currently, we do not have plans to fetch both the previously installed and the latest NDK versions.

What is your suggestion then to implement reliable CI for Android apps that use the NDK?

kishorekumar-anchala commented 7 hours ago

Hi @thomaseizinger ,

Pin your NDK version in your CI configuration. This ensures that the build environment remains consistent and reduces the risk of unexpected breaks due to NDK updates.
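Pinning in this sense means the job installs the exact NDK it needs instead of relying on whichever one the image happens to ship. A minimal sketch, assuming the Android command-line tools live at `$ANDROID_HOME/cmdline-tools/latest/bin/sdkmanager` and that SDK licenses are already accepted on the runner; `NDK_VERSION`, `install_command`, and `ensure_ndk` are hypothetical names. `sdkmanager --install "ndk;<version>"` is the SDK manager's side-by-side NDK install syntax:

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical pin; keep it in sync with the ndkVersion used by the Gradle build.
NDK_VERSION = "27.1.12297006"


def sdkmanager_path(sdk_root: Path) -> Path:
    # Assumption: command-line tools are installed under cmdline-tools/latest.
    return sdk_root / "cmdline-tools" / "latest" / "bin" / "sdkmanager"


def install_command(sdk_root: Path, version: str) -> List[str]:
    """Build the sdkmanager invocation that installs one side-by-side NDK."""
    return [str(sdkmanager_path(sdk_root)), "--install", f"ndk;{version}"]


def ensure_ndk(sdk_root: Path, version: str = NDK_VERSION, run=subprocess.run) -> Path:
    """Install `version` unless <sdk_root>/ndk/<version> already exists; return its path."""
    ndk_dir = sdk_root / "ndk" / version
    if not ndk_dir.is_dir():
        # check=True makes a failed install abort the CI step instead of continuing.
        run(install_command(sdk_root, version), check=True)
    return ndk_dir
```

A workflow step could run `ensure_ndk(Path(os.environ["ANDROID_HOME"]))` before the Gradle build. The cost is an NDK download whenever the image does not already ship the pinned version.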

thomaseizinger commented 6 hours ago

I can do that, yeah. But then I am not sure what the point of the pre-installed NDK is, if your advice is to install a separate version?