cirros-dev / cirros


Drop Travis CI and replace with GitHub Actions #78

Closed NucciTheBoss closed 2 years ago

NucciTheBoss commented 2 years ago

Hello there! I noticed that there have been some outstanding issues/pull requests on this repository. I myself have been trying to work around #77 recently while running OpenStack functional tests with Tempest. The major blocker right now is issue #76: CirrOS no longer has its original CI pipeline. This pull request adds new GitHub Actions pipelines so that we can start landing updates on CirrOS again.

build-cirros-bionic.yaml

This pipeline builds CirrOS images for aarch64, arm, i386, x86_64, powerpc, ppc64, and ppc64le in parallel, and then uploads the resulting images as build artifacts. The artifacts can be downloaded and the images tested further before release.

test-cirros-bionic.yaml

This pipeline tests new pull requests opened against master. It checks that the proposed changes do not introduce regressions that prevent CirrOS from building on the supported architectures. The main difference from the build pipeline is that this one only produces the build log as an artifact, not the resulting images.

Next steps

Right now I am sticking with bionic as the build environment, even though pull request #68 updates the build instructions to focal for local environments. Bionic is supported until 2023, so this gives us time to land other critical updates.

hrw commented 2 years ago

Consider adding caches for ccache and downloads:

    - name: cache downloads
      uses: actions/cache@v3
      with:
        key: cache-downloads
        path: download/

    - name: cache ccache
      uses: actions/cache@v3
      with:
        key: cache-ccache
        path: ccache/

NucciTheBoss commented 2 years ago

@hrw I added the caching for both ccache and download to both pipelines. If I read the caching documentation for Actions correctly, the way I have it should restore ccache and download before building CirrOS whenever there is a cache hit.

Let me know if there is anything else we should include in the pipeline before merging.

hrw commented 2 years ago

Please check how the cache behaves - from what I saw, it is created once and then reused without a new one ever being generated (or I did not find a way to force it).

So you would need a cache per architecture for ccache; for downloads/, one cache should be enough.

NucciTheBoss commented 2 years ago

Hmm. I'll look into this further tomorrow morning, but upon closer scrutiny, I'm not sure the cache action is going to do the trick with how CirrOS currently builds.

The reason is that, looking further at the caching documentation, the cache is kept fresh by pinning the key to a file that lives in the repository (package-lock.json, Cargo.lock, etc.) and embedding that file's hash in the key. When the file is updated and its hash changes, the action gets a cache miss and builds a new cache. The problem is that all of those pinned files exist in the repository before the build starts, whereas our ccache and download directories are created at build time; I cannot hash them before the build step because they do not yet exist in the workspace on the runner.
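For reference, the hash-pinned pattern from the caching documentation looks roughly like this (the npm example; the path and key here are illustrative, not something in our pipelines):

    - name: cache npm downloads
      uses: actions/cache@v3
      with:
        # the key embeds a hash of a committed lock file, so the cache is
        # invalidated whenever that file changes in the repository
        key: "npm-${{ hashFiles('package-lock.json') }}"
        path: ~/.npm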

It also seems like a new cache is only created when there is a cache miss; it is not updated on subsequent runs if the contents change. Therefore, to get caching working, I think we need some sort of flat file that already exists in the CirrOS repository and effectively tracks the downloads and/or ccache contents for each architecture. We could then pin the caches to that file so a new cache is created whenever we add a new download or modify ccache.

As for using only one download folder: I could add a preparatory job that runs before the parallel per-architecture build jobs, as sketched below. The preparatory job would check whether there is a downloads cache; on a hit, it would pass the download directory as an artifact to the parallel build jobs, and each build job would then check for its own ccache. But I still need something to pin the caches to.
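A rough sketch of that prep-job wiring, assuming hypothetical job names and our existing download/ layout:

jobs:
  prepare-downloads:
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v3
      - name: cache downloads
        uses: actions/cache@v3
        with:
          key: cache-downloads
          path: download/
      # hand the (possibly cached) download directory to the build jobs;
      # on a cold cache this step would find nothing to upload
      - name: upload downloads
        uses: actions/upload-artifact@v3
        with:
          name: downloads
          path: download/

  build:
    needs: prepare-downloads
    strategy:
      matrix:
        arch: ["aarch64", "x86_64"]
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v3
      - name: fetch shared downloads
        uses: actions/download-artifact@v3
        with:
          name: downloads
          path: download/
      # each architecture keeps its own ccache
      - name: cache ccache
        uses: actions/cache@v3
        with:
          key: "ccache-${{ matrix.arch }}"
          path: ccache/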

That being said, it feels like we need some more advanced tooling to get caching to behave the way we want under GitHub Actions. What might be best for the short term is to merge the pipelines as-is (without caching), then fix #77 and maybe #75, since it is causing regressions in Tempest and Octavia. After fixing the rsa2 issue, we can prioritize adding the caching tooling for GitHub Actions. Let me know your thoughts, and whether you have a different approach we could try for keeping the cache fresh. Luckily, each build currently takes only about 20-35 minutes.

hrw commented 2 years ago

actions/cache@v3 adds two steps to the build: a restore step at the start of the job and a save step at the end.

If there is no cache, nothing gets unpacked at the start. When the build is done, the cache is created:

/usr/bin/tar --posix --use-compress-program zstd -T0 -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/fork-cirros/fork-cirros --files-from manifest.txt

Note: that requires free space on the runner, so some kind of "make clean" would be needed beforehand.
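For example, a final step along these lines (run after the artifact upload; the directories are illustrative) would free the space before the post-job cache step packs the tarball:

    - name: free runner space before the cache is saved
      # the cache tarball is created in the post-job phase, so bulky build
      # output has to be removed here first
      run: rm -rf release/ output/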

You can have a "hashless" cache, but including some kind of hash is handy as a way to force cache recreation.
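A hashless key with a manual version segment would be one way to do that; bumping the suffix by hand forces a fresh cache (the naming is just a convention):

    - name: cache downloads
      uses: actions/cache@v3
      with:
        # no hashFiles() here; change "v1" to "v2" to force recreation
        key: cache-downloads-v1
        path: download/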

25-30 minutes per build may look fine, but that is 25-30 minutes per build and you run several builds at once. Check what kind of time limits GitHub Actions imposes. We had caching on Travis CI to make sure we used as little time as possible.

NucciTheBoss commented 2 years ago

Yes, I saw in your fork that the SSD runs out of space when you try to create the cache. I am not sure if you have the exact numbers, but it seems like the GitHub-hosted bionic and focal runners are pretty bare-bones, aside from having a relatively decent amount of RAM.

With the way I have it, with CirrOS building in parallel, it looks like we can skirt around the storage limit because each job is given its own 14 GB SSD. If you build CirrOS for every architecture in the same job, you run out of space trying to do any larger steps after all the images have been built. However, looking at the docs for the cache action, I believe a new cache is only created at the end of a job when there was a cache miss. That is why I think we need some sort of requirements and/or manifest file to pin the cache(s) to; it would let us force a cache miss and thereby update the downloads folder.

Regarding the run time: luckily, since CirrOS is an open-source, public project, GitHub is very permissive about what we do with Actions. Each job is allowed to run for a maximum of 6 hours, an entire workflow can run for up to 35 days, and we can have 20 jobs running concurrently. These numbers come from GitHub's documented usage limits for GitHub-hosted runners.
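As a side note, the 6-hour ceiling can also be tightened per job with timeout-minutes, so a wedged build fails fast instead of burning the full allowance (the 60 here is an arbitrary choice):

jobs:
  build_matrix:
    runs-on: ubuntu-18.04
    # jobs default to the 6-hour maximum; cap ours well below it
    timeout-minutes: 60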

Given this information, I feel confident merging this pipeline for the short term so that we at least have something to test a fix for #77 against (it will break Tempest functional tests for any distro that uses rsa2 as the default), and then get caching and the bump of the CirrOS build instructions to the focal series working afterwards. The only thing I need to do before the merge is drop the current cache mechanism so that it does not produce unexpected behavior for new pull requests while we work on the caching.

smoser commented 2 years ago

Hi, I just wanted to say thank you for looking at this. I will try to review in the next few days. Thank you.

hrw commented 2 years ago

Check also the work done by @osfrickler (Dr. Jens Harbott) on migrating from GitHub to OpenDev:

Zuul jobs doing the builds, an update to a newer buildroot, Ubuntu 22.04 kernels, etc.

https://review.opendev.org/q/project:cirros/cirros

Maybe this is a better way.

NucciTheBoss commented 2 years ago

> Hi, I just wanted to say thank you for looking at this. I will try to review in the next few days. Thank you.

Hi @smoser! Thank you for looking at this pull request! I also sent you a ping over Libera a few days ago (from #ubuntu-devel), but I'm not sure if you saw it. Let me know what you think of the pipeline. Unfortunately, I had to drop the caching feature due to the limitations of the GitHub cache action discussed earlier; the action seems more geared towards tools like npm or cargo, given that the cache key uses file hashes to keep the cache fresh.

I'm hoping that once this is merged, we can push a fix for the RSA2 compatibility issue. I have used OpenDev for other OpenStack-specific projects, but I cannot find anywhere on Gitea to pull the build artifacts and/or publish releases/daily builds that other folks can easily fetch. Maybe that is a conversation for the future; at minimum we could backport the new commits from OpenDev to here.

smoser commented 2 years ago

> > Hi, I just wanted to say thank you for looking at this. I will try to review in the next few days. Thank you.
>
> Hi @smoser! Thank you for looking at this pull request! I also sent you a ping over Libera a few days ago (from #ubuntu-devel), but I'm not sure if you saw it.

I just did now. Sorry, I'm not using IRC daily for work anymore, so I check it less frequently.

You should join #cirros on Libera; @hrw and @osfrickler are there as well.

smoser commented 2 years ago

I commented in #cirros that I'm fine delegating the decision to @hrw and @osfrickler, as they're more present here than I am.

What I'd like to have is:

  1. a branch for 0.5 that builds on $NEW_CI_PLATFORM
  2. movement towards a 0.6.0 release with all the things @osfrickler has taken care to add
  3. source available on GitHub
  4. documentation on how to contribute in README.md
  5. releases still published to GitHub (that is a free, fast network with a CDN)

NucciTheBoss commented 2 years ago

Thanks for responding to me :smiley:. I'll hop into #cirros then to see how we can adapt this pull request to fit the project's needs.

hrw commented 2 years ago

So when it comes to the cache, it looks like ccache-per-arch plus downloads-per-arch, keyed on the hash of bin/build-release, should solve the caching problem.

There are differences in what is downloaded on each architecture.

hrw commented 2 years ago

Here is something I am testing now in https://github.com/hrw/fork-cirros/pull/2:

name: CirrOS image builder

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  build_matrix:
    strategy:
      matrix:
        # each architecture is built in its own parallel job
        arch: ["aarch64", "arm", "ppc64le", "x86_64"]
    runs-on: ubuntu-20.04

    env:
      # build only the matrix architecture
      ARCHES: "${{ matrix.arch }}"
      # boot-test the resulting image
      BOOTTEST: "true"
      QUIET: 1

    steps:
      - name: Pull cirros source artifacts
        uses: actions/checkout@v3

      - name: cache downloads
        uses: actions/cache@v3
        with:
          key: "downloads-${{ matrix.arch }}-${{ hashFiles('bin/build-release') }}"
          path: download/

      - name: cache ccache
        uses: actions/cache@v3
        with:
          key: "ccache-${{ matrix.arch }}-${{ hashFiles('bin/build-release') }}"
          path: ccache/

      - name: prepare system
        run: bin/system-setup

      - name: install job dependencies
        run: sudo apt install cloud-utils qemu-system openbios-ppc

      - name: Build image
        run: bin/build-release daily

      - name: "Upload cirros-${{ matrix.arch }} build artifacts"
        uses: actions/upload-artifact@v3
        with:
          name: "image-${{ matrix.arch }}"
          path: release/

NucciTheBoss commented 2 years ago

@hrw this looks good, but just a quick question: I noticed that you dropped i386, powerpc, and ppc64. smoser had mentioned something about dropping those architectures over IRC, but I missed that conversation in #cirros. Is it official that we want to drop those arches because the Ubuntu kernel we borrow does not support them?

hrw commented 2 years ago

In 0.5 we keep them.

And then drop them in the 0.6 one.

NucciTheBoss commented 2 years ago

> In 0.5 we keep them.
>
> And then drop them in the 0.6 one.

Sounds good, I will hang onto them for the time being then. I also added a comment, shown below, so anyone looking at the workflow later will know. Also, are we going to hold this pull request until #81 has been merged?
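For reference, the note in the workflow matrix is along these lines (wording approximate):

      matrix:
        # keep i386, powerpc, and ppc64 for the 0.5 branch; they are planned
        # to be dropped for 0.6
        arch: ["aarch64", "arm", "i386", "powerpc", "ppc64", "ppc64le", "x86_64"]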