containers / toolbox

Tool for interactive command line environments on Linux
https://containertoolbx.org/
Apache License 2.0

Add Fedora CoreOS to continuous integration tests #714

Open debarshiray opened 3 years ago

debarshiray commented 3 years ago

We want Fedora CoreOS to be one of the primary supported platforms, just like Fedora Silverblue and Workstation or nowadays RHEL 8. Therefore, it would be good to run our continuous integration tests also on Fedora CoreOS hosts.

This will help avoid regressions like https://github.com/containers/toolbox/pull/656 and https://github.com/containers/toolbox/pull/712

HarryMichal commented 3 years ago

TL;DR, summary at the end

For its CI, Toolbox uses Zuul hosted on SoftwareFactory. To give a bit of background, the main reason we use SF + Zuul is that they offer running tests on "native" Fedora hosts, not only in containers as is the case with Travis/GitHub Actions/...

It is also worth noting that the development of Zuul and SoftwareFactory is closely tied to the work on Fedora CI. If we cannot come up with a solution, we could ask the folks in that initiative for advice.

A few days ago, I asked the folks behind SF and Zuul about the possibility of adding support for Fedora CoreOS. The response was that it could be just a matter of finding an image and providing a configuration in SF that uses it (https://softwarefactory-project.io/cgit/config/tree/nodepool/virt_images).

To me, this sounds quite feasible. The only obstacle I see (and it may be a large one) is that Zuul is built on top of Ansible, and I don't know how well Ansible plays with CoreOS. I don't mean this in the sense of running Ansible inside of CoreOS, but in the sense of Ansible operating CoreOS (e.g. installing packages). From what I understand, Zuul/SF execute a series of steps before and after running tests in the environment. Considering that Zuul runs 100% of the time on "classic" package-based systems, I cannot say that the same steps will work without any problems on CoreOS. But this is currently only speculation on my part.

Jumping forward: let's say Zuul supports FCOS. How do we build Toolbox and execute the tests?

The easiest solution seems to me to be to use a container. We can either use a pre-built one with all the dependencies already in place, or create a generic (Fedora?) container, install the dependencies and build. Both approaches have their pros and cons, but I don't see anything complicated here.

Running the tests will be a bit more "fun" :). For our system tests we use bats, a minimalist testing framework. To get it, we can either clone it with git or layer an RPM. If we choose to layer, then there is also the choice between rebooting (which, according to the Zuul folks, should be totally fine) and applying the changes live with rpm-ostree ex apply-live.
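The three ways of getting bats onto an FCOS host mentioned above can be sketched as command sequences. The bats-core GitHub URL, the `bats` RPM name and the `/usr/local` install prefix are assumptions; on rpm-ostree systems `/usr/local` points into `/var`, so it stays writable without layering anything.

```go
package main

import (
	"fmt"
	"strings"
)

// batsStrategy enumerates the ways of getting bats onto an FCOS host.
type batsStrategy int

const (
	gitClone          batsStrategy = iota // clone upstream and install under /usr/local
	layerAndReboot                        // rpm-ostree install, then reboot into the new deployment
	layerAndApplyLive                     // rpm-ostree install, then apply it to the running system
)

// batsSetup returns one command line per inner slice, or nil for an unknown
// strategy.
func batsSetup(s batsStrategy) [][]string {
	switch s {
	case gitClone:
		// Assumes git is available on the host.
		return [][]string{
			{"git", "clone", "https://github.com/bats-core/bats-core.git"},
			{"sudo", "./bats-core/install.sh", "/usr/local"},
		}
	case layerAndReboot:
		// The job must then wait for the node to come back after the reboot.
		return [][]string{
			{"sudo", "rpm-ostree", "install", "bats"},
			{"sudo", "systemctl", "reboot"},
		}
	case layerAndApplyLive:
		return [][]string{
			{"sudo", "rpm-ostree", "install", "bats"},
			{"sudo", "rpm-ostree", "ex", "apply-live"},
		}
	default:
		return nil
	}
}

func main() {
	for _, s := range []batsStrategy{gitClone, layerAndReboot, layerAndApplyLive} {
		for _, cmd := range batsSetup(s) {
			fmt.Println(strings.Join(cmd, " "))
		}
		fmt.Println()
	}
}
```

The git route avoids touching the OS deployment at all, while the two layering routes exercise rpm-ostree itself, which may or may not be desirable in a Toolbox test job.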

Summary:

miabbott commented 3 years ago

Support for FCOS in Zuul is related to support of FCOS in Ansible

This might be a non-starter for FCOS. We are actively trying to keep python out of FCOS and Ansible has a requirement on python.

See https://github.com/coreos/fedora-coreos-tracker/issues/592 and https://github.com/coreos/fedora-coreos-tracker/issues/578

That being said, it may be possible to layer in python via an Ignition config but there is a natural tension between managing configs on the host via Ansible and wanting to do it declaratively via Ignition.

I skimmed the docs about adding a diskimage and they seem very specific to traditional RHEL/Fedora style images. I'd be interested to hear from a SF/Zuul expert on this topic.

HarryMichal commented 3 years ago

Support for FCOS in Zuul is related to support of FCOS in Ansible

This might be a non-starter for FCOS. We are actively trying to keep python out of FCOS and Ansible has a requirement on python.

See coreos/fedora-coreos-tracker#592 and coreos/fedora-coreos-tracker#578

That being said, it may be possible to layer in python via an Ignition config but there is a natural tension between managing configs on the host via Ansible and wanting to do it declaratively via Ignition.

I'm aware of the effort and respect it. Note the wording of "FCOS in Ansible", not "Ansible in FCOS" :). I also mention it in the longer part of the comment:

I don't mean this in the sense of running Ansible inside of CoreOS, but in the sense of Ansible operating CoreOS (e.g. installing packages). From what I understand, Zuul/SF execute a series of steps before and after running tests in the environment. Considering that Zuul runs 100% of the time on "classic" package-based systems, I cannot say that the same steps will work without any problems on CoreOS. But this is currently only speculation on my part.

The lack of Python could still be a problem, though, but that is better discussed with the Zuul folks.

I skimmed the docs about adding a diskimage and they seem very specific to traditional RHEL/Fedora style images. I'd be interested to hear from a SF/Zuul expert on this topic.

When I asked the folks about the images, I was asking in the context of adding both FCOS and Ubuntu images. I didn't get the feeling from their response that they were against the idea.

Discussion from #softwarefactory on Freenode:

harrymichal Hi folks! I've got a question regarding operating systems available in Zuul in SF. Would it be possible to add to the pool Fedora CoreOS and possibly Ubuntu?
        FCOS probably shouldn't be "much" of a problem but I suppose Ubuntu might not be included because SF only wants Fedora + CentOS ecosystem?

tristanC    harrymichal: hello, you can find the list of images, and how they are built in https://softwarefactory-project.io/cgit/config/tree/nodepool/virt_images
        harrymichal: basically, if there is a cloud qcow available, then we just need to virt-customize it to add the zuul ssh keys and some tools like git or rsync

harrymichal tristanC: So, if I were to provide a cloud qcow for Ubuntu, you wouldn't be against adding it?

tristanC    harrymichal: i think that's ok, what is the use-case though? :-)

harrymichal In the future, we want our tool to be "officially" supported on Ubuntu. The best way to do that is to test. We want to prevent CI duplication and just use Zuul to run our tests.
        tristanC: What powers Zuul? As in the machines. OpenStack?

tristanC    harrymichal: the zuul at softwarefactory-project.io is running on OpenStack instances provided by vexxhost, and the deployment is managed by zuul itself through this project: https://softwarefactory-project.io/cgit/software-factory/sf-infra/tree/README.md
        and the nodepool-builder service, which manages image updates, uses nested KVM to enable virt-customize

harrymichal tristanC: Thank you for the answer. I'm asking because Fedora CoreOS has several qcow images separated by different Cloud providers.
        I'm now wondering if using Fedora CoreOS will proceed without any problems. It is "a bit different" than traditional Fedora. Packages are not installed using dnf but layered on top of the base image using rpm-ostree. Hmm... We won't know until we try :).
        I'll try to submit the contribution before the end of the week.

tristanC    harrymichal: Zuul can use different Cloud providers to run job workloads, for example ansible/awx jobs are running on AWS
        harrymichal: so perhaps we could add a new resource provider for running those new coreos jobs
        harrymichal: when using config/nodepool/virt_images, we could add a new set of roles to build the rpm-ostree image too, the images are just ansible playbooks that need to produce a qcow2, it doesn't have to be using virt-customize

harrymichal tristanC: Ah, interesting. Didn't know that about Zuul. Cool!
        tristanC: One more question. Is it possible to restart the system used in a job during the job?

tristanC    harrymichal: yes that should be possible
        zuul doesn't mind if the node goes offline, it only waits for the ansible-playbook command exit code

harrymichal Awesome!

travier commented 3 years ago

We also have the option of building toolbox inside of a podman container on the FCOS VMs before running the tests on the VM itself. The best scenario for us would be to produce Ignition configs that perform the tests and report success directly as running Ansible might become tricky quickly.

miabbott commented 3 years ago

harrymichal: basically, if there is a cloud qcow available, then we just need to virt-customize it to add the zuul ssh keys and some tools like git or rsync

Seems like an early experiment would be to take the FCOS qcow and try using virt-customize to crack it open and drop some binaries on it.
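A sketch of that experiment's command line, assuming the goal is to inject Zuul's SSH key and drop pre-built binaries onto the image. It only uses standard `virt-customize` options (`-a`, `--ssh-inject USER:file:PATH`, `--mkdir`, `--copy-in LOCALPATH:REMOTEDIR`), which are applied in command-line order; the image name, key path, binary path and the `/var/usrlocal/bin` destination are hypothetical. Whether libguestfs can inspect an ostree-based image, and whether the changes survive FCOS's first boot, is exactly what the experiment would need to find out.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// customizeArgs builds virt-customize arguments that inject an SSH public key
// for user, create destDir, and copy each local file into it inside the
// image. Nothing is executed here.
func customizeArgs(image, user, pubKeyPath string, files []string, destDir string) []string {
	args := []string{
		"-a", image,
		"--ssh-inject", user + ":file:" + pubKeyPath,
		"--mkdir", destDir,
	}
	for _, f := range files {
		args = append(args, "--copy-in", f+":"+destDir)
	}
	return args
}

func main() {
	// All paths are hypothetical.
	args := customizeArgs("fedora-coreos-qemu.qcow2", "core", "zuul.pub",
		[]string{"./bin/toolbox"}, "/var/usrlocal/bin")
	cmd := exec.Command("virt-customize", args...)
	// Printed instead of cmd.Run(): running it needs libguestfs and the image.
	fmt.Println(strings.Join(cmd.Args, " "))
}
```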

HarryMichal commented 3 years ago

@miabbott, I have no clue how to work with virt-customize. Would you be so kind as to take care of this initial testing?

cgwalters commented 3 years ago

Mmm...what about the option of adding Prow and/or CoreOS CI to this repo? We now have added good support for nested virt to Prow.

Also tangentially related to this is https://github.com/coreos/fedora-coreos-config/pull/862#discussion_r585676001

HarryMichal commented 3 years ago

Mmm...what about the option of adding Prow and/or CoreOS CI to this repo? We now have added good support for nested virt to Prow.

Also tangentially related to this is coreos/fedora-coreos-config#862 (comment)

My way of thinking here is to make use of what Toolbox already has to reduce maintenance burden. Does it sound too complicated to add FCOS to the existing CI? If yes, then we can go in the direction of adding Prow/CoreOS CI.

cgwalters commented 3 years ago

Considering that Zuul runs 100% of the time on "classic" package-based systems,

I suspect the real first problem is that Zuul's OpenStack focus assumes that the systems under test use cloud-init, not Ignition. (EDIT: To clarify, Ignition and FCOS support OpenStack, but it's common for systems talking to OpenStack to assume the guest uses cloud-init)

and applying the changes live with rpm-ostree ex apply-live.

This should be totally fine, though I'd actually just say to use rpm-ostree usroverlay and rpm -Uvh or even skip RPM entirely and just make install or rsync the binaries over.

Honestly though, I am not super concerned about this side of things - my instinct says that the /boot ro mount thing was unusual and not likely to reoccur. I think by far the biggest win is going to be the opposite direction i.e. gating FCOS on toolbox working.

Because what history says is far more likely to happen is e.g. a podman change breaks toolbox - and FCOS' CI is where we gate everything together before it ships to users. (And once Silverblue rebases on FCOS, we would achieve the important property of not shipping an ostree commit to users unless toolbox works)

debarshiray commented 3 years ago

what history says is far more likely to happen is e.g. a podman change breaks toolbox

Yes, I agree. Historically most of the breakages have been Podman regressions. So any progress in that direction is welcome. We (mostly @HarryMichal) once tried to get Toolbox added to Podman's Fedora gating CI, but that ended up getting lost in the weeds.

we would achieve the important property of not shipping an ostree commit to users unless toolbox works

Yes, that would be awesome.

I filed this issue because I felt that there were some really frustrated CoreOS users out there who feel that Toolbox is always broken for them. Until a few months back, it was due to the rootful use-case. Now that sudo toolbox works, unfortunately, they got hit with https://github.com/containers/toolbox/pull/656 and https://github.com/containers/toolbox/pull/712

So, as part of treating CoreOS as a primary platform, I was looking for a way to avoid such things in the future. But ultimately it's up to you. :) If you are happy to only gate CoreOS images on Toolbox, then that's definitely fine by me. If you want to do something else, or do multiple things, then that's also fine with me.

I'll take anything that reduces the number of user-facing breakages as a win.

travier commented 3 years ago

We now have tests in Fedora CoreOS CI, but of course that does not cover changes here, so this is still relevant. I'll have to take a look at the Zuul setup.

HarryMichal commented 3 years ago

@travier We can take a look at it together if you want. Just let me know.

travier commented 3 years ago

Current plan based on discussion with Zuul/SF maintainers/members:

Then we can create a new playbook to:

travier commented 3 years ago

See also discussion in https://github.com/containers/podman/issues/10296

debarshiray commented 2 years ago

See also discussion in containers/podman#10296

It got done. The Toolbox test suite is now run as part of Podman's downstream Fedora gating.

debarshiray commented 2 years ago

Any updates on getting a Fedora CoreOS host added to the CI?

travier commented 2 years ago

Sorry, I have not been able to get to this and other issues are keeping me busy right now. 😕

HarryMichal commented 2 years ago

Sorry, I have not been able to get to this and other issues are keeping me busy right now. 😕

Also dropped the ball on this.

sumantro93 commented 5 months ago

I can help with this. Can someone please tell me what has been done so far? Maybe I can drive this home.