coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

Future of Power CI under P10/PowerVM #2473

Open ravanelli opened 2 years ago

ravanelli commented 2 years ago

I'm creating this issue so we have a common place to discuss the next steps for Power CI. That way we can get more insight from the multiarch folks around and decide the best way to move forward.

With gangplank we are improving our CI to create a more multi-arch world for FCOS/RHCOS/COSA, and also to eliminate issues such as the duplicated CIs we have around. arm64 was successfully added, and now we are looking for Power and s390x to be part of this beautiful world.

Unfortunately, there are some struggles with Power when looking to the future. As we know, P10 dropped bare metal support (PowerVM only), and RHEL 9 also dropped support for KVM on Power.

Our entire CI is based on qemu/kvm. It will be really hard to change it just to accommodate Power.

Recently, I was trying to enable gangplank remote on Power, using a server provided by IBM in IBM Cloud. However, this server is a P9 using PowerVM, and this is where we start to feel the issues of working with PowerVM/KVM.

I reached out to folks at IBM to better understand the options we have here, and the feedback I got so far is:

Looking at these scenarios, it looks like we are not really able to run KVM under PowerVM.

More details: https://bugzilla.redhat.com/show_bug.cgi?id=2008271

dustymabe commented 2 years ago

Thank you for writing this up @ravanelli.

Looking at these scenarios, it looks like we are not really able to run KVM under PowerVM.

Ouch.. That really breaks our existing model and will force us to carry quite the delta just to add that architecture.

mkumatag commented 2 years ago

cc @clnperez @manojnkumar

laggarcia commented 2 years ago

Here is a summary of the discussion we had with Renata on this topic. If I got something wrong, please let me know, as I am not knowledgeable on COSA/FCOS/RHCOS.

The CI infrastructure controller you have today runs in an x86 environment. At some point in the process, this controller will contact a Power server to actually build the Power images and run basic build verification tests on them. There are two requirements on the Power server so that it can seamlessly integrate with your infrastructure:

In order to fulfill these requirements, you will have to run your build process on a POWER9 bare metal machine. You will need to find one that is available with a public IP address. Provided one is available, you should have no issues running the build process on that machine and spawning VMs with the built image to do your basic verification of the build process.

The availability of a Power10 system with KVM support should not be an impediment here. The build process usually targets older processor versions for compatibility and support reasons. Just as an example, IIRC, RHEL 8 is built targeting POWER8 processors, as it needs to run on both POWER8 and POWER9 processors. So, for the foreseeable future, using a POWER9 bare metal machine to build the FCOS image and test the build process with KVM should be enough. This environment will be supported for many years to come.

Please let me know in case you have any additional questions on this.

ravanelli commented 2 years ago

Thanks @laggarcia for all the discussion related to this topic.

Right now, we don't have any bare metal Power server around with public IP access that would allow us to continue with the FCOS improvements for Power. Unless we can find one, there is no other option but to wait.

jcajka commented 2 years ago

@laggarcia my understanding has been that the FCOS CI/pipeline requires openstack/aws/ocp-like cloud infra (nowadays it should be just the first two) and is not really able to work with stable VMs/hosts. @dustymabe please correct me if I'm wrong. @ravanelli We should have KVM-based POWER9 VMs around that can be provided (if there is no issue with them being outside of the Fedora infra), hosted at Brno University of Technology. Possibly even one whole bare metal P8 box. AFAIK nested KVM should work there.

ravanelli commented 2 years ago

@jcajka How reliable is the support at Brno University? I tried to use the minicloud at Unicamp, but the lack of support is really an issue there. I had to wait more than a month to get a firmware update.

clnperez commented 2 years ago

You can also get an OpenStack environment from OSU: https://osuosl.org/services/powerdev/request_hosting/. I've only ever requested standalone VMs, but have had very good stability and support from them. I'm not suggesting it over Brno, but if we need another option, that's one to consider. I believe this project falls under the "Free and Open Source" restriction.

dustymabe commented 2 years ago

@laggarcia my understanding has been that the FCOS CI/pipeline requires openstack/aws/ocp-like cloud infra (nowadays it should be just the first two) and is not really able to work with stable VMs/hosts. @dustymabe please correct me if I'm wrong.

We can work with a single bare metal machine and talk to it over SSH. That's what we're doing currently for aarch64.

jcajka commented 2 years ago

@dustymabe cool, good to know. I had assumed that being in AWS was essential for various reasons, mostly redeployment, etc. @ravanelli what are your expectations and requirements? Most issues, if there is a solution (new FW) available from the HW vendor, I can probably resolve in under a week (I'm one of the admins there). But formally it is not a commercial offering, so it is best effort.

clnperez commented 2 years ago

Can we pick this conversation back up? We're getting a couple of new pings from customers about OKD.