containers / youki

A container runtime written in Rust
https://containers.github.io/youki/
Apache License 2.0

[RFC] Consider Alternate CI for running Podman tests #2827

Open YJDoc2 opened 1 week ago

YJDoc2 commented 1 week ago

Background

Currently, we have a CI job that runs the Podman e2e tests with Youki as the runtime. It is intended to serve as a conformance test, checking that Youki works correctly with Podman. We only run the tests with sudo, so this does not cover rootless behavior. Currently, some of these tests are not passing with Youki; details are below.

Motivation

The currently failing tests fall into 3 categories:

Thus, in order to fix the 1st and 3rd categories of tests, I am proposing to run the Podman tests CI on another CI provider, such as Cirrus CI or CircleCI. Both provide a free tier with credits for public OSS repos and, unlike GitHub, provide a VM setup that we have complete control over. Note that Podman itself uses Cirrus CI.

I am NOT proposing to move any other CI jobs to these providers, as that is neither needed nor sensible.

Considerations

Some considerations I have looked into:


If there are no objections to moving the test CI to another provider, I can try both providers on my fork, and we can decide which one to finalize based on that.

cc: @containers/youki-maintainers

yihuaf commented 5 days ago

> Tests failing because Youki has a different/incorrect implementation. For example, a test fails because the error message given by Youki is not in the format the test expects. These kinds of tests can and should be fixed so that Youki can be used with Podman.

These should be fixed as the first priority and should be straightforward to understand and fix. Are issues in this category impacted by the CI provider environment? Based on the description in the issue, only the 3rd category requires changing the CI?

> Thus we need ~35*31 = 1085 minutes of credits per month.

We can save some more by running only Mon-Fri or a similar pattern. I don't think we would lose much coverage if we reduce the nightly test frequency a little.
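To make the savings concrete, here is a small back-of-the-envelope sketch. It assumes, as in the estimate above, that one nightly run takes roughly 35 minutes; the function name `monthly_minutes` and the choice of July 2024 as the example month are purely illustrative and not part of any existing tooling.

```python
import calendar
from datetime import date

# Assumption taken from the thread's estimate: one nightly run takes ~35 minutes.
MINUTES_PER_RUN = 35


def monthly_minutes(year: int, month: int, weekdays_only: bool = False) -> int:
    """Estimate CI minutes used by one nightly run per scheduled day in a month."""
    _, days_in_month = calendar.monthrange(year, month)
    runs = 0
    for day in range(1, days_in_month + 1):
        # date.weekday(): Monday == 0 ... Sunday == 6, so >= 5 is a weekend day
        if weekdays_only and date(year, month, day).weekday() >= 5:
            continue
        runs += 1
    return runs * MINUTES_PER_RUN


# Hypothetical example month: July 2024 (31 days, 23 of them weekdays).
print(monthly_minutes(2024, 7))                      # 1085 (every night)
print(monthly_minutes(2024, 7, weekdays_only=True))  # 805 (Mon-Fri only)
```

Under these assumptions, a Mon-Fri schedule cuts usage by roughly a quarter compared with running every night.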

> Note that podman itself uses Cirrus CI.

Then we should start exploring the options here.

> We will need to figure out how to report if the tests fail. Currently we do not report results as we know the tests are failing.

How does Podman implement this? Can we follow their lead?

If you can break down the tasks, we can help out on the effort.

utam0k commented 5 days ago

For your information: https://contribute.cncf.io/resources/project-services/hosted-tools/#cicd

YJDoc2 commented 2 days ago

> These should be fixed as the first priority and should be straightforward to understand and fix. Are issues in this category impacted by the CI provider environment? Based on the description in the issue, only the 3rd category requires changing the CI?

Hey, so the 3rd category is the reason I opened this RFC, but fixing the env-based failures also lets us confirm which tests are failing due to the environment and which are actual failures. Right now, a failing test could be either, and to decide, one needs to run it in a Vagrant VM. We also have no way to keep checking in CI that a fix added for a test keeps working. Once we are sure that no tests are failing due to env issues, the rest are either config or actual failures, which can be fixed and kept in check via CI.

> We can save some more by running only Mon-Fri or a similar pattern. I don't think we would lose much coverage if we reduce the nightly test frequency a little.

Yep! I had not considered this, thanks for pointing this out.

> How does Podman implement this? Can we follow their lead?

Podman runs the CI on each commit/PR, and Cirrus CI has a GitHub app that reports CI status similarly to native GitHub CI. As we don't run the Podman tests on PRs, we need a way to explicitly report these failures. Maybe Cirrus CI/CircleCI itself has an option to report them differently, and we can use that.

> If you can break down the tasks, we can help out on the effort.

For now, we first need to do a PoC with both, with Cirrus being preferable as Podman itself uses it. Once we have a better idea, we port over the test CI. I feel both of these steps should be done by a single person. Once that is done, we will have a list of failing tests that can be dealt with separately.

@utam0k:

> For your information: https://contribute.cncf.io/resources/project-services/hosted-tools/#cicd

Hey, I had seen this, but I feel it will take some time for the decision to be finalized on the CNCF side, so it'd be better to start on our own and port over to their infra later.