@vrothberg is this a bug or am I doing it wrong here?
Could be a bug. If so, we're not executing it in CI which would bring up the question: do we still need/want it?
AFAICS the tests do include several instances of `setupRegistryV2At`, which need that binary.
We can probably upgrade this build to a later (latest?) docker/distribution easily enough; the later build of `REGISTRY_COMMIT_SCHEMA1` is worse: it’s only useful if we don’t upgrade much. So exploring how to turn off (enough of) the module support in Go to build the old versions seems preferable.
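For illustration, a rough sketch of what “turning off the module support” could look like in the image build (the output binary name and exact layout here are assumptions for the sketch, not the current Dockerfile):

```dockerfile
# Hypothetical fragment: build an old, pre-modules docker/distribution commit in
# GOPATH mode, so its vendored dependencies are used as-is.
ARG REGISTRY_COMMIT_SCHEMA1
ENV GO111MODULE=off \
    GOPATH=/go
RUN git clone https://github.com/docker/distribution \
        "$GOPATH/src/github.com/docker/distribution" && \
    cd "$GOPATH/src/github.com/docker/distribution" && \
    git checkout -q "$REGISTRY_COMMIT_SCHEMA1" && \
    go build -o /usr/local/bin/registry-v2-schema1 ./cmd/registry
```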
> Could be a bug. If so, we're not executing it in CI which would bring up the question: do we still need/want it?
In CI we use the pre-built container image `quay.io/skopeo/ci:${DEST_BRANCH}` (so "master" in this case). That should be building the same container image, so I suspect there should be build failures showing in quay as well...
...indeed, they've been failing for the last 13 days. Notifications in quay are really bad, and are turned off by default, which explains why nobody noticed :disappointed_relieved:
In any case, it all pokes at the issue/recommendation I made a while ago: Why are we forcing testing to run in a `--privileged` container at all?
So my preference would be to completely kill the requirement for this container at all levels.
rant It's the job of documentation and the user (or automation) to ensure a compatible build/test environment. Forcing it on users with `make` + `podman` is ripe for causing all kinds of problems (clearly including maintenance headaches). /rant
Okay, I've had the requisite "calming deep-breaths" now. Idea regarding the build problem: Is there some reason we can't simply use `quay.io/libpod/registry:2`? We use that image all over the place in containers CI; it's very stable/static. Nobody dares to touch it :grin:
If nothing major changed in the meantime (I’m not up-to-date), we create a single container with the servers (the v2 registry, the v1-only registry, and the awfully old OpenShift that can still actually run basically completely inside a single container), the Skopeo binary to test, and the test code. We only rely on networking inside a container, and the test Go code creates the config files for the registry servers and the like.
Hence also the tension between needing a fairly fresh base image (for a fresh Go version to be relevant to test the codebase) and a fairly conservative base image (to keep the old servers running) — we can’t just freeze the infrastructure on a 5-year-old container with all the servers because the test subject wouldn’t build/run in a relevant environment.
So a separate container to run a registry is not immediately beneficial — we would have to move a lot of the test setup code into a multi-container-creation step run… in yet another container? That would definitely have some benefits (we could just never rebuild the old OpenShift again), OTOH it’s also a non-trivial amount of work and, more importantly, places much larger demands on the test environment; it would not be just a `make check` on any developer’s workstation building/running a single container in Podman. Is this practical to do/fully automate using Podman for individual laptops? I’d rather not make access to a K8s cluster a prerequisite for working on Skopeo CI, for example.
Or do you mean we should extract the server binary from `quay.io/libpod/registry:2` (assuming it is statically linked) and run it inside the current test container? That could work…
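For example (a hedged sketch; the binary’s path inside `quay.io/libpod/registry:2` is an assumption based on the upstream registry:2 image and would need verifying):

```dockerfile
# Hypothetical: reuse the registry binary shipped in the stable image instead of
# building docker/distribution ourselves. The source path is assumed, not verified.
COPY --from=quay.io/libpod/registry:2 /bin/registry /usr/local/bin/registry-v2
```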
… or maybe we should “just” be using multistage builds, building all the servers as static binaries in old environments, and importing them into the test container. Is that practical to do on individual laptops and the CI?
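A minimal sketch of that multistage idea, assuming the old servers really can be built as static binaries in their old environments (the stage name, Go version, base image, commit variable and paths are all made up for illustration):

```dockerfile
# Stage 1 (hypothetical): an old toolchain that can still build the old servers,
# possibly combined with the GO111MODULE=off approach sketched above.
FROM docker.io/library/golang:1.10 AS old-servers
ENV CGO_ENABLED=0
ARG REGISTRY_COMMIT
RUN git clone https://github.com/docker/distribution \
        /go/src/github.com/docker/distribution && \
    cd /go/src/github.com/docker/distribution && \
    git checkout -q "$REGISTRY_COMMIT" && \
    go build -o /registry-v2 ./cmd/registry

# Stage 2: the fresh environment the test subject needs; only the pre-built
# static binaries are imported from the old stage.
FROM registry.fedoraproject.org/fedora:latest
COPY --from=old-servers /registry-v2 /usr/local/bin/registry-v2
```

That would keep the old build environments out of the final image, while a plain `podman build` on a laptop could in principle still produce it.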
Yes, @mtrmac you clearly know/understand the low-level nuts/bolts best, thanks for replying. Primarily, the container image I was ranting about is the `make build-container` one. The other ones used for the system-tests seem to be working okay for now(?).
> can’t just freeze the infrastructure on a 5-year-old container with all the servers because the test subject wouldn’t build/run in a relevant environment.
Yep, I understand the tension with that. What I was thinking is more along the lines of: could we separate the concerns? Run skopeo (and the tests) in a modern environment, but use the old registry container as a stable resource, i.e. decouple the client and server environments.
> Is that practical to do on individual laptops and the CI?
Fortunately I have lots of experience (good and bad) in this area. I'm generally not in favor of building containers for laptops or CI "on demand". It's better to offload that work to the registry. However (clearly) the quay auto-builds are also no longer ideal, and are causing surprises from several perspectives. I've seen this before too.
What seems to work well in a multi-developer + CI environment is for completely separate automation (separate from developing and testing) to handle building and pushing images. If you also use tags (instead of "latest"), this provides predictable updates in the repository for developers and CI, i.e. bumping a statically defined tag reference in a PR, like how we do for the VM images.
There are lots of ways to automate this. GitHub Actions is an option, but I really dislike working with it (it's poorly designed IMHO). Cirrus-CI has support for running cron-like jobs, so all we need is a build script/Makefile/command. There's also the containers/automation_images repo, where builds are simply done by opening a PR. I'm happy to help however I can, with any of these or other options not mentioned.
Getting rid of the 20 minutes to build the OpenShift+registries on many CI runs, by rebuilding the servers only if necessary, would certainly be great.
The laptop concern was more of "do we require a specific tool set / networking setup for tests to work?" / "is there a risk that whatever the CI does with networking could break unrelated software on the laptop?". I suppose one answer would be to just automate this in GitHub and tell everyone to do WIP commits to invoke the tests, and never run them locally; OTOH it is really nice to be able to break into a failing test with a debugger or a shell subprocess from time to time.
Naw, I'm going to disagree. My experience has been that developers prefer it both ways, local and CI. Certainly pulling a pre-built container image is faster and more reliable than building locally.
As for the environment setup needed to work and test locally (networking included): I think this can be solved by documentation and by deliberately trying to make it as simple as possible.
A friendly reminder that this issue had no activity for 30 days.
Closing as we don't have the `build-container` anymore.