Open bgilbert opened 6 years ago
Related question by @roffe at https://github.com/coreos/torcx/issues/94: will the torcx userland binary be part of the OS? Right now we just provide it as a docker image, which may pose a chicken-and-egg problem.
I expect we'll ship the userspace binary in the OS, yes.
The manifest should be signed with an inline signature, not detached. This makes the parsing more complex, but fixes signature validation failures in the presence of mirrors/caches/CDNs which may not sync both files at the same time.
Is this project still on track? :) According to the CoreOS Blog, Torcx is planned to be fully implemented by May 23, 2018.
@Iodun This is on track at this point.
A quick status update on this, as we have reached the dates that we were initially targeting. The original proposal here has been ironed out with the additional details, but the implementation is still in progress.
The current status can be tracked as
According to the CoreOS Blog the flip to Docker 18.x and removal of the Docker 1.12 workaround is planned for July 18th.
As far as I can tell Kubernetes v1.10 is still not validated on Docker 18.x. We'd prefer to run against a validated version.
The removal of the 1.12 workaround without another mitigation forces us to upgrade a large fleet of clusters to 1.8+. Or turn off OS upgrades in general. 🙁
Is that still going to happen?
@BugRoger it is going to happen, but the initial timeline has been skewed as we are still reviewing/merging some of the components involved. This has also been discussed on the coreos-user ML. We are keeping the references in the comment above updated as we go, and we'll publish a timeline update once the groundwork is merged.
Problem
/etc/coreos/docker-1.12 supports exactly two versions of Docker: 1.12 and “current” (the most recent Docker CE Stable at the time a Container Linux version was branched for alpha). This is accomplished by shipping two Docker torcx images in Container Linux, increasing the (compressed) size of the OS and update payload. This mechanism does not support other versions of Docker (such as 17.03, which has been validated for use with Kubernetes 1.8), and does not support torcx images not shipped in the OS.
/etc/coreos/docker-1.12 is intended to be a temporary interface to the torcx infrastructure, and will be dropped from the Container Linux alpha channel on June 6, 2018. Meanwhile, we need to build out the long-term torcx UX.
Terminology
Context
Container Linux is a compact, unified OS focused on providing the core infrastructure needed to run containers. Most additional software needed on a Container Linux system can, and should, run in a container. torcx is not intended to change that model.
In some environments, software components must be added to a Container Linux system which cannot run in a container, or an alternate version of such a component must be used. An obvious example is the container runtime itself. torcx is intended to provide a compact mechanism for managing such addons. This is expected to be a relatively advanced feature of Container Linux, and most users should never need to interact with torcx.
This bug is a proposal for the interface that torcx will present to users. Nothing here is finalized, and discussion is encouraged. The proposal does not specify the addons or addon versions which will be provided by CoreOS, and we expect that CoreOS will supply only a very small number of torcx images in addition to those shipped with Container Linux itself.
Requirements
Proposal
Remotes
Add a concept called a “remote” and a corresponding JSON schema. A remote is a network image repository, represented by a short name (e.g. coreos or com.coreos.cl) and the following attributes:
torcx-manifest.json. The URL may contain template substitutions for at least $board and $version (the Container Linux version).
Remotes are defined via individual files in a search path over the usual set of directories (/usr, /etc, /run). There should be a mechanism for overriding individual attributes of a remote, perhaps via drop-ins, to allow offline systems to use their own mirror for the CoreOS-provided remote. Some users will also configure their own remotes.
Question: should local torcx stores also be treated as a remote? Perhaps we could drop the distinction between them.
Manifest
This is essentially the tectonic-torcx manifest. It is downloaded from the network and lists the images available from a remote. The manifest is signed with a detached GPG signature.
Relative to the tectonic-torcx manifest, sourcePackage and defaultVersion can be dropped, and relative URLs should be permitted if they aren’t already.
Question: do path declarations make sense here? Remote repositories shouldn’t be able to reference local paths.
Profile
Users (or higher-level tooling working on their behalf) can create a custom profile specifying the image references that should be available on the system. These references override the corresponding image references in the vendor profile.
Extend image objects in torcx profiles to optionally specify the name of a remote. This would be handled during profile merging in the same way as other image attributes: the remote, if any, is taken from the last declaration for a given image. If a remote is specified, the image is fetched during the fetch phase.
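The merge rule above (the last declaration for a given image wins, including its remote) could look roughly like the following Python sketch. The profiles are shown as simplified dictionaries with an images list; the remote key is the proposed extension, and the helper names are hypothetical.

```python
def merge_profiles(*profiles):
    """Merge profiles in order; the last declaration of each image name wins.

    Each profile is a dict with an "images" list of dicts carrying at least
    "name" and "reference", plus an optional "remote" (the proposed field).
    """
    merged = {}
    for profile in profiles:
        for image in profile.get("images", []):
            # A later declaration replaces the earlier one wholesale, so the
            # remote (or its absence) comes from the last declaration.
            merged[image["name"]] = dict(image)
    return list(merged.values())


def images_to_fetch(images):
    """Only images that name a remote take part in the fetch phase."""
    return [image for image in images if image.get("remote")]
```

For example, a vendor profile that lists docker without a remote and a user profile that lists docker with a remote would merge to a single docker entry that is fetched from that remote.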
Fetching
Fetching a profile requires downloading the manifest and signature for any referenced remote, checking the signature, comparing image hashes to any images cached locally, and fetching any missing images. This can’t run as part of the torcx generator, since networking may not be up yet. Fetching should occur in the initramfs on first boot, after Ignition runs; and also from coreos-postinst after an update. That duplication is unfortunate, but it allows the system to defer rebooting after an update until all of the pieces are available.
Question: this design requires a manual fetch operation if the user changes the local profile. Should the initramfs attempt to detect this case and fetch automatically?
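The hash comparison step of the fetch phase might be sketched as below, in Python. This assumes the manifest signature has already been verified by some other step, and it uses a hypothetical flat manifest entry with name, reference, and a hex SHA-512 hash field; neither the field names nor the choice of digest comes from the proposal.

```python
import hashlib
from pathlib import Path


def sha512_of(path):
    """Hex SHA-512 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_fetch(manifest_images, wanted, cache_dir):
    """Return the manifest entries that still need to be downloaded.

    manifest_images: entries from an already-verified manifest (assumed).
    wanted: set of (name, reference) pairs taken from the merged profile.
    cache_dir: directory holding previously fetched images.
    """
    cached = {sha512_of(p) for p in Path(cache_dir).iterdir() if p.is_file()}
    missing = []
    for entry in manifest_images:
        key = (entry["name"], entry["reference"])
        if key in wanted and entry["hash"] not in cached:
            missing.append(entry)
    return missing
```

Matching on content hash rather than file name means an image that is already cached under a different name is not downloaded again.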
Deprecation
Once an image becomes unsupportable, remove it from the OS, or in the case of a CoreOS-maintained remote, stop adding new OS-version-specific images. A Container Linux system which uses that image will then fail future OS updates in the coreos-postinst phase. This approach seems to most directly conform to the user’s stated intent: their workload will continue running, even at the cost of future security updates. All other alternatives directly cause breakage one way or another. In conjunction, Container Linux should do a better job of reporting update failures so they will be less likely to go unnoticed.
Tooling
coreos-postinst will use check to ensure that torcx will be able to run after reboot, so it should be able to validate profiles and images from the newly-installed /usr.
/usr.
next-profile and for storing a custom profile or remote.