Open katiewasnothere opened 3 years ago
@thaJeztah and @kzys
I'm not really sure what makes sense to do here. I don't think it makes sense to implement something like this in moby/sys/signal but to implement it in containerd would mean that we'd need to hardcode signal values (since we couldn't use build constraints to use unix or windows official golang packages), which defeats the point of previous work https://github.com/containerd/containerd/pull/5693 @kzys.
The current signal parsing logic is designed to be compatible with Docker. Especially SIGRTMIN handling, it is compatible with neither the host or the container, if they use musl libc.
https://github.com/containerd/containerd/pull/5693#discussion_r665524873
However I think this should be a static value anyway since containerd isn't doing anything except forwarding it along. Having this value change depending on what the host userspace is linked against doesn't really make sense since it is the container process that has to process the signal.
It should probably also just be a copy of what Docker does since this is going to be the expectation from build->deploy
Regarding LCOW, how much does containerd know the fact that LCOW run Linux VMs? Should it be handled by the shim or an OCI runtime (runhcs)?
Afaik, it doesn't know that LCOW runs using a VM. However, it does know what platform the pod is running, which was part of my previous solution to parse based on the container's platform (see my closed linked PR above). Moving signal parsing to the shim or runtime would break with the current agreement between containerd and shims though, right?
(side note: LCOW containers using containerd use runc as the runtime now 🙂)
This one is tricky indeed. I know we used the string representation in Docker, so that a user can pass --signal=SIGSOMETHING
from the CLI (which could be running Windows, Mac, or Linux), and have the runtime do the translation.
LCOW adds another "mismatch" in that respect, because the host platform doesn't match the container's platform. I know we had some discussions about that with LCOW in the Docker Engine (i.e. does the VM represent the container, or the process inside the VM?).
I (personally) like having these translations happen as close to the actual container as possible, to keep some abstraction (I hope I'm able to phrase this in a meaningful way); basically (where possible) do not require "higher level" runtimes (containerd, docker engine) to be aware of platform specifics, and delegate that to (e.g.) runc (as it already will have to deal directly with the kernel). "higher level" runtimes can then generate a (somewhat) platform agnostic spec and assume that the right thing happens. Looks like the runtime spec also includes kill <signal>
, so from that perspectice it should be possible, but the runc
code is using unix.SignalNum()
, so I doubt that will support the SIGRTMIN
signals; https://github.com/opencontainers/runc/blob/5fb9b2a006725d468381de8d300bb9baecc726f7/kill.go#L65
func parseSignal(rawSignal string) (unix.Signal, error) {
s, err := strconv.Atoi(rawSignal)
if err == nil {
return unix.Signal(s), nil
}
sig := strings.ToUpper(rawSignal)
if !strings.HasPrefix(sig, "SIG") {
sig = "SIG" + sig
}
signal := unix.SignalNum(sig)
if signal == 0 {
return -1, fmt.Errorf("unknown signal %q", rawSignal)
}
return signal, nil
}
I must admit I'm not sure where these signals are used directly (in containerd), for example the Process.Kill()
interface looks to define the signal as a syscall.Signal
(which would complicate this if the container is a different platform as the host); https://github.com/containerd/containerd/blob/a0ae24b98494c2d855e7c8c5b899f363a3d2b181/process.go#L42
I agree, that seems like a reasonable mindset @thaJeztah.
In particular, CRI attempts to parse signals before calling Process.Kill
, see here. Snippet below:
stopSignal := "SIGTERM"
if container.StopSignal != "" {
stopSignal = container.StopSignal
} else {
// The image may have been deleted, and the `StopSignal` field is
// just introduced to handle that.
// However, for containers created before the `StopSignal` field is
// introduced, still try to get the stop signal from the image config.
// If the image has been deleted, logging an error and using the
// default SIGTERM is still better than returning error and leaving
// the container unstoppable. (See issue #990)
// TODO(random-liu): Remove this logic when containerd 1.2 is deprecated.
image, err := c.imageStore.Get(container.ImageRef)
if err != nil {
if err != store.ErrNotExist {
return errors.Wrapf(err, "failed to get image %q", container.ImageRef)
}
log.G(ctx).Warningf("Image %q not found, stop container with signal %q", container.ImageRef, stopSignal)
} else {
if image.ImageSpec.Config.StopSignal != "" {
stopSignal = image.ImageSpec.Config.StopSignal
}
}
}
sig, err := signal.ParseSignal(stopSignal)
if err != nil {
return errors.Wrapf(err, "failed to parse stop signal %q", stopSignal)
}
Which I'm concerned about since even if we don't parse the signal to syscall.Signal
using ParseSignal
we'd still need to potentially translate the signal into a uint value so we can pass it through task.kill
As for runc, we could update kill
to use the moby/sys/signal
s package. For LCOW, runc runs in the hosting VM, so if we can get to this point for signal parsing (iow if we can get passed cri), then that would work for us.
In particular, CRI attempts to parse signals before calling Process.Kill,
Yes, I'm wondering if it should be considered to make Process.Kill()
accept a string instead of syscall.Signal
(or drop the argument altogether and pass it as KillOpt
)
Yes, I'm wondering if it should be considered to make
Process.Kill()
accept a string instead ofsyscall.Signal
(or drop the argument altogether and pass it asKillOpt
)
Is that something that the community would be okay with?
Since syscall package is tied to the host platform, migrating to string or int makes sense for supporting cross-platform containers such as LCOW. But this may need to wait containerd 1.6.x or containerd 2.0.x.
BTW StopSignal is a string in OCI Image Spec.
Regarding LCOW, is the below diagram accurate? If so, would the shim responsible for Windows -> Linux signal conversion?
Kubelet <--> CRI <--> containerd <--> LCOW shim <--> Linux VM --> runc
migrating to string or int makes sense for supporting cross-platform containers such as LCOW. But this may need to wait containerd 1.6.x or containerd 2.0.x.
That's fine. We use a fork of containerd today so as long as the change is in upstream main we can just cherry pick it as needed until it's in a release.
Regarding LCOW, is the below diagram accurate? If so, would the shim responsible for Windows -> Linux signal conversion? Kubelet <--> CRI <--> containerd <--> LCOW shim <--> Linux VM --> runc
@kzys Pretty much correct except that we use the same shim for LCOW and WCOW. Are you asking if today the shim is responsible for that conversion or suggesting to do that? The shim still runs on the host in the LCOW case but runc runs inside the linux VM.
Would it be acceptable to add a new field on the task API's KillRequest
message named signal_name
that takes a string representing the signal instead of trying to make a breaking change to the API?
Since syscall package is tied to the host platform, migrating to string or int makes sense for supporting cross-platform containers such as LCOW. But this may need to wait containerd 1.6.x or containerd 2.0.x. ..
Would it be acceptable to add a new field on the task API's
KillRequest
message namedsignal_name
that takes a string representing the signal instead of trying to make a breaking change to the API?
Arf. I wanted to reply to this.
Yes, I was thinking along those lines as well; I think with that approach it should be possible to add this in a backward compatible way;
Task.Kill()
would keep the same signature (keeping the syscall.Signal
argument (for now at least)KillRequest
would get an extra field for the string representation (which can be set through a KillOpt
)KillRequest.Signal
and Killrequest.SignalName
are set, then the latter should be used (ignoring the numeric signal)KillRequest.Signal
in favour of the (portable) SignalName
@kzys @dmcgowan @cpuguy83 WDYT?
I'm going to go ahead and start working on that implementation then unless there are any objections from anyone
Thanks! Works for me :+1:
I see I didn't mention it in my reponses, but for completeness / consideration:
I think SignalName
should accept both "names" and "numbers", similar to how github.com/moby/sys/signal.ParseSignal
handles them. This allows for well-known named signals to be specified, but (for flexibility) also accepts a numeric value (passed as string) for situations where some esoteric signal is needed. This might also help the migration to the new field (where the passed syscall.Signal
can be passed both as syscall.Signal
, and as a string containing its value, without the need to convert it back to a corresponding name).
If the above is implemented, perhaps we need a different name (KillSignal
? StopSignal
? naming is hard!).
Let me summarize my thoughts. Here is what we have on LCOW
containerd <--(1)--> runhcs-shim <--(2)--> Linux VM <--(3)--> runc
Option 1: Pass string signals from 1 to 3
Pros
unix.SignalNum
. We don't have to have multiple different strings -> integers signal conversion logic in the container world, even the governance parties are different.Cons
Option 2: Pass string signals in 1. runhcs-shim converts that to integers since it knows the underlying platform while containers are Linux
Pros
Cons
runc is a reference implementation and there are other implementations which are written by other languages (crun in C, youki in Rust). They have to know moby/sys's signal parsing logic.
I think with the proposed change, they'd still be able to pass the signal as it is today (numeric value) and things continue to work.
runc and other implementations which runs on Linux don't have to deal with the cross-platform signal handling.
Afaics, runc is already dealing with conversion of signal names to numbers (it gets the signal passed as a string (command line arguments are always strings), and accepts both names and numbers. The only difference would be that using "moby/signal" extends the list of known signals to add the SIGRTMIN signals (alternatively, we should consider having those added to golang.org/x/sys)
The only difference would be that using "moby/signal" extends the list of known signals to add the SIGRTMIN signals (alternatively, we should consider having those added to golang.org/x/sys)
moby/signal is slightly "opinionated" than runc or golang.org/x/sys regarding SIGRTMIN handling. The actual value is depending on libc (musl and glibc have the different numeric value). Pushing that to golang.org/x/sys may be hard.
Because of that, I'm bit hesitant to have the logic in runc. runc is the reference implementation and other implementations may need to do what runc does to be compatible with, even the conversation logic is not covered by the OCI Runtime Spec. As a result, we may make the container world somewhat incompatible with musl-based Linux distros such as Alpine.
While I don't think it would cause a huge production pain, I'm unsure that this is the right solution.
That's the thing though. This is translating syscall numbers that the container will be handling. We are just sending them. The string->syscall number conversion needs to be standardized and cannot depend on go vs musl vs glibc vs whatever.
The container author knows best here, we just need to send a consistent value.
Thanks. I've updated the pros/cons above to add the following;
containerd doesn't have to deal with the conversion logic anymore. This would help other CRI implementations such as CRI-O.
Is this correct? It is bit different from what @thaJeztah said though.
I think with the proposed change, they'd still be able to pass the signal as it is today (numeric value) and things continue to work.
We are going to clarify the conversion logic in the OCI Runtime Spec and other OCI runtimes would follow the change. The only difference is about SIGRTMIN handling and most containers wouldn't be affected though.
- We can standardize the behavior through OCI. OCI Image Spec already mentions SIGRTMIN signals in https://github.com/opencontainers/image-spec/blob/5ad6f50d628370b78a4a8df8a041ae89a6f0dedc/config.md#user-content-properties.
I'm not sure I understand what you mean by this. How would option two allow us to standardize behavior through OCI?
We are going to clarify the conversion logic in the OCI Runtime Spec and other OCI runtimes would follow the change.
Clarify in what way?
Sorry, I put the statement in the wrong section. Option 1 allows/enforces us to standardize the behavior through OCI.
Right now, OCI Runtime Spec describes the behavior like below;
This operation MUST generate an error if it is not provided the container ID. Attempting to send a signal to a container that is neither created nor running MUST have no effect on the container and MUST generate an error. This operation MUST send the specified signal to the container process.
But Linux manpages describes real-time signals like below;
https://www.man7.org/linux/man-pages/man7/signal.7.html
The Linux kernel supports a range of 33 different real-time
signals, numbered 32 to 64. However, the glibc POSIX threads
implementation internally uses two (for NPTL) or three (for
LinuxThreads) real-time signals (see pthreads(7)), and adjusts
the value of SIGRTMIN suitably (to 34 or 35). Because the range
of available real-time signals varies according to the glibc
threading implementation (and this variation can occur at run
time according to the available kernel and glibc), and indeed the
range of real-time signals varies across UNIX systems, programs
should never refer to real-time signals using hard-coded numbers,
but instead should always refer to real-time signals using the
notation SIGRTMIN+n, and include suitable (run-time) checks that
SIGRTMIN+n does not exceed SIGRTMAX.
Using moby/sys in runc means that we are going to do what the manpage doesn't recommend.
programs should never refer to real-time signals using hard-coded numbers.
So I think this should be documented and standardized for other OCI runtimes. It clarifies the handling of real-time signals in the spec.
Don't we already pass SIGRTMIN+n signals as integers to CRI?
Don't we already pass SIGRTMIN+n signals as integers to CRI?
I'm unsure.
containerd's CRI plugin is calling ParseSignal. 4 files (including one test) in containerd are calling the function so far.
https://github.com/containerd/containerd/search?q=ParseSignal
I think, we need to remove these if runc is responsible for the conversion.
programs should never refer to real-time signals using hard-coded numbers.
The only issue is this recommendation was made by glibc but doesn't handle the case where a glibc process sends a signal to a musl process. So if we wanted to handle this perfectly in runc we would need to try to detect which libc a program is using, which is going to be as close to impossible as you can get when you consider static linking as well as symbol stripping (maybe we could check whether it has @GLIBC_
symbol version names?) and I'm not in the mood to start doing symbol dumps and analysing them from within runc just to figure out whether to use signal 37 or 38. What a mess...
I wonder how the LXC folks handle this... @brauner How do you handle the fact that SIGRTMIN is different based on the libc used by the process? Or is this something we've all collectively ignored because it's such a silly problem that nobody (myself included) thought to check whether this is an incompatibility that exists.
LXC uses real time signals only for systemd, i.e. when the container runs systemd and no specific signal has been specified we will send the relevant real-time signal to systemd. Since systemd doesn't build with musl we don't have issues with libc differences.
We do signal forwarding in e.g. LXD but it is restricted to well-defined signals such as SIGWINCH, SIGUSR{1,2}, SIGINT etc. not real-time signals.
LXC uses real time signals only for systemd, i.e. when the container runs systemd and no specific signal has been specified we will send the relevant real-time signal to systemd. Since systemd doesn't build with musl we don't have issues with libc differences.
Because systemd's shutdown sequence is based on real-time signals I should add.
Also note that SIGRT{MIN,MAX}
aren't constants but function calls.
The problem exists there too though in so far as it is possible that glibc inside and outside of a container could technically alter SIGRTMIN and differ. So one thing I talked to Lennart about was expressing it in terms of SIGRTMAX-
What is the problem you're trying to solve Containers run that rely on containerd and containerd's packages may be running a platform different than the host. For example, linux containers on windows, which run using a lightweight linux virtual machine. When sending signals to a container, we should use signals for the container's platform, not the host's.
Describe the solution you'd like We parse signals based on the container's platform.
Additional context I previously made a PR to accomplish this here, but it fell off and eventually was closed. Recently, this PR was made to remove signal parsing in containerd in favor of using moby/sys platform parsing in here.