devcontainers / spec

Development Containers: Use a container as a full-featured development environment.
https://containers.dev
Creative Commons Attribution 4.0 International
3.38k stars 208 forks source link

profile.d and entrypoint.d #19

Open Chuxel opened 2 years ago

Chuxel commented 2 years ago

A persistent challenge in creating containers has been how to simply and easily add environment variables or source scripts as a part of various user shells (bash, zsh) and firing things on container startup - particularly if the thing that is starting requires root access.

Today we've dealt with this by:

  1. Having all of our scripts add content to /etc/bash.bashrc and /etc/zsh/zshrc
  2. Defaulting userEnvProbe to loginInteractiveShell to ensure any tool gets these variables
  3. Recommending containers run as root, but connect tools as non-root
  4. Creating entrypoint scripts that fall back on sudo if not running as root
  5. Providing an "entrypoint" property for the upcoming features capability (#8)

There are several challenges here.

  1. If a pure login shell is used via docker exec or userEnvProbe is updated, (1) will not work. Scripts can also be added to /etc/profile.d and /etc/zsh/zprofile (or zshenv) to solve this, but you then have to check to verify whether the script has already fired if you are doing something that shouldn't happen more than once or has overhead.
  2. For (3) and (4), sudo is not always present, and in some cases may be explicitly omitted for security reasons - particularly password-less. Running the container as root is also not a good idea on shared Linux hosts unless a VM is used to house the containers - like in Docker Desktop (which is also coming to Linux). There is "rootless" mode for container engines, but these also have limitations and end up causing problems for bind mounts since all files end up being owned by root. (Using a container volume does not have this limitation, however).
  3. For (5), currently the feature needs to be reference even if it is built into the container image. This can be quite confusing and easy to forget. In addition, adding an explicit entrypoint to a Dockerfile also requires overrideCommand to be set to false which has similar challenges - though the label metadata proposal could help here (#18).

Proposal

Rather than all of this, we can have a well defined pattern when features are used. There could even be a simple feature that does nothing more than set up execution of this pattern. Instead of everything described above, we can introduce two well known folder locations:

  1. A profile.d folder (or a similar name). If a feature places a script in this location, it will be sourced by /etc/bash.bashrc, /etc/zsh/zshrc, /etc/profile.d, and /etc/zsh/zprofile. A single bootstrapper can then be sourced to ensure these scripts are executed with code similar to what you'd find in /etc/profile. However, it would check to see if the bootstrapper has already run to ensure it doesn't fire more than once like in the case of a login interactive shell.

    if [ -z "${DEV_CONTAINER_PROFILE_BOOSTRAP_DONE}" ] && [ -d "/usr/local/etc/dev-containers/profile.d" ]; then
        for script in /usr/local/etc/dev-containers/profile.d/*; do
            if [ -r "$script" ]; then
                . $script
            fi
            unset script
        done
        export DEV_CONTAINER_PROFILE_BOOSTRAP_DONE="true"
    fi
  2. An entrypoint.d folder where you can place a scripts that will automatically be executed as a part of the container startup. This eliminates the need for an explicit entrypoint property in devcontainer-features.json. However, execution is a bit different than profile.d. It should:

    1. Be explicitly added to the container's entrypoint if not present already.
    2. Execute the scripts in alphabetical order so that, like /etc/profile.d, you can add numeric prefixes to indicate when it should happen.
    3. The processor should then fire exec for any arguments passed into it so that the PID for the processor is replaced by the existing entrypoint commands.

Features then follow a pattern of placing contents in these folders rather than attempting to manually update configuration.

Alternatives

Part of the reason for entrypoint.d is that most containers do not have systemd or another init system that supports process startup. (The tiny init system that the --init argument adds does not manage process startup). The challenge is systemd is heavy and requires more permissions to the host than is ideal. So while it is an option, its not something we could count on. An alternative would be to use something like supervisord. However, if the desire is just to fire something once at startup rather than keep a process running, its overkill as well. We can also easily document how to wire in supervisord using entrypoint.d if not provide a dev container feature for it.

chrmarti commented 2 years ago

On shell startup scripts: The shell startup scripts shouldn't be sourced in a nested way (that would be a bug). They can be sourced multiple times non-nested and need to support that (the suggested flag doesn't detect that case).

On using SUID binaries: I'm not sure that improves the security over running the container as root. Using rootless might be the best option, although Docker's documentation used to say that a container is not a security boundary, we need to check how that changes with rootless. (Interesting read on why shell scripts don't support the SUID bit: https://unix.stackexchange.com/a/2910)

Chuxel commented 2 years ago

On using SUID binaries: I'm not sure that improves the security over running the container as root. Using rootless might be the best option, although Docker's documentation used to say that a container is not a security boundary, we need to check how that changes with rootless. (Interesting read on why shell scripts don't support the SUID bit: https://unix.stackexchange.com/a/2910)

What running as non-root does in combination with not providing sudo is prevent the container OS from being modified or otherwise changed. What this approach does is isolate root access to well known scripts that themselves were set as root - much like /etc/init.d scripts. It then takes on the process start capabilities of something like systemd. The Docker security boundary consideration is about the host rather than affecting the containers themselves. If you run the engine as root on Linux (directly, not in a VM or something), and you run in the container as root, there's risks of leaking to the host. As you said, you can run rootless there (though that also has issues - namely bind mounts), but that alone does not solve any code or executables being able to otherwise modify the OS in the container. There's also an element of "falling into the pit of success" here with default behaviors. That said, for local scenarios on Linux - using rootless with volumes instead of bind mounts helps with the host concerns.

On shell startup scripts: The shell startup scripts shouldn't be sourced in a nested way (that would be a bug). They can be sourced multiple times non-nested and need to support that (the suggested flag doesn't detect that case).

How so? Sourcing the script would bring any exported variables into the process being started up. So, if they're run in sequence like say ~/.bashrc then ~/.profile, the second .profile script would get the var.

Effectively, these are scripts that are intended to run from either interactive or login shells. Zsh provides a mechanism with zshenv for this kind of thing, but bash does not, and sh only has profile - and for consistency, putting things in zshenv can result in them being fired in different places than if you are using the other shells (namely non-interactive, non-login). The issue with sourcing twice is that this has overhead.

chrmarti commented 2 years ago

What running as non-root does in combination with not providing sudo is prevent the container OS from being modified or otherwise changed. What this approach does is isolate root access to well known scripts that themselves were set as root - much like /etc/init.d scripts. It then takes on the process start capabilities of something like systemd. The Docker security boundary consideration is about the host rather than affecting the containers themselves. If you run the engine as root on Linux (directly, not in a VM or something), and you run in the container as root, there's risks of leaking to the host. As you said, you can run rootless there (though that also has issues - namely bind mounts), but that alone does not solve any code or executables being able to otherwise modify the OS in the container. There's also an element of "falling into the pit of success" here with default behaviors. That said, for local scenarios on Linux - using rootless with volumes instead of bind mounts helps with the host concerns.

SUID binaries need to be secured themselves. E.g., they need to sanitize PATH and similar env variables to avoid being tricked.

On shell startup scripts: The shell startup scripts shouldn't be sourced in a nested way (that would be a bug). They can be sourced multiple times non-nested and need to support that (the suggested flag doesn't detect that case).

How so? Sourcing the script would bring any exported variables into the process being started up. So, if they're run in sequence like say ~/.bashrc then ~/.profile, the second .profile script would get the var.

Startup scripts I have seen are written in a way that allows either ~/.profile (login shell) or ~/.bashrc (interactive shell) to be run exactly once. It seems we should stick to this. (https://medium.com/@rajsek/zsh-bash-startup-files-loading-order-bashrc-zshrc-etc-e30045652f2e has a nice diagram on this.)

Effectively, these are scripts that are intended to run from either interactive or login shells. Zsh provides a mechanism with zshenv for this kind of thing, but bash does not, and sh only has profile - and for consistency, putting things in zshenv can result in them being fired in different places than if you are using the other shells (namely non-interactive, non-login). The issue with sourcing twice is that this has overhead.

It seems /etc/profile.d would cover Bash and /bin/sh, but not zsh?

Chuxel commented 2 years ago

Startup scripts I have seen are written in a way that allows either ~/.profile (login shell) or ~/.bashrc (interactive shell) to be run exactly once. It seems we should stick to this. (https://medium.com/@rajsek/zsh-bash-startup-files-loading-order-bashrc-zshrc-etc-e30045652f2e has a nice diagram on this.)

It seems /etc/profile.d would cover Bash and /bin/sh, but not zsh?

/etc/profile.d is only for bash and sh/ash/dash login shells. It does not fire for interactive shells - most notably, docker -it. Using bash -i in a window that is not interactive results in complaints about TTY, so generally if you need the contents of profile/rc files, you'd use bash -l or the equivalent for that shell. zsh doesn't use /etc/profile.d at all and has its own files - but the same thing applies to /etc/zsh/zshrc and /etc/zsh/zprofile.

There is no universal file that fires for either -i or -l for bash/sh/ash/dash.

The goal here is a bit different and isn't about personal profile/rc files really at all. What we need a pure image-based way to include environment variables and sourced scripts in sh -l, bash -l, bash -i, bash -li, zsh -l, and zsh -li. So, the proposal here is to create a way to enable this rather than leaving authors up to their own approaches. The expansion into CI scenarios will make this problem more apparent since -l is much more common there than -i given its inherently not interactive. If we don't provide a way, I expect we'll just get a bunch of one-offs that do something along these lines.

That said, love to hear other ideas if you have them! Either way, not having to have each feature/script go manually check and modify the profile/RC files would be a good thing even if we just lock it into /etc/bash.bashrc and /etc/zsh/zsrc and ignore the login problem.

SUID binaries need to be secured themselves. E.g., they need to sanitize PATH and similar env variables to avoid being tricked.

Yeah, this is fired during the entrypoint only. So, we should for sure check for PID 1 (added that above). We'd have to think about whether we could mandate a clean environment for execution that wouldn't take the local env in place should someone go execute it manually as PID 1 (this is why sudo has different PATH settings than non-sudo).

chrmarti commented 2 years ago

My intuition here is that we should try to use the same approach shells are using. E.g., should we support both login and interactive startup scripts and source a feature's profile.d for login shells and a similar shellrc.d for interactive shells?

The situation with shell startup scripts seems overly complicated, but I'm not sure we can improve it by introducing our own scheme. E.g., sourcing a feature's profile.d from .bash_profile and .bashrc would source it twice when Bash is run as interactive login shell because .bash_profile typically runs .bashrc when in interactive mode. (Here the flag you suggest would work, but there might be value in having separate scripts for login and interactive shells.)

In which situation would we have an interactive shell without having run the login scripts in one of its parent processes? That seems counterintuitive. E.g., the login scripts typically start an ssh-agent (if not running already) and set SSH_AUTH_SOCK.

Chuxel commented 2 years ago

My intuition here is that we should try to use the same approach shells are using. E.g., should we support both login and interactive startup scripts and source a feature's profile.d for login shells and a similar shellrc.d for interactive shells?

The situation with shell startup scripts seems overly complicated, but I'm not sure we can improve it by introducing our own scheme. E.g., sourcing a feature's profile.d from .bash_profile and .bashrc would source it twice when Bash is run as interactive login shell because .bash_profile typically runs .bashrc when in interactive mode. (Here the flag you suggest would work, but there might be value in having separate scripts for login and interactive shells.)

Yeah that's fair - and since we're talking about files, you can symlink when you want both - the main challenge is the flag ends up getting added to the script instead in those cases.

In which situation would we have an interactive shell without having run the login scripts in one of its parent processes?

You can also change the default terminal setting in VS Code if you're used to a login shell. I know of a couple of major instances of this. Part of the problem is the Linux default for the terminal is pure -i, so profile doesn't work, so it's an easy mistake to just do -l.

Any external use that is not interactive (docker exec <container> bash -lc 'command') or entrypoint scripts that specify it since they need the environment (#!/bin/bash -l) are other examples. With the spec, we can't assume VS Code is managing all connects.

The core issue is tools that require sourcing - which is a frustrating but common practice.

chrmarti commented 2 years ago

Part of the problem is the Linux default for the terminal is pure -i, so profile doesn't work, so it's an easy mistake to just do -l.

I see an env variable set in ~/.profile in the terminal even though the terminal seems to run with -i. It seems to inherit the profile variable from a parent process.

If this is always the case, we could simplify our approach by only sourcing the feature setup scripts from the shell profiles /etc/profile.d and /etc/zprofile.

Chuxel commented 2 years ago

@chrmarti In Remote - Containers? We have userEnvProbe set to login interactive there, so yes, it does get it from the parent process. If userEnvProbe was set to none, you wouldn't see login variables because VS Code's default terminal is-i. Also I don't believe userEnvProbe handles sourced content like nvm does it? Either way - the issue is things not spawned by VS Code.

Chuxel commented 2 years ago

I broke out the UID/GID update aspect that drives the need for entrypoint.d into a separate proposal (#25) since @chmarti pointed out this could be done from the build step that makes those adjustments. This would then limit the number of places that you'd even need root access so we could defer the SUID aspect of the proposal here. I then updated the description above to not drive towards SUID for the initial phase and we can consider going beyond that later.