apptainer / singularity

Singularity has been renamed to Apptainer as part of us moving the project to the Linux Foundation. This repo has been persisted as a snapshot right before the changes.
https://github.com/apptainer/apptainer

bootstrapping from docker ignores SINGULARITYENV_PATH #860

Closed AdamSimpson closed 6 years ago

AdamSimpson commented 7 years ago

Version of Singularity:

2.3.1 and development branch

Expected behavior

When SINGULARITYENV_PATH is set, PATH should be set accordingly in the container.

Actual behavior

The file /.singularity.d/env/10-docker.sh is created, which sets PATH but ignores SINGULARITYENV_PATH.
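
For reference, the generated file just exports whatever the Docker manifest provides; for ubuntu:zesty it contains something along the lines of the following (paraphrased, not the verbatim generated file):

export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"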

Steps to reproduce behavior

$ cat Ubuntu.def
BootStrap: docker
From: ubuntu:zesty

$ singularity create Ubuntu.img
$ sudo singularity bootstrap Ubuntu.img Ubuntu.def 

$ SINGULARITYENV_PATH=/foo_bar singularity exec Ubuntu.img printenv
...
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
....
vsoch commented 7 years ago

hey @AdamSimpson! I think this is because that variable is defined in ubuntu:zesty; here is the docker manifest for it:

{'ArgsEscaped': True,
 'AttachStderr': False,
 'AttachStdin': False,
 'AttachStdout': False,
 'Cmd': ['/bin/bash'],
 'Domainname': '',
 'Entrypoint': None,
 'Env': ['PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'],
 'Hostname': 'c2c431b0e96e',
 'Image': 'sha256:1f9229cb01ad0ca760edcad7943cd9c0e9e9b908f12869895befc7bdf0bb66f9',
 'Labels': {},
 'OnBuild': None,
 'OpenStdin': False,
 'StdinOnce': False,
 'Tty': False,
 'User': '',
 'Volumes': None,
 'WorkingDir': ''}

The SINGULARITYENV_PATH might be in one of the other files, what do you see there?
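
If it helps to double check, something like this should show the env scripts that were written into the image (assuming the 2.3 /.singularity.d layout):

$ singularity exec Ubuntu.img ls /.singularity.d/env/
$ singularity exec Ubuntu.img cat /.singularity.d/env/10-docker.sh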

vsoch commented 7 years ago

ah, and I found the reason for PATH! It's because we have another variable, SINGULARITYENV_PATH, that is set in etc/init:

# This will be sourced before launching a Singularity container.
# Any variables prefixed with "SINGULARITYENV_" will be transposed
# properly into the container. For example:
# SINGULARITYENV_LD_LIBRARY_PATH -> LD_LIBRARY_PATH

# Environment modules if set, cause errors in containers
unset module
unset ml

# Bash env has been known to cause issues in containers
unset BASH_ENV

# Provide a sane path within the container
if [ -z ${SINGULARITYENV_PATH+x} ]; then
    SINGULARITYENV_PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin"
else
    SINGULARITYENV_PATH="$SINGULARITYENV_PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin"
fi

# Don't save the shell's HISTFILE
SINGULARITYENV_HISTFILE=""

export SINGULARITYENV_PATH SINGULARITYENV_HISTFILE
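
As a quick illustration of the prepend behavior (a sketch; the init file path depends on your install's sysconfdir, /usr/local/etc/singularity/init is assumed here):

$ SINGULARITYENV_PATH=/foo_bar bash -c '. /usr/local/etc/singularity/init; echo "$SINGULARITYENV_PATH"'
/foo_bar:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin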

If you try with a different variable it works:

SINGULARITYENV_BLA=foo_bar singularity exec  --cleanenv docker://ubuntu:zesty printenv
Docker image path: index.docker.io/library/ubuntu:zesty
Cache folder set to /home/vanessa/.singularity/docker
Creating container runtime...
LD_LIBRARY_PATH=/.singularity.d/libs
HOME=/home/vanessa
PS1=Singularity> 
BLA=foo_bar
TERM=xterm-256color
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LANG=C
SINGULARITY_CONTAINER=ubuntu:zesty
PWD=/home/vanessa
SINGULARITY_NAME=ubuntu:zesty
vsoch commented 7 years ago

Is this something you could put into your def recipe instead?
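
Something along these lines might do it, since the %environment section is sourced after the Docker env file (a sketch; /foo_bar stands in for whatever you actually need):

BootStrap: docker
From: ubuntu:zesty

%environment
    PATH="/foo_bar:$PATH"
    export PATH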

AdamSimpson commented 7 years ago

Hi @vsoch, thanks for the quick investigative work :) I believe the /etc/init code correctly prepends SINGULARITYENV_PATH, if set, to the defaults. All of the SINGULARITYENV_* vars get set by a call to setenv(), and the scripts under /.singularity.d/env/ are run after that. Since 10-docker.sh just sets any env vars provided by the docker manifest absolutely, it blows away any corresponding SINGULARITYENV_ env vars that have been set. I am thinking that whatever generates 10-docker.sh needs to prepend the docker-manifest-provided env vars instead of setting them absolutely.
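
In other words, the generated file could preserve anything already set instead of assigning absolutely. A rough sketch of the idea (not the actual generator output):

# hypothetical prepend-style 10-docker.sh
if [ -n "${PATH:-}" ]; then
    # keep whatever was already set (e.g. from SINGULARITYENV_PATH) after the manifest value
    export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH"
else
    export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
fi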

vsoch commented 7 years ago

The python code generates 10-docker.sh, and to make it work for import/bootstrap it's generated via an in-memory tar and then dumped into the image. We can't really intelligently append/prepend because we don't know what is set vs. not, and we would also have to deal with user preferences (some might want append, some might want overwrite, etc.). There is also pretty poor communication between the generation of the docker layers to dump and the caller to it: it basically has one "layerfile" defined in memory where the python writes a list of .tar.gz files to extract into the image.

So I don't see a good way to have detailed communication between the caller and the python, let alone to make it flexible enough to account for different user preferences. But I think we can come up with something... here are some ideas:

What are your thoughts?

AdamSimpson commented 7 years ago

I am coming to understand that handling environment variables can get quite tricky; I've already run into similar issues when combining %environment and SINGULARITYENV. I'll think a few things through, including the ideas you had, and report back.

vsoch commented 7 years ago

sounds good :) It's dinnertime over here so I'll likely not get a chance to get back to this until tomorrow. Chat then!

AdamSimpson commented 7 years ago

The more I think about it, the more appropriate it seems to handle SINGULARITYENV_* differently. It seems the point of allowing runtime control of the environment is to change what's been baked into the container, which isn't currently possible for variables set in the docker manifest. The ability to modify, but not obliterate, container environment variables at runtime is important for HPC centers that need to inject site-specific bits (MPI, CUDA, scheduler tools, etc.) into users' containers.

One fix would be to have SINGULARITYENV_* variables written at runtime to a file in /.singularity.d/env/ that is sourced last, allowing the runtime variables to have the final say in what the environment looks like. I don't know if that's possible, though, on systems without NO_NEW_PRIVS support, which my particular case requires.

Perhaps something similar could be implemented without creating an actual file in /.singularity.d/env/ by delaying the setenv() calls until after all of the init scripts have been run. That gets tricky, as setenv() doesn't automatically expand environment variables, e.g. setenv("PATH", "$PATH:foo_bar") needs to be manually parsed and $PATH expanded...

AdamSimpson commented 7 years ago

Another option would be to create a static script under /.singularity.d/env/, run last, that parses SINGULARITYENV_ variables and sets the non-prefixed versions in the container. It seems to me that method wouldn't require any specific kernel features, such as NO_NEW_PRIVS, nor the fancy string parsing required by setenv(). This of course requires that singularity doesn't strip the `SINGULARITYENV_` prefix before sourcing this file, which may have consequences I'm not aware of.

EDIT: Something like this:

prefix=SINGULARITYENV_

while read -r var ; do
  # Remove prefix from name
  non_prefixed=${var#$prefix}

  # Expand possible env vars in value
  eval non_prefixed=$non_prefixed

  # Export final variable
  export $non_prefixed
done < <(printenv | grep ^${prefix})

This would then allow the following to modify the container's baked-in values:

SINGULARITYENV_PATH=\$PATH:/site/specific/path
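
As a concrete (hypothetical) invocation, assuming the script above were installed as something like /.singularity.d/env/99-runtime.sh and sourced last:

$ SINGULARITYENV_PATH='$PATH:/site/specific/path' singularity exec Ubuntu.img printenv PATH

PATH would then keep the Docker-provided entries and gain /site/specific/path at the end.
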
vsoch commented 7 years ago

I think it comes down to deciding on the level at which we want to operate (scripts vs. environment variables). We have to start with the assumption that the image is baked, meaning we can't write any new/custom folders into the image's environment. The current execution flow goes something like:

1. Call to Singularity init - this is akin to the init inside the container, and it's basically a clone minus the first bit to source the files in the env folder. Your PATH would be honored here.

if [ -f "$SINGULARITY_sysconfdir/singularity/init" ]; then
    . "$SINGULARITY_sysconfdir/singularity/init"
fi

2. action (C) Then we go through image handler --> action --> action itself (e.g. shell) in src/action-lib. Right before this, in src/action.c, is where we parse the user's custom environment with singularity_runtime_environment. So up to here, custom paths are still honored.

3. action (exec in container) When we hit the action-specific scripts above, most are pretty simple: they just check for the existence of the executor and call it. It's within the executor that we have the general code that, if you have defined some SINGULARITYENV_* variable that is also set by Docker, the container, etc., will wipe it out:

for script in /.singularity.d/env/*.sh; do
    if [ -f "$script" ]; then
        . "$script"
    fi
done

> One fix would be to have SINGULARITYENV_* variables written at runtime to a file in /.singularity.d/env/ that is sourced last, allowing the runtime variables to have the final say in what the environment looks like. I don't know if that's possible, though, on systems without NO_NEW_PRIVS support, which my particular case requires.

We would only be able to do something like this during a bootstrap, but we already provide the %environment section, which is sourced after Docker, so arguably we already provide that. I know this issue pertains to bootstrap, but we still need a fix that would work for other (non-writable) commands too. Even if we came up with some creative way to write these variables at runtime, given the change to permanent squashfs containers this wouldn't work long term. The only quasi-reasonable thing would be to implement some kind of writable overlay for the user, but then they would have something writable, which is probably a concern for other reasons.

> Perhaps something similar could be implemented without creating an actual file in /.singularity.d/env/ by delaying the setenv() calls until after all of the init scripts have been run. That gets tricky, as setenv() doesn't automatically expand environment variables, e.g. setenv("PATH", "$PATH:foo_bar") needs to be manually parsed and $PATH expanded...

I think it's also tricky because by the time we've hit the execution script in the container, we have de-escalated privilege inside the container, so we are pretty much working with shell and no longer have access to the internal registry, etc.

Some ideas...

Re-order honoring of environment

If we were able to source the scripts first and then export the custom SINGULARITYENV_* variables, this would better honor runtime environments, and any conflict would be resolved in favor of the user's preference.
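
A rough sketch of what that re-ordering might look like in the executor (hypothetical, bash-style; it assumes the prefixed variables are still present in the environment at this point):

# source the container's env scripts first...
for script in /.singularity.d/env/*.sh; do
    if [ -f "$script" ]; then
        . "$script"
    fi
done

# ...then apply user-supplied SINGULARITYENV_* values so they win on conflict
for name in $(printenv | grep '^SINGULARITYENV_' | cut -d= -f1); do
    export "${name#SINGULARITYENV_}=$(printenv "$name")"
done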

Control of script sourcing

Arguably, the above could be fixed if we just allowed the user to selectively disable an environment, either at time of bootstrap or during runtime. So for example, during a Docker import I might do:

SINGULARITY_DISABLE_DOCKER_ENV=yes singularity import container.img docker://ubuntu

The variable would be found, and it wouldn't produce 10-docker.sh at all. We might also let the user decide this at runtime:

SINGULARITY_ENV_DISABLE=10-docker singularity shell container.img
SINGULARITY_ENV_DISABLE=10-docker,80-custom singularity shell container.img

eg, if we are going through that loop and we have:

script='/.singularity.d/env/10-docker.sh'

we then just check whether it's in the list:

disable_these=(${SINGULARITY_ENV_DISABLE//,/ })

# Helper function
contains () {
  local element
  for element in "${@:2}"; do [[ "$element" == "$1" ]] && return 0; done
  return 1
}

if [ -f "$script" ]; then
    script_name=$(basename "$script" .sh)

    # Is it disabled?
    if contains "$script_name" "${disable_these[@]}"; then
        echo "environment $script disabled."
    else
        echo "environment $script enabled."
        . "$script"
    fi
fi

That's a sort of brute-force way of doing it: the script can exist, but the user specifies all or nothing. It doesn't really give elegant control over specific environment variables.

Re-do of environment flow

Given the upcoming changes to images, and the greater need for control of the environment, we could overhaul the current flow with something more organized. Right now we operate at the granularity of scripts: we have a bunch of scripts that we either dump (or let the user define), and then they are sourced. It could be along the lines of keeping some kind of registry, but for environment variables, and then some final process exports the decided-on values. I'm not sure how that would work; I need to hear @gmkurtzer's plan for new images so I know how to start my thinking.
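
A very rough sketch of that registry idea (the layout and file names here are made up): each source of environment (Docker, %environment, runtime SINGULARITYENV_*) writes KEY=VALUE lines into its own file, and one final pass resolves them so that later files win:

# hypothetical registry layout:
#   /.singularity.d/env.d/10-docker.env
#   /.singularity.d/env.d/90-environment.env
#   /.singularity.d/env.d/99-runtime.env   (written from SINGULARITYENV_*)
for envfile in /.singularity.d/env.d/*.env; do
    [ -f "$envfile" ] || continue
    while IFS='=' read -r key value; do
        # later files are read last, so their values win on conflict
        [ -n "$key" ] && export "$key=$value"
    done < "$envfile"
done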

vsoch commented 7 years ago

@gmkurtzer would like your feedback on this.

gmkurtzer commented 7 years ago

Well, the transposition of the SINGULARITYENV_ variables occurs from within the binary backend of Singularity, which is before the sourcing of the environment files within the container. The transposition must occur within the C portion of the code because that is where we are doing other variable sanitization, and it must run before any binaries within the container (think of the example of SINGULARITYENV_LD_LIBRARY_PATH).

But, we might be able to not unset the SINGULARITYENV_ prefixed variables, such that someone can still retrieve them from the container's runtime. Would that suffice?
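
For illustration, if the prefixed variables were left in place, something like this would become possible (hypothetical output; the exact value depends on the host-side init handling shown earlier):

$ SINGULARITYENV_PATH=/foo_bar singularity exec Ubuntu.img printenv | grep '^SINGULARITYENV_PATH'
SINGULARITYENV_PATH=/foo_bar:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

A site-controlled script sourced inside the container could then consume those values itself.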

AdamSimpson commented 7 years ago

For my particular use case I don't believe that will be necessary. Since I am dealing with site-specific variable modifications, I can just as easily use a custom prefix for parsing and then don't have to worry about potential duplication between the C portion of environment processing and whatever script I have created. I was just hoping to be able to reliably use SINGULARITYENV_* to modify the user environment, regardless of bootstrapping source, instead of writing another script.
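
For example (purely illustrative; SITEENV_ is a made-up prefix and this assumes a site-maintained parser script, along the lines of the sketch above, installed into the container's env directory and sourced last):

$ SITEENV_PATH='$PATH:/opt/site/mpi/bin' singularity exec container.img printenv PATH

The parser would expand $PATH inside the container and re-export it, leaving Singularity's own SINGULARITYENV_ handling untouched.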

dtrudg commented 6 years ago

Closing this as it was addressed in 2.5.0 via #1430 - please let us know if you experience any further issues.