UNM-CARC / docker_base

Base Docker Image for CDSE containers running at CARC

Formalize docker_base as a Spack environment #1

Open · qwofford opened this issue 4 years ago

qwofford commented 4 years ago

Formalizing our Spack environment helps us achieve our reproducibility goals.

From the Spack documentation:

  1. Environments separate the steps of (a) choosing what to install, (b) concretizing, and (c) installing. This allows Environments to remain stable and repeatable, even if Spack packages are upgraded: specs are only re-concretized when the user explicitly asks for it. (see Spack lock files)

  2. An Environment that is built as a whole can be loaded as a whole into the user environment. An Environment can be built to maintain a filesystem view of its packages, and the environment can load that view into the user environment at activation time. Spack can also generate a script to load all modules related to an environment. We will do something slightly different with Docker here: I would like to export a spack environment (e.g. spack env load) and dump this environment setup script into a "docker_base.sh" script in /etc/profile.d. This extends to all layers of our reproducible infrastructure, because everything with a .sh extension there is loaded as a consequence of the default settings in /etc/bashrc on CentOS 7. (A minimal sketch of this follows.)
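A minimal sketch of that profile.d idea, assuming a named environment called docker_base (the name is a placeholder, and this is not necessarily how the image is built today):

```bash
# Generate the environment's activation commands and drop them into
# /etc/profile.d; CentOS 7's default /etc/bashrc sources every *.sh file
# there, so any login shell picks the environment up automatically.
spack env activate --sh docker_base > /etc/profile.d/docker_base.sh
chmod 0644 /etc/profile.d/docker_base.sh
```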

The overall design goal I have in mind is to have a Docker image tightly coupled with a Spack environment for all layers of our reproducible infrastructure. The software stack for such an environment might look like this:

Reproducible OS base: The Docker image gives us a nice OS base, and its package manager covers the software dependencies that don't need to be compiled from source, for example OpenSSL, LaTeX, glibc, etc.

Compiled software stack for HW optimization: Spack allows us to compile software when required, if such a package exists in Spack. This is the simplest way to compile software with HW optimizations. Creating a Spack package is preferable, but may not make sense for all experiment pipelines. Software design principles are secondary to research artifact deliverables.

All other software: When a Spack package is unavailable, the last resort is an explicit, simple bash script that pulls the source code and builds it.

patrickb314 commented 4 years ago

Why isn't the current approach - the environment is defined by the spack.yaml in /home/docker, a view is placed in /usr/local, and the entry point makes sure /usr/local/ is in the appropriate path - sufficient? Also, Spack can manage software installs with make/configure/etc. even when there's no Spack package file, using spack dev-build.
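For context, a rough sketch of that current approach as described, assuming the manifest lives at /home/docker/spack.yaml; the package specs are placeholders:

```bash
# Environment defined in /home/docker, with its view placed in /usr/local.
cat > /home/docker/spack.yaml <<'EOF'
spack:
  specs:
    - gsl
    - openmpi
  view: /usr/local
EOF
spack -e /home/docker concretize   # resolve the specs
spack -e /home/docker install      # build them and populate the /usr/local view
```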

qwofford commented 4 years ago

I have a few thoughts on this. Simplest solution: export the environment config with "spack env load <env name>" and dump it into the same /home/docker directory. This will allow us to avoid LD_LIBRARY_PATH issues like the one @sahba-t experienced. But then the user of this image needs to know that the Spack env should be loaded, and where the Spack env activation script exists.

Simple interfaces that promote composability: When you create an environment locally and define its spack.yaml, you specify environment "view" settings which are referenced when Spack sets install prefixes. The view settings in this spack.yaml file specify /usr/local as the Spack root. We need to find a way to tell the OS about these paths. We currently solve an immediate problem with gsl by setting LD_LIBRARY_PATH in the entry point script. The "Spack way" to handle this problem is to activate the environment before attempting to use Spack-installed packages. This also appends /usr/local/bin to the PATH, so we have access to binaries installed by Spack. This activation doesn't serve our composability design spec, however, because when someone inherits this image they must activate each layer of underlying Spack environments, and this will become an accounting headache.

One solution, and the one this issue addresses, would be to create an environment activation script (like the one spack env load <env name> provides) and drop it into profile.d, so that any user who runs the container will have binaries and libraries from every layer in the history of a layered Docker image, with no accounting effort on their part.

qwofford commented 4 years ago

There are related composability issues with using /home/docker as an environment home and /usr/local as an installation prefix.

I like /home/<username>_<docker commit hash> as the home for the Spack environment an individual associates with a docker image, and /opt/<username>_<docker commit hash> for Spack installs.
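A minimal sketch of that layout; USERNAME and COMMIT are placeholders for however the build would derive them:

```bash
# Hypothetical per-user, per-commit layout as proposed above.
USERNAME=someuser
COMMIT=abc1234                            # short docker commit hash
ENV_HOME=/home/${USERNAME}_${COMMIT}      # Spack environment lives here
VIEW_DIR=/opt/${USERNAME}_${COMMIT}       # Spack-installed software is linked here
mkdir -p "$ENV_HOME" "$VIEW_DIR"
cat > "$ENV_HOME/spack.yaml" <<EOF
spack:
  specs: []
  view: $VIEW_DIR
EOF
```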

patrickb314 commented 4 years ago

"because when someone inherits this image they must activate each layer of underlying Spack environments, and this will become an accounting headache."

No, they don't. Look at how bsp_prototype works. It just goes into the existing environment, uses spack add to add additional packages to the environment, and then rebuilds it and refreshes the view of it.
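Roughly, that add/rebuild/refresh cycle looks like this, assuming the environment lives in /home/docker as described earlier (the hdf5 spec is just an example):

```bash
spack -e /home/docker add hdf5         # add a package to the existing environment
spack -e /home/docker concretize -f    # re-concretize with the new spec
spack -e /home/docker install          # rebuild; Spack refreshes the view
```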

Why do we need multiple environments in a container, as opposed to just the one the container needs in a well-defined place that the system configuration (either /etc/profile.d... or entrypoint.sh) knows about? What's the problem we're trying to solve here?

qwofford commented 4 years ago
  1. I believe the bsp_prototype works by assuming all installations for a docker image will go into /usr/local/. This conflates packages I installed with packages anyone else installs. This could lead to interpretability concerns when someone is trying to figure out which packages are associated with my research artifacts versus anyone else's research artifacts.
  2. It also assumes that the most complicated thing a Spack environment activation will do is append paths to LD_LIBRARY_PATH and PATH, and I don't know if that's an assumption we want to make.
qwofford commented 4 years ago

Although we update the view, the view is not reflected in our system search paths, which is why we had to modify LD_LIB... and PATH manually.

patrickb314 commented 4 years ago

Sure, but if we pick a directory in the system path, either initially or that we put there, and just use that for the view (which is what we've done!), this solves the problem. Our only problem before was that some people didn't have /usr/local/XXX in their system path, which we corrected. This will be an issue for any path we pick.

Again, it's not clear to me what problem we're trying to solve that the current setup doesn't address.


patrickb314 commented 4 years ago

But it's a container - you build one for your application. What's the scenario in which it would be shared and this would be necessary?


qwofford commented 4 years ago
  1. Establishing a Spack view only matters when a Spack environment is installed; the OS still needs to know about these paths somehow, and in the simplest case, exporting LD_LIB... and PATH is sufficient. Although we are using a simple "view" configuration, these views can become more complicated. According to the Spack team, the correct way to export an environment activation script is to use: spack env activate <env> --sh. This ensures that everything Spack placed according to your installation "view" is also reflected in your shell environment. We could just export a bunch of environment variables ourselves, but eventually we will miss a corner case. Spack takes care of these issues for us, so why not take advantage?

  2. When there is more than one person working on a project, the namespace will become polluted if we use a single /home/docker for spack environments and a single /usr/local path for a spack installation directory.

At LANL I'm trying to support a team of about 6 people, each of whom is working with similar versions of very specific software. In our case, it is useful for people to treat the Docker image as if it were a shared resource/OS image. All of this software is in a "research" state, meaning that some features only work in very specific configurations, which may differ from those of other users.

However, it's often the case that a user will want to use "everything that So-and-so did, except we need a special HDF5 with a particular plugin." Situations like this will result in namespace pollution if we install everything in /usr/local.

  3. It's easier to write documentation if the steps are clear and simple. It's easier to tell one of these 6 users to "Run the docker image interactively, build your Spack environment, export the environment activation script, dump the script into /etc/profile.d, push to your branch on dockerhub" (see the sketch after this list) than it is to explain what an entrypoint is, and how to discover all the variables that need to be defined there.

  4. Ultimately, I think the strategy I proposed moves us closer to a solution where we have a single "carc-wheeler" base image which is shared collaboratively among teams. I'm thinking that certain classes of users will want something slightly different from others, but they might also want to share things. This is how we work on shared computing devices today, and we can replicate that experience in our infrastructure by artificially imposing some boundaries on where users are "allowed" to install their packages.
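Those documented steps might look roughly like the following; the image names, environment name, and tag are placeholders rather than the project's actual ones:

```bash
# On the host: start an interactive session in the base image.
docker run -it unmcarc/docker_base:latest /bin/bash

# Inside the container: build the environment and export its activation script.
spack env create myenv
spack -e myenv add gsl
spack -e myenv install
spack env activate --sh myenv > /etc/profile.d/myenv.sh
exit

# Back on the host: commit the container and push it to your branch on dockerhub.
docker commit <container-id> <dockerhub-user>/docker_base:mybranch
docker push <dockerhub-user>/docker_base:mybranch
```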

patrickb314 commented 4 years ago
  1. The scripts then have to activate the correct environment, and preserve that environment in the code that’s running, just like a module command. Doing that consistently is something we’ve struggled with in the past. I did try initially using a specific environment, but struggled with what to name it and how to make sure it was consistently set up.
  2. In this case, each user would have their own container layered on top of the base, though, right? Could they each build their own equivalent of bsp_prototype (or a branch of it) that customizes to the needs of their software, as opposed to keeping multiple variants of the software install in multiple environments in a single (large) container? (A rough sketch of that layering follows.)
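A hedged sketch of that layering, with hypothetical image names; each user's customization lives in its own thin image derived from the shared base:

```bash
cat > Dockerfile.someuser <<'EOF'
FROM unmcarc/docker_base:latest
# Customize the base environment for this user's software needs.
RUN spack -e /home/docker add hdf5 && \
    spack -e /home/docker concretize -f && \
    spack -e /home/docker install
EOF
docker build -f Dockerfile.someuser -t someuser/docker_base:custom .
```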
