PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Singularity containers for models #1391

Open ashiklom opened 7 years ago

ashiklom commented 7 years ago

Description

Singularity is a container-based infrastructure tool specifically geared towards HPC clusters. From my understanding, it roughly combines the portability of virtual machines with the performance and lightness of Docker.

I am proposing that we move towards creating Singularity containers for ecosystem models that have nontrivial system dependencies (e.g. ED, CLM, FATES).

Context

Installing models on different machines is hard, especially when the infrastructures vary and you don't have root access. Using Singularity would mean that we could compile each model exactly once (on any machine with Singularity installed) and then simply distribute the compiled containers to users who want to run different models.

An important advantage over Docker is that, once the Singularity software is installed (which does require root access, but only for this initial installation step), running Singularity containers does not require a background daemon or anything like that, which makes them well suited for execution via qsub.
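As a rough illustration of the qsub point, here is a minimal sketch that generates a batch script wrapping a `singularity exec` call. The `#$` directives assume an SGE-style scheduler (PBS and Slurm use different syntax), and the image name `ed2.sif`, the `ed2` executable, and the `ED2IN` argument are hypothetical placeholders.

```python
import shlex


def make_job_script(image, command, job_name="ed2-run", walltime="12:00:00"):
    """Return the text of a batch script that runs `command` inside `image`."""
    inner = " ".join(shlex.quote(part) for part in command)
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",       # SGE-style directives (assumption)
        f"#$ -l h_rt={walltime}",
        "#$ -cwd",
        # No daemon is involved: the container starts as an ordinary
        # user process inside the scheduled job.
        f"singularity exec {shlex.quote(image)} {inner}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical image and model invocation, for illustration only.
script = make_job_script("ed2.sif", ["ed2", "-f", "ED2IN"])
print(script)
```

The resulting text could be written to a file and submitted with qsub; the directives would need adjusting for whatever scheduler a given cluster runs.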

Possible Implementation

Here is an example for building an ED2 container: https://github.com/ashiklom/ed-singularity

robkooper commented 7 years ago

This is what I hope we can do with the docker GSOC as well. Create docker images for PEcAn/BETY/etc and images for each model. Next we need a way to use a messaging system to run each of the models on a specific input. One solution is to use something like RabbitMQ, each image then becomes a worker for that specific model, and we make sure all data is mounted in each container. This will allow the model to access the data, run and save the results. If we want multiple runs of ED in parallel we can add multiple instances of this image.
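To make the worker idea concrete, here is a minimal sketch of the pattern using Python's standard-library `queue` as a stand-in for RabbitMQ. It is not the RabbitMQ API; `run_model` is a hypothetical placeholder for launching a model container on data that is already mounted.

```python
import queue
import threading


def run_model(model, input_path):
    # Placeholder: a real worker would run the model on the mounted data.
    return f"{model} finished on {input_path}"


def worker(model, jobs, results):
    """Consume input paths until a None sentinel arrives."""
    while True:
        input_path = jobs.get()
        if input_path is None:
            break
        results.put(run_model(model, input_path))


def run_all(model, inputs, n_workers=2):
    """Run every input through `n_workers` parallel workers for one model."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(model, jobs, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for path in inputs:
        jobs.put(path)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    finished = []
    while not results.empty():
        finished.append(results.get())
    return sorted(finished)


print(run_all("ED2", ["/data/site_b", "/data/site_a"]))
```

Raising `n_workers` plays the role of adding more instances of the same image; with a real message broker the workers would be separate containers rather than threads.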

robkooper commented 7 years ago

So you start with a docker image, could this docker image be the ED2 docker image? That way all you need is a small wrapper around the docker image to make it into a singularity image? This would allow us to use the models either with docker (easy on the mac/windows/linux) or with singularity (easy on linux, and liked in HPC centers).
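A minimal sketch of what such a wrapper might look like, assuming the Singularity CLI's `singularity build <output> docker://<image>` form. The Docker image name `example/model-ed2:latest` is hypothetical, and the function defaults to a dry run so it can be inspected on machines without Singularity.

```python
import shutil
import subprocess


def build_from_docker(docker_image, sif_path, dry_run=True):
    """Build a Singularity image file from an existing Docker image."""
    cmd = ["singularity", "build", sif_path, f"docker://{docker_image}"]
    if not dry_run:
        if shutil.which("singularity") is None:
            raise RuntimeError("singularity not found on PATH")
        subprocess.run(cmd, check=True)
    return cmd


# Hypothetical image name; prints the command without running it.
print(build_from_docker("example/model-ed2:latest", "ed2.sif"))
```

With this approach the Docker image stays the single source of truth, and the Singularity image is derived from it rather than maintained separately.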

ashiklom commented 7 years ago

That's a good idea. I'm not 100% sure how the Docker bootstrap works, but I'd be surprised if that didn't work.


ashiklom commented 6 years ago

Update: The BU cluster now has Singularity installed, so this is something we should definitely consider seriously. I tried building and running an ED image, and hit some MPI snags, but got pretty far with very little effort.

mdietze commented 6 years ago

I was talking to Ethan White at ESA and he thought the support for and maturity of Singularity were far behind Docker's (which makes sense, since the latter is commercial). What I don't know, and would be interested in learning, is:

  1. How different is the build process? Is building both 2x as much work, or only 5% more work than building Docker alone? If it's 2x, then I don't think supporting both will be sustainable.
  2. How different would the RabbitMQ setup be for each? If we can pass messages to either within one message queue, that sounds like a great path to go down. If we need two entirely separate message queues, then it gets really cumbersome to support even more ways to run a model (local no queue, local in queue, remote no queue, remote in queue, Docker, Singularity, etc.).

ashiklom commented 6 years ago

My understanding is that Docker isn't typically available on University HPCs because it's really intrusive and requires sudo privileges, whereas Singularity is designed with HPCs in mind.

I think Singularity interfaces pretty cleanly with Docker and can actually install almost directly off of Dockerfiles, so it shouldn't be that much work to add it anyway. More importantly, the model of Singularity is that you build an image once on your host machine and then literally copy the entire file over to other machines, as long as both machines have Singularity installed. Almost like a VM, but leveraging the host hardware more directly (through dark magic, as far as I can tell). This means "supporting" Singularity would only require distributing the compiled images, assuming Singularity is installed (which is pretty simple in my experience).
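If the same image really is usable from both runtimes, the main difference at run time is the launch command. Here is a minimal sketch, assuming the standard `docker run -v` and `singularity exec --bind` forms; the image names, the `/data` mount point, and the model command are hypothetical.

```python
def launch_command(runtime, image, data_dir, model_cmd):
    """Return the command that runs `model_cmd` in `image` with data mounted."""
    mount = f"{data_dir}:/data"  # container-side path is an assumption
    if runtime == "docker":
        return ["docker", "run", "--rm", "-v", mount, image, *model_cmd]
    if runtime == "singularity":
        return ["singularity", "exec", "--bind", mount, image, *model_cmd]
    raise ValueError(f"unknown runtime: {runtime!r}")


# Hypothetical job description, e.g. as it might arrive in a queue message.
job = {"data_dir": "/scratch/run42", "model_cmd": ["ed2", "-f", "/data/ED2IN"]}
print(launch_command("docker", "example/model-ed2:latest", **job))
print(launch_command("singularity", "ed2.sif", **job))
```

Under these assumptions, a single job description could be routed to either runtime, with only the image reference and the command prefix changing.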

No idea how RabbitMQ works, or really even what it is, so I can't speak to that.

dlebauer commented 6 years ago

There was a lot of discussion about Singularity on HPC systems at a Container Analysis Workshop hosted by NDS at NCSA this week. Some notes from the discussion are in this Google Doc (search for 'singularity'): https://docs.google.com/document/d/1lF92L-4gGr0OZiArFEsM_pn1VUK61m7w7V9hRWRLWEk/edit#

Here is the docker2singularity tool from TACC https://github.com/TACC/docker2singularity

dlebauer commented 4 years ago

@robkooper is this still relevant given that singularity can directly pull docker images?

robkooper commented 4 years ago

It would be good if we pushed them to https://singularity-hub.org/ so that you don't have to wait about 5 minutes for the image to be pulled and converted.

That being said, this is low priority, and maybe we can configure singularity hub to do this for us.
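Without a registry of prebuilt images, one way to pay the conversion cost only once per machine is to cache the converted file locally. A minimal sketch, assuming the `singularity pull <output> docker://<image>` form; the cache layout, the file-naming scheme, and the injectable `pull` argument are illustrative choices, not anything PEcAn provides.

```python
import subprocess
from pathlib import Path


def ensure_image(docker_image, cache_dir, pull=None):
    """Return the path to a cached .sif file, pulling it only if missing."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a flat file name from the image reference (assumed scheme).
    name = docker_image.replace("/", "_").replace(":", "_") + ".sif"
    sif = cache_dir / name
    if not sif.exists():
        cmd = ["singularity", "pull", str(sif), f"docker://{docker_image}"]
        if pull is None:
            subprocess.run(cmd, check=True)  # the slow pull-and-convert step
        else:
            pull(cmd)  # injected for testing or dry runs
    return sif
```

The first call for a given image takes the several minutes of pull and conversion; later calls return the existing file immediately.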

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 365 days with no activity.

ashiklom commented 3 years ago

Singularity Hub is struggling to find support and will likely be shut down this spring. Based on that, I agree with David that our best bet is to pull directly from Docker Hub and eat the extra several minutes of build time.

Here's the email I received from the maintainer about a month ago:

Dear Singularity Hub Past or Current User,

Happy New Year! As we head into early 2021 and Singularity Hub enters its 5th year, I am going to have a change in role that will make it no longer possible for me to maintain and support Singularity Hub. Since there is no one on my current team at Stanford to take over the responsibility, I'm reaching out to you, the user community, to see if there is interest. I'm open to many ideas for support, including maintaining the current server and builders, refactoring singularity-hub.org to be something else, or preserving all or some subset of the current containers in some kind of database. It's really cool that we have containers that go back to 2016, and (to me) they have historical or legacy value. If we cannot find a community member or organization to step up, unfortunately Singularity Hub will need to be shut down, at the latest in April of this year.

I've worked really hard to originally develop, find funding for, and keep Singularity Hub available for you as a resource over these years, oftentimes when it was the only option. It's had three major refactors, many more updates of builders, and I've done it all on my own. So while I am also saddened that it's not something I can keep going, I'm hopeful that it has been an unexpectedly good resource for you, and that with developing container technologies and registries, there are many more options for building and storing containers now than there were back in 2016.

Please let me know if you'd like to discuss helping out! On behalf of myself and the larger community, I think it would be hugely appreciated.

dlebauer commented 1 year ago

So, does the deprecation of Singularity Hub, together with the availability of docker2singularity, mean we don't need a registry? Or should we find another registry, or manage our own Singularity Registry?

If docker2singularity is sufficient, can this issue be closed?