canonical / data-science-stack

Stack with machine learning tools needed for local development.
Apache License 2.0

Exploration: Use Singularity Containers for DSS #20

Open misohu opened 8 months ago

misohu commented 8 months ago

Why it needs to get done

For DSS we want to run containers. Singularity containers are a possible alternative to Docker.

What needs to get done

When is the task considered done

We have a clear understanding of whether Singularity is our way to deploy GPU workloads.

syncronize-issues-to-jira[bot] commented 8 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5249.

This message was autogenerated

misohu commented 8 months ago

Singularity is an open-source container solution designed for scientific and high-performance computing environments. It was developed to address some of the limitations and concerns associated with other container technologies like Docker in these specialized fields.

Some key features of Singularity containers include:

  1. Security and Permissions: Singularity containers can be run with user-level permissions, allowing users to execute containers without needing root access. This makes Singularity suitable for shared computing environments where users may not have administrative privileges.
  2. Compatibility: Singularity is compatible with container images from Docker and other container technologies. This compatibility makes it easy to leverage existing containerized applications while benefiting from Singularity's specific features.
  3. Performance: Singularity is optimized for high-performance computing (HPC) environments. It provides efficient support for running parallel applications and integrates well with common HPC resource managers.
  4. Sandboxed Environment: Like other containers, Singularity provides a sandboxed environment, allowing applications to run consistently across different systems without worrying about dependencies or system-specific configurations.
  5. Single Image File: Singularity containers are stored as a single image file, making it easy to share and transport containerized applications.
  6. User-Friendly: Singularity aims to be user-friendly, and its design focuses on ease of use and accessibility for researchers and scientists who may not have extensive experience with container technologies.

Singularity has gained popularity in scientific and research communities, particularly in fields such as bioinformatics, computational chemistry, and physics. Researchers often use Singularity to package and distribute their scientific workflows and applications, ensuring reproducibility across different computing environments.

misohu commented 8 months ago

To run Singularity containers, SingularityCE is used. This open-source project is under the BSD 3-Clause license, which may or may not be a problem for us. I ran some experiments on my local PC, and these are the findings:

  1. You can run Docker images natively with Singularity. Here is a small example; .sif is the Singularity Image Format. You can either build your own image or convert a Docker image with the commands below. You also need to understand how the RUN and ENTRYPOINT alternatives work in Singularity. Reference here.
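    # Convert the Docker image to a SIF file, naming the output explicitly
    singularity pull docker_image_name.sif docker://docker_image_name
    # Run the image's default runscript
    singularity run docker_image_name.sif
    # or run a specific command inside the container
    singularity exec docker_image_name.sif your_command
    # or open an interactive shell in the container
    singularity shell docker_image_name.sif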
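
One naming detail worth keeping in mind: as far as I know, when `singularity pull` is given only a `docker://` URI, it names the output `<name>_<tag>.sif` (with the tag defaulting to `latest`), so passing an explicit output file name avoids surprises. The sketch below predicts that default name. `default_sif_name` is a hypothetical helper written for illustration, not part of Singularity, and it only handles tags (not digests).

```shell
#!/bin/sh
# Sketch: predict the file name `singularity pull <uri>` writes when no
# output path is given. Assumption: <last path component>_<tag>.sif,
# with the tag defaulting to "latest".
default_sif_name() {
    ref="${1#docker://}"      # drop the docker:// scheme
    name="${ref##*/}"         # keep the last path component, e.g. "cuda:12.0-base"
    case "$name" in
        *:*) tag="${name##*:}"; name="${name%%:*}" ;;  # split name and tag
        *)   tag="latest" ;;                           # no tag given
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
}
```

For example, `default_sif_name docker://docker_image_name` prints `docker_image_name_latest.sif`.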

    There are some important differences between the Docker and Singularity models (most of them are listed here). The most critical ones are

Conclusions

Singularity looks like a nice alternative to Docker for our use case thanks to its almost seamless execution of Docker images (we still need to keep in mind the potential problems I highlighted above). It has working NVIDIA support. GPU workloads are faster, but the performance increase is IMHO negligible. There might be problems with Intel GPU support, which might be resolved in the future with CDI.
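
For reference, NVIDIA support is enabled with Singularity's `--nv` flag on `run`, `exec`, and `shell`. The sketch below only assembles and prints such a command line, so it can be inspected without a GPU or Singularity installed. `gpu_exec_cmd` is a hypothetical helper, and the image and command names are placeholders.

```shell
#!/bin/sh
# Sketch: print the command line for running a command in a SIF image
# with GPU access. Only NVIDIA is handled here via --nv; any other
# vendor is reported as unsupported by this helper.
gpu_exec_cmd() {
    vendor="$1"; image="$2"; shift 2
    case "$vendor" in
        nvidia) flag="--nv" ;;
        *)      echo "no GPU flag configured for vendor: $vendor" >&2
                return 1 ;;
    esac
    echo "singularity exec $flag $image $*"
}
```

For example, `gpu_exec_cmd nvidia docker_image_name.sif nvidia-smi` prints `singularity exec --nv docker_image_name.sif nvidia-smi`, which can then be run on a host with the NVIDIA driver installed.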

ca-scribner commented 8 months ago

Sounds like there are a lot of interesting features. Are there key features that Docker has and Singularity doesn't? I'm wondering, if someone gave me the sales pitch of "switch from Singularity to Docker because of ___", what would that look like?

I'm guessing there are reasons, since Docker has much better adoption, but maybe that is just because Docker came first?

misohu commented 8 months ago

Yes, Docker came before Singularity (2013 vs. 2015).

For the things that Docker has and Singularity does not, I recommend reading this. The most important ones are:

  1. Daemonless Operation: Docker operates with a central daemon process, which manages and controls containers. Singularity, on the other hand, is designed to be more user-centric and doesn't require a central daemon. This can be seen as an advantage in certain contexts, especially for users who want more control over their containers without relying on a daemon.

  2. Rootless Operation: Docker has made progress towards allowing users to run containers without requiring root privileges, known as rootless Docker. Singularity typically doesn't require root privileges by default, making it more user-friendly in environments where users don't have administrative access. However, Docker's rootless mode might have more advanced features compared to Singularity in this regard.

  3. Layered File System: Docker uses a layered file system, allowing for more efficient use of disk space when multiple containers share the same base layers. Singularity, by default, uses a simpler and more image-centric file system approach, which may not be as space-efficient as Docker's layered approach.

  4. Docker Hub and Registry Ecosystem: Docker has a well-established and widely used container registry called Docker Hub. It provides a centralized repository for sharing and distributing Docker images. While Singularity has its own mechanisms for image distribution, Docker Hub's popularity and extensive image library might be considered an advantage for Docker in some contexts.

  5. Networking and Orchestration: Docker has built-in networking capabilities and supports orchestration tools like Docker Compose and Kubernetes for managing and scaling containerized applications. Singularity is more focused on providing user isolation and doesn't have the same level of built-in networking and orchestration features.