apptainer / apptainer

Apptainer: Application containers for Linux
https://apptainer.org

X.509 Integration in Apptainer #2240

Open jrwds opened 2 weeks ago

jrwds commented 2 weeks ago

Hi everyone,

For the past few months I've been looking into using X.509 with Apptainer for my org, and I've talked with a number of people across the Apptainer community about it. I wanted to write a post retracing what I've learned, documenting my use case, detailing how X.509 might help, and exploring what improvements might be useful going forward. This should be useful for others in my shoes in the future. I also think Apptainer has a unique opportunity to lead in the rootless container/X.509 space, given the security models used on non-personal computing platforms (data centers, cloud, HPC, etc.), and that opportunity is not being capitalized on as much as it could be.

Retracing the Steps

There is a relatively lengthy history of PKI integration into Apptainer, closely mirroring discussions from Singularity-CE. I only started learning about Apptainer ~6 months ago, so I was unaware of this commentary. Here are a few links detailing those discussions for those who, like me, missed them, although there are many more:

In general: PGP support had been around for some time, along with a custom-built signature system that sif used (for both Apptainer and Singularity). X.509 support was added in late 2022 (https://github.com/apptainer/sif/pull/147); Singularity switched to DSSE+in-toto (https://github.com/sylabs/sif/releases/tag/v2.9.0). Apptainer dropped the custom-built signature system and adopted DSSE based on the Singularity version of X.509 (https://github.com/apptainer/sif/pull/164). I am also aware that part of Sylabs' business model is acting as an effective PGP key management system, although I don't think they offer that for X.509 (which, due to its nature, may not allow for such a business model).

My Organization

I come from a data science background. I work at an organization that intersects academia, industry, government, and sometimes even the military: we run experiential-learning data science projects in which university students are paired with a partner (government, industry, etc.) who provides data, mentorship, and much else while the students create data science solutions using real-world data. We organize the projects and also provide educational resources through direct teaching and learning materials, some of which we make from scratch. We work with literally thousands of students, not only at our home university but across the US. We coordinate these projects with hundreds of faculty, corporate, and government contacts, and we are funded in more or less any way possible. It's a complex operation with a small staff, an even smaller technically inclined staff (about half a dozen of us, myself included), and thousands of people to support.

Our Service

We use an HPC environment to serve students all the data they need, as well as to provide them with computing resources to create and run code for their projects. In the past we've gotten by just fine with no PKI (read: detailed UNIX perms + databasing) and/or PGP. We've also used Apptainer for a number of years to serve a few major use cases:

1) Default kernels/environments (Python, R, etc) for HPC
2) Custom kernels/environments (for specific groups of students) for HPC
3) Creating learning materials (namely a sif containing Python + Jupyter Lab + whatever packages + a coding example showing NN, KNN, NLP, etc), almost all of which I've made, that live on the HPC for students to access

Our Problems + Potential Solution

This has grown increasingly complex to scale, as every semester we have to rebuild many containers, make new containers for groups, confirm they work, make sure permissions are correct, update packages, fix broken ones, answer tickets, build new systems, etc. It is possible that instead of our in-house staff doing all that work, we could offload #2 and #3 to students/corporate contacts/professors/etc directly; they could make their own containers that they need to use, we just provide guides on how to make them with Apptainer on the HPC. To do this, we would need a few key objectives met:

1) Verify provenance of containers (who built it and why and at what stage)
2) Verify execution permissions (ECL) on certain secured containers for special projects
3) Make it easy for our (small) staff to administer the whole system not only now but in the future when some of us may move on
4) Store everything in the container (so if you have the container, you have the provenance/etc) instead of managing multiple files/directories/databases
5) Verify integrity of containers (no one modified it after being made)
6) Provide tools for non-staff to verify containers
7) Verify identities (which staff authenticated which non-staff)
8) Be a PKI standard that has robust free documentation (we can't afford to train staff if they don't already know the system/tool) and preferably has precedent for use on HPC systems

PGP can do 2, 5, 6 and 8 no problem. But 1, 3, 4 and 7 are hard to do with PGP alone. Using PGP, we would have to make/use a customized keyring system with databases tracking all the containers and if we are dealing with hundreds or thousands of containers, made by multiple hands at various stages... it's not hard to see that it would turn into a huge mess and wouldn't meet some of those objectives. X.509 could meet all those requirements if set up properly, and once proper documentation is made for users it could more or less run itself, but it would be complex to set up. To summarize, it's the difference between "easy to set up, difficult to maintain" vs. "difficult to set up, easy to maintain".
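To make the chain-of-trust idea concrete, here is a minimal toy sketch in Python. It is not how Apptainer or X.509 is implemented: it does no cryptography at all and only models the "who vouched for whom" links, which is what objectives 1 and 7 depend on. All names (`Cert`, `chain_to_root`, `org-root-ca`, `staff-alice`, `student-bob`) are hypothetical, and it assumes a staff member acts as an intermediate CA who issues certificates to non-staff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Toy certificate: records identity links only, no keys or signatures."""
    subject: str  # who this certificate identifies
    issuer: str   # who vouched for the subject


# Hypothetical hierarchy: an org root CA, a staff intermediate, a student leaf.
CERTS = {
    "org-root-ca": Cert("org-root-ca", "org-root-ca"),  # self-issued root
    "staff-alice": Cert("staff-alice", "org-root-ca"),
    "student-bob": Cert("student-bob", "staff-alice"),
}


def chain_to_root(leaf: str, certs: dict[str, Cert], trusted_roots: set[str]) -> list[str]:
    """Walk issuer links from `leaf` up to a trusted root.

    Returns the list of subjects along the path, or raises ValueError if the
    chain is broken, loops, or ends at a root that is not trusted.
    """
    path: list[str] = []
    seen: set[str] = set()
    current = leaf
    while True:
        if current in seen:
            raise ValueError(f"issuer loop at {current!r}")
        seen.add(current)
        cert = certs.get(current)
        if cert is None:
            raise ValueError(f"no certificate found for {current!r}")
        path.append(current)
        if current in trusted_roots:
            return path
        if cert.issuer == cert.subject:
            raise ValueError(f"chain ends at untrusted root {current!r}")
        current = cert.issuer


if __name__ == "__main__":
    # The returned path answers "which staff authenticated which non-staff".
    print(chain_to_root("student-bob", CERTS, {"org-root-ca"}))
```

The point of the sketch is that the verifier only needs the set of trusted roots; everything else travels with the signed artifact. With a PGP-style web of trust, the equivalent of `CERTS` would have to be curated by hand as a keyring or database.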

Other Things

If we were able to use X.509 with our .sif's, there are a few other points to note:

Summary

In summary, we can't efficiently scale our containerization process across universities/corporations/government/military involving thousands of people without a battle-tested PKI, built on a centralized chain-of-trust model, baked into our container engine. Provenance (both of users and containers), ease of administration, and logistical considerations are all currently unmet by the web-of-trust security model.

Further Questions/Call To Action

I welcome any kind of feedback.

GodloveD commented 1 week ago

Thanks very much for this detailed write-up, including your proposed use case(s). I think this warrants a deeper discussion, potentially at one of our twice-monthly community meetings.

jrwds commented 1 week ago

Thanks Dave. Yes, I plan on attending the next community meeting to talk about this further.