NNPDF / pinefarm

Generate PineAPPL grids from PineCards
https://pinefarm.readthedocs.io
GNU General Public License v3.0

Container architecture #17

Open alecandido opened 1 year ago

alecandido commented 1 year ago

I thought a bit about this idea, and I'm coming with a proposal.

I want to split the pinefarm package into two different ones (but distributed together, like eko and ekobox), to put a boundary between the two. One will be the current UI, with the CLI and all the tools (installation, configs, ...). The other will contain mostly run.py and external/, i.e. whatever is strictly related to the computation itself.

Of course, a new package needs a new name; the best I came up with is pinefarmer. Alternatives are welcome.

So, pinefarm will do everything it is doing now, plus manage the container. pinefarmer will instead be installed inside the container, accept a minimal input from the outside, and perform the actual grid computation.
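To make the proposed boundary concrete, here is a minimal sketch of how the outer package could hand a job to the inner one. Everything specific in it is an assumption for illustration: the function name `build_run_command`, the mount points `/cards` and `/results`, and the `pinefarmer run` subcommand do not exist yet. Only the `run`, `--rm` and `--volume` options are real, and they are shared by the Docker and Podman CLIs.

```python
from pathlib import Path


def build_run_command(engine, image, card_dir, output_dir, dataset):
    """Assemble the argv that starts one computation inside a container.

    `engine` is the container CLI to call (e.g. "podman" or "docker").
    The `pinefarmer run` invocation is hypothetical.
    """
    cards = Path(card_dir).resolve()
    results = Path(output_dir).resolve()
    return [
        engine,
        "run",
        "--rm",  # delete the container once the computation is over
        "--volume",
        f"{cards}:/cards:ro",  # PineCards are only read
        "--volume",
        f"{results}:/results",  # grids are written back to the host
        image,
        "pinefarmer",  # hypothetical entry point installed in the image
        "run",
        f"/cards/{dataset}",
        "/results",
    ]
```

The list can be passed as-is to `subprocess.run`, so the only data crossing the boundary is a pair of mounted directories and a dataset name.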

Then, there is the problem of tooling for containers.

Managing containers

There are two main container engines for our purposes: Docker (by Docker) and Podman (by Red Hat). Docker is more or less the first and most popular one, while Podman arrived later on. There are a few more engines, and in general more complications of many kinds (orchestration, runtimes, ...), mainly because cloud computing is a big market. We are not really interested in cloud computing at the moment; we just want to borrow a tool from there. In case you struggle with the vocabulary, Red Hat provides a good summary.

Initial disclaimer: I dislike Docker, at this point partly for historical reasons, a few of which still apply, but not all of them. I might be biased.

The main difference between Docker and Podman (besides the companies behind them) is that the first requires a daemon to run (dockerd), while the second is daemon-less. In the old times (i.e. a couple of years ago, at most) dockerd had to run as root; now they also provide a root-less option (but, if I understood correctly, it is not the default one). More details on this Red Hat page.

This reason for me was sufficient to choose Podman: I could simply do apt install podman, and then use the CLI:

podman pull <container-image>
podman run <container-image>
podman ps  # show active containers
...

nothing more.

Now, I'd like not to rely on the CLI being available and, if possible, not to rely on anything else being installed either. I would prefer that everything is installed together with the pinefarm package, as a Python dependency, but I'm not sure that is possible.

Docker would require at least the daemon installation, while Podman might require nothing.

On the other hand, I'd like to have a ready-to-use Python package, and Docker has it -> docker-py. While Docker is not traditionally open source (even if it released and even "donated" some code after some time), this is. Podman also has a Python package -> podman-py, but it is much less popular and less actively maintained. However, it mostly contains bindings for the Podman REST API, so we could skip the package and send requests to the API directly. But this would require a service to run (and so the Python package requires it as well), so not much different from root-less Docker in the end...
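For reference, driving the engine through docker-py might look like the sketch below. It assumes the `docker` package is installed and a Docker daemon is reachable. The helper names `default_client` and `run_in_container` and the `/work` mount point are made up for illustration; `docker.from_env`, `images.pull` and `containers.run` are the docker-py calls being used.

```python
from pathlib import Path


def default_client():
    """Connect to the local daemon (needs docker-py and a running dockerd)."""
    import docker  # imported lazily, so this module loads without docker-py

    return docker.from_env()


def run_in_container(client, image, command, workdir):
    """Pull `image`, run `command` in it with `workdir` mounted at /work."""
    host_dir = str(Path(workdir).resolve())
    client.images.pull(image)
    # With the default detach=False the call blocks until the container
    # exits and returns its logs.
    return client.containers.run(
        image,
        command,
        volumes={host_dir: {"bind": "/work", "mode": "rw"}},
        working_dir="/work",
        remove=True,  # clean up the container afterwards
    )
```

Passing the client in as an argument keeps the function independent of how the connection is made, so the same code could be exercised with a stub when no daemon is around.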

They both have a library that is directly accessible, to make use of the same functions of the CLI, so (at least for Podman) it wouldn't require a service to run. But, of course, they are Go libraries:

The Docker one is most likely the counterpart of the Python package (which I expect to be bindings to it), but it might require the daemon anyhow. I also expect you do not want to move pinefarm to Go...

scarlehoff commented 1 year ago

I would avoid managing the containers ourselves. We should provide the container image, and then everyone can use whatever they want (or, more often than not, whatever their IT admin allows them to use).
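If the engine is left to the user, the most pinefarm would need is to find out which one is present. A minimal sketch, assuming only the two engines named in this thread are of interest; the function name `find_engine` is hypothetical, and `shutil.which` is the standard-library lookup on `PATH`.

```python
import shutil


def find_engine(candidates=("podman", "docker"), which=shutil.which):
    """Return (name, path) of the first engine found on PATH, else None.

    `which` can be replaced to test the lookup without any engine installed.
    """
    for name in candidates:
        path = which(name)
        if path is not None:
            return name, path
    return None
```

The order of `candidates` sets the preference, so a site that only allows one engine can pass a single-element tuple.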