tcompa opened 8 months ago
Small addendum: the file-access issue would also be very relevant e.g. for `/some/path/metadiff.json`, which goes back to the question of how to impersonate users.
The proof-of-concept should be clearly explored with the single-user local runner.
Note: in what we described above, containers are units of computation. Another approach (ref https://www.nextflow.io/docs/latest/docker.html) is to use containers only to provide the environment. Quote:
> In practice Nextflow will automatically wrap your processes and run them by executing the `docker run` command.
How to handle file access in this approach? To be explored (ref https://github.com/nextflow-io/nextflow/discussions/4260). Relevant quote:
> Nextflow automatically manages the file system mounts each time a container is launched depending on the process input files.
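For reference, this environment-only mode is enabled through Nextflow's configuration; a minimal sketch (the image name is a placeholder), after which Nextflow derives the bind mounts from each process's input files:

```groovy
// nextflow.config — use containers only to provide the environment;
// Nextflow wraps each process in `docker run` and manages the mounts.
docker.enabled = true
process.container = 'mycontainer/image:latest'   // placeholder image
```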
One take-home message from our current discussion:
Great summary! Another tool to look at (according to Jonas Windhager): Podman, running in user space
... and without requiring a daemon 😇
Notes from a preliminary discussion with @mfranzon:
Current task.command
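As a purely hypothetical illustration (all paths are made up): the current task.command is a bare interpreter call executed directly on the host, with the server process's permissions and full host-filesystem access:

```shell
# Hypothetical current task.command: the task entrypoint runs directly
# on the host, with the server process's permissions. Paths are placeholders.
TASK_CMD="/some/venv/bin/python /some/path/task.py"
echo "${TASK_CMD}"
```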
Prototype of Docker-based solution
We would have two images.
The first one would be built only once, and then used from the cache.
Then there would be N (N = number of relevant tasks) other images, each of which would be rebuilt any time a task is run.
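A minimal sketch of the two kinds of Dockerfiles, assuming Python-based tasks (image names, packages and paths are all hypothetical). The base image carries the heavy environment and is served from the build cache; each task image only layers the task code on top, so its per-run rebuild is cheap:

```dockerfile
### Dockerfile.base — built once, then reused from the build cache ###
FROM python:3.10-slim
RUN pip install numpy   # heavy, task-independent dependencies (placeholder)

### Dockerfile.task — one per task, rebuilt at every task run; ###
### only the cheap COPY layer changes, base layers come from the cache ###
FROM fractal-base:latest
COPY task.py /app/task.py
ENTRYPOINT ["python", "/app/task.py"]
```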
and then there would be a run script, like:
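A hypothetical sketch of such a run script (all names and paths are placeholders; for safety this sketch only prints the `docker run` command instead of executing it). The key point is the bind mount of the data area, so the task can read and write host files:

```shell
#!/usr/bin/env bash
# Hypothetical run script: launch the per-task container with the data
# area bind-mounted. This sketch prints the command; the real script
# would execute it.
set -euo pipefail

IMAGE="fractal-task-example"   # per-task image, built by the new task.command
DATA_DIR="/data/active"        # host data area (see "Data lifecycle" below)

echo docker run --rm -v "${DATA_DIR}:${DATA_DIR}" "${IMAGE}"
```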
The new task.command would be
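A hypothetical sketch of the new task.command: rebuild the per-task image (cheap, since the base layers are cached) and invoke the run script. Names are placeholders, and the commands are printed rather than executed:

```shell
#!/usr/bin/env bash
# Hypothetical new task.command: rebuild the per-task image, then call
# the run script. Printed rather than executed in this sketch.
set -euo pipefail

IMAGE="fractal-task-example"
BUILD_CMD="docker build -t ${IMAGE} -f Dockerfile.task ."
RUN_CMD="bash run_task.sh"

echo "${BUILD_CMD}"
echo "${RUN_CMD}"
```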
Data lifecycle
This is a very broad question; let's re-discuss it. Here is just one possible way of doing things.

- Data live in `/data/active`. NOTE: this cannot have conflicts with typical filesystem paths, e.g. you would not be able to use `/home/` (but it would be possible to mount `/home/myuser`, provided that it has broad permissions).
- Files are written to `/data/active`, and these files are owned by the `docker` user.

There are multiple ways to extend this possible scheme, and their relevance also depends on how we can impersonate users. If we are happy with either `sudo -u` or other impersonation strategies (e.g. SSH keys), then we could run Singularity containers as user processes. This may mitigate the data-access issues, as the container would have the same permissions as the user.
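A rough sketch of that last option, assuming `sudo -u` impersonation (user, image and paths are placeholders; the command is printed, not executed). Because Singularity runs the containerized process under the calling user's UID, files written under the bind mount keep that user's ownership:

```shell
# Hypothetical: run a task inside a Singularity container as the
# impersonated user; the process keeps that user's UID/GID, so files
# written under the bind mount stay owned by the user.
# Printed rather than executed in this sketch; names are placeholders.
TARGET_USER="myuser"
SIF_IMAGE="/images/fractal-task.sif"
CMD="sudo -u ${TARGET_USER} singularity exec --bind /data/active ${SIF_IMAGE} python /app/task.py"
echo "${CMD}"
```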