common-workflow-language / cwl-v1.2

Released CWL v1.2.1 specification
https://www.commonwl.org/v1.2/
Apache License 2.0

Docker Requirement to Copy Container #189

Open pagrubel opened 2 years ago

pagrubel commented 2 years ago

We have a requirement for HPC to copy a container from one location to another. We have implemented an extension to do this called copyContainer and would like to see it in the standard.

mr-c commented 2 years ago

Hello @pagrubel and thanks for your suggestion. Can you share more details about your extension, the scenario, and how it works? Which CWL implementation do you use, and can you point to the implementation of your extension?

Is this because the compute nodes on your HPC system do not have internet access and therefore cannot download containers from public registries? If so, we have a [script that can cache the containers](https://cwl-utils.readthedocs.io/en/latest/#docker-extract-py) (possibly converting Docker format to Singularity format) before scheduling execution (perhaps from a login node on your HPC system).

pagrubel commented 2 years ago

Thank you for responding so quickly, @mr-c. Our implementation is BEE (Build and Execution Environment). We use Charliecloud (https://hpc.github.io/charliecloud/) as the container runtime for our main implementation of DockerRequirement for institutional use. We also support Singularity, but we cannot use it on our institutional systems. We have implemented dockerPull and other DockerRequirement fields using Charliecloud, and we can perform unprivileged builds from public repositories on some of our systems. On other systems, however, we cannot access registries and need to copy Charliecloud tarballs. The implementation is a simple copy of the tarball to our container archive directory using a Python system call. On systems without internet access, we then use Charliecloud to untar the container on the compute node.
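
For anyone reading along, the copy step described above might look roughly like the sketch below. This is not BEE's actual code: the function name `copy_container` and both path arguments are hypothetical, and it uses `shutil` from the standard library rather than a shell call.

```python
import shutil
from pathlib import Path


def copy_container(source: str, archive_dir: str) -> Path:
    """Copy a container tarball into an archive directory.

    Both arguments are hypothetical placeholders; a real workflow
    system would take them from its own configuration.
    """
    src = Path(source)
    if not src.is_file():
        raise FileNotFoundError(f"container tarball not found: {src}")

    dest_dir = Path(archive_dir)
    # Create the archive directory if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)

    dest = dest_dir / src.name
    # copy2 copies the file contents and also tries to preserve metadata.
    shutil.copy2(src, dest)
    return dest
```

Unpacking the tarball on the compute node would be a separate step handled by the container runtime, and it is not shown here.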

pagrubel commented 2 years ago

We have adopted CWL for our Workflow framework, BEE, for scientific simulation workflows.

tetron commented 2 years ago

Hi @pagrubel, the usual process is to implement it as an extension to cwltool and also provide a link to the implementation in your own system (or a detailed description of what it does, if the source isn't available). Then we'd discuss whether it makes sense to generalize it to other systems. If it turns out to be specific to your own environment, there's nothing wrong with keeping it as a private extension; you just want to express it as a "hint" so that other systems can recognize it as something they can safely ignore.
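
To illustrate why a "hint" is safe for other systems, here is a minimal sketch of how a runner could treat the two sections: an unsupported entry under `requirements` is an error, while an unsupported entry under `hints` is skipped. It is not cwltool's code. The `SUPPORTED` set, the `check_process` function, and the `ext:CopyContainer` class name are all made up for illustration, and the sketch assumes the list form of `requirements` and `hints`.

```python
# Hypothetical set of process requirement classes this toy runner understands.
SUPPORTED = {"DockerRequirement", "ResourceRequirement"}


class UnsupportedRequirement(Exception):
    """Raised when a mandatory requirement is not understood."""


def check_process(process: dict, supported=SUPPORTED) -> list:
    """Return the class names of hints that will be ignored.

    Raises UnsupportedRequirement if any entry under "requirements"
    is not in the supported set.
    """
    for req in process.get("requirements", []):
        if req["class"] not in supported:
            raise UnsupportedRequirement(req["class"])

    # Unknown hints are not an error; they are simply skipped.
    return [
        hint["class"]
        for hint in process.get("hints", [])
        if hint["class"] not in supported
    ]


# A tool description as it might look after parsing; "ext:CopyContainer"
# is a hypothetical extension name, not part of the CWL standard.
tool = {
    "class": "CommandLineTool",
    "requirements": [{"class": "ResourceRequirement"}],
    "hints": [{"class": "ext:CopyContainer"}],
}

ignored = check_process(tool)
```

With the extension listed under `hints`, a runner that does not know it can still run the tool; moving the same entry under `requirements` would make this sketch raise `UnsupportedRequirement`.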