Closed jfennick closed 1 year ago
Merging #247 (ea43afc) into main (823b687) will decrease coverage by
0.01%
. The diff coverage is68.42%
.
@@ Coverage Diff @@
## main #247 +/- ##
==========================================
- Coverage 32.03% 32.03% -0.01%
==========================================
Files 30 30
Lines 30476 30484 +8
Branches 9133 9137 +4
==========================================
+ Hits 9764 9766 +2
- Misses 18160 18169 +9
+ Partials 2552 2549 -3
Files | Coverage Δ | |
---|---|---|
cwl_utils/image_puller.py | 82.89% <84.61%> (-1.83%) |
:arrow_down: |
cwl_utils/docker_extract.py | 87.50% <33.33%> (-5.84%) |
:arrow_down: |
The primary purpose of cwl-docker-extract is to pull all images in a workflow (and subworkflows, etc) and populate the default docker / --container-engine cache. Additionally copying the images to another directory is often unnecessary and takes up 2X the disk space. For large workflows that have many unoptimized images, this could approach 100GB. This PR makes copying optional (for docker only).
Thanks for the PR. I'm glad to hear about your use case, though it is not the original use case for this script. There are HPC systems where the compute nodes have no internet access, and thus the docket containers have to be downloaded on a different system and transferred out of band to the compute nodes.
The primary purpose of cwl-docker-extract is to pull all images in a workflow (and subworkflows, etc) and populate the default docker / --container-engine cache. Additionally copying the images to another directory is often unnecessary and takes up 2X the disk space. For large workflows that have many unoptimized images, this could approach 100GB. This PR makes copying optional (for docker only).
Thanks for the PR. I'm glad to hear about your use case, though it is not the original use case for this script. There are HPC systems where the compute nodes have no internet access, and thus the docket containers have to be downloaded on a different system and transferred out of band to the compute nodes.
Right, I should have phrased that differently. I also have to deal with an out of band HPC machine, and I agree that is one use case.
The use case I mentioned above is a workaround for the fact that cwltool only uses docker run when executing workflows, and by default docker run will only pull an image if one does not exist locally (--pull missing). If an image exists locally, but it is not in fact the latest image, docker run will execute the old image. This can lead to "works for me" situations where someone is developing a docker image on their local machine and periodically pushing, but everyone else gets one of the stale images and thus weird errors. Perhaps I'll make a PR for cwltool and/or Toil as well.
https://cwltool.readthedocs.io/en/latest/cli.html#cmdoption-cwltool-force-docker-pull
Nevermind, I guess there's already an option for cwltool. It would be nice if --force-docker-pull is enabled by default.
The primary purpose of cwl-docker-extract is to pull all images in a workflow (and subworkflows, etc) and populate the default docker / --container-engine cache. Additionally copying the images to another directory is often unnecessary and takes up 2X the disk space. For large workflows that have many unoptimized images, this could approach 100GB. This PR makes copying optional (for docker only).