https://github.com/PacificBiosciences/HiFi-human-WGS-WDL/issues/125#issuecomment-2078319137
I think it's likely that the problem you're seeing is caused by the inability of your HPC compute nodes to access the internet. Consolidating into a single image would not solve this issue. There are at least a few solutions:
- You can set up a private Docker container registry within your HPC network, mirror the containers that you will need for this workflow, and provide the URL for this internal registry as `humanwgs.container_registry` in your inputs.json files (a mirroring sketch follows this list).
- If you use miniwdl with Singularity (for example, with the miniwdl-slurm plugin), images are downloaded and cached as .sif files on the head node (where the job was launched) and fetched from this cache when they are used. Since login nodes are typically connected to the internet, this might work for you.
- If you use Cromwell, there's a tool called singularity-permanent-cache that caches images as .sif files. You can use singularity-permanent-cache to download the containers from a node that can access the internet, then adjust cromwell.conf to use singularity-permanent-cache to look up the file path for the containers.
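As a rough sketch of the first option: from a machine that can reach the internet, standard Docker commands can mirror each image into the internal registry. The hostname `registry.internal.example.com` and `<image>:<tag>` below are hypothetical placeholders, not values from this workflow:

```sh
# Run from a machine that can reach both quay.io and the internal registry.
# registry.internal.example.com and <image>:<tag> are placeholders.
# Repeat for every container image the workflow uses.
docker pull quay.io/pacbio/<image>:<tag>
docker tag quay.io/pacbio/<image>:<tag> registry.internal.example.com/pacbio/<image>:<tag>
docker push registry.internal.example.com/pacbio/<image>:<tag>
```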
If you're looking for help with any of this, please email support@pacb.com.
Question 1:
Yes, the default Cromwell configurations expect that compute nodes will be able to contact the docker servers to check for a new version of an image. There are ways to work around this, described in the comment above from #125.
If you go with option 1 and mirror the Docker images on a registry within your network, you can add a key/value pair like `"humanwgs.container_registry": "<yourdockerhostname>"` to the inputs.json to point to the internal Docker host, as in the fragment below.
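A minimal inputs.json fragment; the hostname is a hypothetical placeholder, and the workflow's other required inputs are omitted:

```json
{
  "humanwgs.container_registry": "registry.internal.example.com/pacbio"
}
```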
Option 3 requires using an additional tool to cache containers and modifying cromwell.conf so that images are fetched from quay.io only on the node running Cromwell and not from the compute nodes. A rough sketch follows.
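The fragment below shows roughly where singularity-permanent-cache would fit into the submit-docker block of a SLURM backend in cromwell.conf. It assumes the tool accepts a docker:// URI and prints the path of the cached .sif file (per its README), and it borrows the `${docker}`/`${cwd}`-style placeholders from Cromwell's documented Singularity examples; treat it as a starting point, not a drop-in config:

```
submit-docker = """
  # Resolve the image to a .sif in the permanent cache. Only the node
  # running Cromwell needs internet access to populate this cache.
  IMAGE=$(singularity-permanent-cache docker://${docker})

  sbatch --wrap "singularity exec --containall --bind ${cwd}:${docker_cwd} $IMAGE ${job_shell} ${docker_script}"
"""
```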
Option 2 is probably the most straightforward if you plan on a low volume of samples. miniwdl + miniwdl-slurm will cache Singularity containers once on the node running miniwdl and will not contact the container registry again unless the WDL workflow code changes. A minimal configuration sketch follows.
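A minimal miniwdl configuration sketch for this route. The `slurm_singularity` backend name comes from the miniwdl-slurm plugin, and the cache path is a placeholder; check the plugin's README for the settings your cluster needs:

```ini
# ~/.config/miniwdl.cfg -- sketch; assumes the miniwdl-slurm plugin is installed
[scheduler]
container_backend = slurm_singularity

[singularity]
# Cache pulled images as .sif files under a shared filesystem path (placeholder).
image_cache = /shared/miniwdl/singularity_cache
```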
Question 2:
`humanwgs.container_registry` is the root URL for the registry where these containers are hosted, not the full image address. For instance, the default address is `quay.io/pacbio` because all of our images are in the `pacbio` organization on quay.io.
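To illustrate how that root URL is combined with an image name (both `<image>:<tag>` and the internal hostname are placeholders):

```
quay.io/pacbio                        ->  quay.io/pacbio/<image>:<tag>
registry.internal.example.com/pacbio  ->  registry.internal.example.com/pacbio/<image>:<tag>
```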
Please email support@pacb.com with any additional questions.
Question 1: Do we need to ensure that the HPC can connect to the internet when running this workflow on the HPC? I see that the container_registry field in the input parameters is read in order to access quay.io; if the HPC is not connected to the internet, those containers will not be accessible, right?
Question 2: If we want to use our own Docker images, should container_registry in the parameter file be filled in with the Docker image address?