epi2me-labs / wf-alignment


Running Nextflow Workflow on Compute Node Without Internet Access #24

Closed by YIGUIz 7 months ago

YIGUIz commented 7 months ago


I am writing to seek your expertise and guidance regarding a challenge I am facing while running a Nextflow workflow on a compute node that lacks internet access.

I have successfully tested the workflow on a login node with internet connectivity. However, upon submitting the job to the compute node, I encountered a "Could not resolve host" error, specifically related to www.nextflow.io.

SamStudio8 commented 7 months ago

@YIGUIz Can you provide some additional information and logs? It's not immediately clear to me why your job would want to access www.nextflow.io unless Nextflow has not already been installed on the compute node.

YIGUIz commented 7 months ago

The script:

#!/bin/bash
#SBATCH -J test_alignment
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --ntasks-per-node=12
#SBATCH --mem=40GB
#SBATCH -o %j.output
#SBATCH -e %j.error

# activate environment
source activate /public/home/b20223040336/anaconda3/envs/rna

# module load
module load apps/R/4.1.0
module load python/3.11.6-gcc-13.2.0-bzuiljs
module load apps/singularity/3.11.5

# Setting parameters
software="/public/home/b20223040336/Workspace/long_read_rna/00bin"
Workspace="/public/home/b20223040336/Workspace/long_read_rna/03Results"
data="/public/home/b20223040336/Workspace/long_read_rna/01data"

# Main
# 01alignment
cd ${Workspace}/00alignment
nextflow run ${software}/wf-alignment-master --fastq ${data}/fastq --references ${data}/references -profile singularity -resume

The output log:

N E X T F L O W ~ version 23.10.1
Launching /public/home/b20223040336/Workspace/long_read_rna/00bin/wf-alignment-master/main.nf [tiny_goldberg] DSL2 - revision: 36df888418
ERROR ~ Unable to acquire lock on session with ID 3a4a395e-15d5-4be4-b7ff-896ce18d0759

Common reasons for this error are:

You can see which process is holding the lock file by using the following command:

SamStudio8 commented 7 months ago

Thanks @YIGUIz, this seems to be a different error from the one you first reported. I'd suggest removing -resume from your nextflow run command.

YIGUIz commented 7 months ago

Thank you for your suggestion. I will try it again, and then share the results with you.

YIGUIz commented 7 months ago

The output log is below. I think the failure may be caused by the lack of internet access.

[- ] process > fastcat -
[- ] process > move_or_compress_fq_file -
[- ] process > pipeline:getParams -
[- ] process > pipeline:getVersions -
[- ] process > pipeline:process_references... -
[- ] process > pipeline:process_references... -
[- ] process > pipeline:makeMMIndex -
[- ] process > pipeline:alignReads -
[- ] process > pipeline:indexBam -
[- ] process > pipeline:bamstats -
[- ] process > pipeline:addStepsColumn -
[- ] process > pipeline:readDepthPerRef -
[- ] process > pipeline:makeReport -
[- ] process > configure_jbrowse -
[- ] process > output -
Pulling Singularity image docker://ontresearch/wf-alignment:shaa9faef16822c5aa48366a4c45b401c9233a6c0f7 [cache /public/home/b20223040336/Workspace/long_read_rna/03Results/00alignment/work/singularity/ontresearch-wf-alignment-shaa9faef16822c5aa48366a4c45b401c9233a6c0f7.img]
Pulling Singularity image docker://ontresearch/wf-common:sha1c5febff9f75143710826498b093d9769a5edbb9 [cache /public/home/b20223040336/Workspace/long_read_rna/03Results/00alignment/work/singularity/ontresearch-wf-common-sha1c5febff9f75143710826498b093d9769a5edbb9.img]
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: /public/home/b20223040336/Workspace/long_read_rna/03Results/00alignment/work/singularity -- Use the environment variable NXF_SINGULARITY_CACHEDIR to specify a different location
ERROR ~ Error executing process > 'fastcat (1)'

Caused by:
  Failed to pull singularity image
  command: singularity pull --name ontresearch-wf-common-sha1c5febff9f75143710826498b093d9769a5edbb9.img.pulling.1709120220012 docker://ontresearch/wf-common:sha1c5febff9f75143710826498b093d9769a5edbb9 > /dev/null
  status : 255
  message:
    FATAL: While making image from oci registry: error fetching image to cache: failed to get checksum for docker://ontresearch/wf-common:sha1c5febff9f75143710826498b093d9769a5edbb9: pinging container registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:52951->[::1]:53: read: connection refused

-- Check '.nextflow.log' file for details

SamStudio8 commented 7 months ago

OK, this makes more sense. The workflow will try to download the Docker images that Singularity needs to run. We don't yet have an easy way to automatically download every Docker image a workflow needs in order to work offline, but if you choose a directory on your filesystem to save images to, you can manually pull those images yourself on the head node, which has internet access. You can then tell Nextflow where to look for those images.

For the above image (note I have removed the .pulling.XXXXXXX file extension):

mkdir /public/home/b20223040336/singularity_images
cd /public/home/b20223040336/singularity_images
singularity pull --name ontresearch-wf-common-sha1c5febff9f75143710826498b093d9769a5edbb9.img docker://ontresearch/wf-common:sha1c5febff9f75143710826498b093d9769a5edbb9

To tell Nextflow where the images are, you will need the following line in your job submission script:

export NXF_SINGULARITY_CACHEDIR=/public/home/b20223040336/singularity_images
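
As a rough sketch, the export could be paired with a pre-flight check in the job script, so that the job stops early when an image is missing instead of attempting a pull on a node without internet access. The helper name `missing_images` is hypothetical and is not part of Nextflow or this workflow; the cache path and image file name are the ones used in this thread.

```shell
#!/bin/bash
# Sketch only: missing_images is a hypothetical helper, not a Nextflow feature.

# Print each expected image file that is absent from the cache directory.
missing_images() {
    local cache_dir="$1"; shift
    local name
    for name in "$@"; do
        [ -e "$cache_dir/$name" ] || echo "$name"
    done
}

# Cache location used in this thread.
export NXF_SINGULARITY_CACHEDIR=/public/home/b20223040336/singularity_images

# Example use in the job script, before the nextflow run command:
#   missing="$(missing_images "$NXF_SINGULARITY_CACHEDIR" \
#       ontresearch-wf-common-sha1c5febff9f75143710826498b093d9769a5edbb9.img)"
#   if [ -n "$missing" ]; then
#       echo "Missing images: $missing" >&2
#       exit 1
#   fi
```

The check only tests that the files exist; it does not verify that a previous pull completed cleanly.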

You can inspect the nextflow.config for the other images that will be needed, but perhaps the easiest thing to do is run the workflow a few times and repeat the pull process for each image that yields an error.
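
Repeating the pull for each image could be scripted roughly as follows, run on the node that has internet access. This is a sketch under assumptions: the file naming rule (drop `docker://`, replace `/` and `:` with `-`, append `.img`) is inferred from the log lines in this thread, and the function names `image_cache_name` and `pull_images` are illustrative. Setting `DRY_RUN=1` prints what would be pulled without calling singularity.

```shell
#!/bin/bash
# Sketch only: function names are illustrative; the naming rule is inferred
# from the cache file names shown in the log in this thread.

# Derive the cache file name from a docker:// URI.
image_cache_name() {
    printf '%s.img\n' "$(printf '%s' "${1#docker://}" | sed 's|[/:]|-|g')"
}

# Pull each image into cache_dir unless the file is already present.
pull_images() {
    local cache_dir="$1"; shift
    local uri name
    for uri in "$@"; do
        name="$(image_cache_name "$uri")"
        if [ -e "$cache_dir/$name" ]; then
            echo "already cached: $name"
        elif [ "${DRY_RUN:-0}" = "1" ]; then
            echo "would pull $uri -> $cache_dir/$name"
        else
            mkdir -p "$cache_dir"
            # Same pull command as above, run from inside the cache directory.
            (cd "$cache_dir" && singularity pull --name "$name" "$uri")
        fi
    done
}
```

For the two images named in the log above, the call would look like `pull_images /public/home/b20223040336/singularity_images docker://ontresearch/wf-alignment:shaa9faef16822c5aa48366a4c45b401c9233a6c0f7 docker://ontresearch/wf-common:sha1c5febff9f75143710826498b093d9769a5edbb9`. Any further images would need to be added to the list as they show up in errors or in nextflow.config.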

YIGUIz commented 7 months ago

Thank you very much! It works successfully! @SamStudio8