kubu4 opened this issue 2 years ago.
Thanks for opening this issue! There seems to be a problem with the "Command executed:" section of the error message, specifically here:
-o fastq/merge.null \
-p fastq/merge.g null g \
where "null" should be replaced by the file names held in the reads variable, as used on L48-L49 of wgbs.nf:
-o fastq/${params.merge ? "${readtype}." : ""}${reads[0]} \\
-p fastq/${params.merge ? "${readtype}." : ""}${reads[1]} ${reads} \\
I suspect the issue comes from needing a new test.config file to run the test profile offline. Can you tell me more about exactly what you did here? Did you modify the paths in the existing test.config file?
As an aside to this issue, I just wanted to point out that a typical pipeline run does not require an open internet connection. If, for example, you intend to submit to a queuing system that sends jobs to compute nodes without internet access, it should be enough to have already pulled the pipeline normally from the login node. This gives you a local copy of the pipeline in ~/.nextflow/assets, which is the first place Nextflow looks for the pipeline whenever you run it.
Is that relevant for your use case at all?
Thanks for looking into this. It is much appreciated!
Did you modify the paths in the existing test.config file?
Gah! Yes! Sorry for not including that!! Here's what the modified test.config file looks like:
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
 * nextflow run epidiverse/wgbs -profile test
*/
params {
// enable all steps
input = "test profile"
merge = true
INDEX = true
trim = true
fastqc = true
unique = true
// genome reference
reference = "/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/genome/genome.fa"
// set readPaths parameter (only available in test profile)
readPaths = [
['sampleA', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_2.fastq.gz'],
['sampleB', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_2.fastq.gz']
]
// set mergePaths parameter (only available in test profile)
mergePaths = [
['sampleA', 'merge', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_2.fastq.gz'],
['sampleB', 'merge', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_2.fastq.gz']
]
}
Is that relevant for your use case at all?
Yeah, we'd be running on a high-performance computing cluster that uses the SLURM job manager. I was just trying to confirm that the install, and using Singularity on the compute nodes, would work properly. I figured troubleshooting would be easier once the test ran successfully.
In this new test.config file for running offline, it looks like you've lost the nested tuples in both readPaths and mergePaths: the two FASTQ paths for each sample need to be wrapped in their own inner list.
So for example this:
readPaths = [
['sampleA', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_2.fastq.gz'],
['sampleB', 'input', '/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_2.fastq.gz']
]
should be changed to this:
readPaths = [
['sampleA', 'input', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleA_2.fastq.gz']],
['sampleB', 'input', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/sampleB_2.fastq.gz']]
]
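The same nesting applies to mergePaths. Following the readPaths pattern, it would look something like this. This is a sketch: the paths are copied unchanged from the test.config you posted, and only the brackets around each read pair are new.

```groovy
// mergePaths with each sample's read pair wrapped in its own inner list,
// mirroring the corrected readPaths structure (sample, type, [read1, read2])
mergePaths = [
    ['sampleA', 'merge', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleA_2.fastq.gz']],
    ['sampleB', 'merge', ['/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_1.fastq.gz','/gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/merge/sampleB_2.fastq.gz']]
]
```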
By the way, I am very happy to help you write a configuration profile for running your Nextflow pipelines with SLURM. Nextflow integrates very nicely with this kind of resource management software; for example, it can automatically submit each process as a job to your queue. Please feel free to open a new issue requesting help with this and I will try to tailor it to your system as best I can!
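As a rough starting point, a custom config for SLURM plus Singularity might look like the sketch below. The file name slurm.config and the partition name my-partition are placeholders, not values from this thread, and you would still need to adjust them for your cluster.

```groovy
// slurm.config (hypothetical file name): minimal sketch of SLURM + Singularity settings
process {
    executor = 'slurm'         // submit each pipeline process as a SLURM job
    queue    = 'my-partition'  // hypothetical SLURM partition; replace with your own
}

executor {
    queueSize = 50             // cap on jobs Nextflow keeps queued at once (example value)
}

singularity {
    enabled    = true          // run processes inside Singularity containers
    autoMounts = true          // bind host paths automatically so inputs are visible
}
```

It would be passed alongside the existing profiles with the -c option, e.g. -profile test,singularity -c slurm.config.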
I'm trying to run the pipeline test via the Singularity image on our university's computing cluster, which doesn't have internet access when executing jobs.
I've downloaded all of the input files listed in test.config. I've also downloaded the Singularity image (singularity pull docker://epidiverse/wgbs:1.0) and changed the nextflow.config file to specify the Singularity image location, like so:
That seemed like it should be all that was needed, but when I execute the test command:
NXF_VER=20.07.1 /gscratch/srlab/programs/nextflow-21.10.6-all run /gscratch/srlab/sam/analyses/20220710-olu-epidiverse_wgbs-test/wgbs-1.0 -profile test,singularity
it fails with this error:
When I look at the Cutadapt log file, this is what is shown:
Did I miss something that needs to be set up for a local install to run properly?