Dear Professor Cristian,
An update on my error: the output dir finally contained two folders (SV_search and Repeat_Filtering), but my task still failed. Any suggestions for solving this problem?~
Thank you very much!~
Hi,
May I see your nextflow.config?
Also, I pushed a commit that may fix the error in the tsd_report step. Can you please pull the latest version and try again?
Hi~
This is my nextflow.config file. My latest attempt did not change the content of the config file and hit the error I showed above. Sure! I will pull the latest version and try again!
Thank you so much!
I see you were on an older version of the config.
Also try this nextflow.config:
manifest.defaultBranch = 'main'
singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '--contain --bind $(pwd):/tmp'
profiles {
    standard {
        process.executor = 'local'
        process.container = 'library://cgroza/collection/graffite:latest'
    }
    cluster {
        process.executor = 'slurm'
        process.container = 'library://cgroza/collection/graffite:latest'
        process.scratch = '$SLURM_TMPDIR'
    }
    cloud {
        process.executor = 'aws'
        process.container = 'library://cgroza/collection/graffite:latest'
    }
}
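For reference, a profile from this config is selected with -profile at run time; a minimal sketch (input file names are placeholders, flags as used later in this thread):

nextflow run GraffiTE/main.nf -profile cluster \
    --reference genome.fa \
    --TE_library TE_library.fa \
    --vcf variants.vcf \
    --out results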
OK, thank you for your help!
Do I need to add the 'export NXF_TEMP=' line?
I don't touch that variable when I run nextflow on my cluster. However, it may be different for you. Try without it first.
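If it turns out you do need it, a minimal sketch of a slurm submission script (hypothetical paths; not part of the pipeline itself):

#!/bin/bash
#SBATCH --job-name=graffite
# NXF_TEMP redirects Nextflow's own temporary files to a writable location
export NXF_TEMP=/path/to/writable/tmp
nextflow run GraffiTE/main.nf -profile cluster --out results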
OK, I really appreciate your help!
Hi @xxYaaoo, are you still having problems with this issue? Let us know if you need further assistance!
Hi @cgroza and @clemgoub, I've been struggling with the same tmp dir problem. I checked the issues #8, #12, #31, #24 and the "important-note" but couldn't figure out how to solve it. Here is the command I'm using to run GraffiTE on our slurm cluster.
nextflow run /lisc/scratch/botany/amin/te_detection/pME/GraffiTE/main.nf \
--vcf /lisc/scratch/botany/amin/te_detection/pME/test_run/results/1_SV_search/svim-asm_variants.vcf \
--reference input/vieillardii1167c.asm.bp.p_ctg.fa \
--TE_library input/vieillardii.fasta.mod.EDTA.TElib.fa \
--out results \
--genotype false \
-profile cluster \
-with-report reports/report_${SLURM_JOB_ID}.html \
-resume
I used --vcf instead of --assemblies, as @clemgoub explained here.
And here is the nextflow.config file:
manifest.defaultBranch = 'main'
singularity.enabled = true
singularity.autoMounts = true
singularity.runOptions = '--contain --bind /lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir:/tmp'
profiles {
    standard {
        process.executor = 'local'
        process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
    }
    cluster {
        process.executor = 'slurm'
        process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
        process.scratch = '$SLURM_TMPDIR'
    }
    cloud {
        process.executor = 'aws'
        process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
    }
}
temp_dir is writable and is used by the repeatmask_VCF process (I checked this while the job was running; there were a lot of tmp files in it). temp_dir currently has three empty subdirectories: nxf.j8zh7vIZHc, slurm-2228294 and slurm-2297514. The last one is the one the repeatmask_VCF process used.
I ran the pipeline with process.scratch = '$SLURM_TMPDIR' changed to process.scratch = '/lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir' in the nextflow.config file, but I got the exact same error. singularity.runOptions = '--contain --bind $(pwd):/tmp' did not help either (one way to check that bind mount by hand is sketched below).
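As a diagnostic sketch (the .sif path and temp_dir are the ones from the config above), the bind mount can be reproduced outside Nextflow to confirm /tmp is writable inside the container:

mkdir -p /lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir
singularity exec --contain \
    --bind /lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir:/tmp \
    /lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif \
    sh -c 'mktemp -p /tmp check.XXXXXX'   # should print a path like /tmp/check.a1b2c3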
The pipeline stops running about half an hour after submitting the tsd_prep process and doesn't generate the 3_TSD_search directory.
These are the last lines in the .nextflow.log file:
~> TaskHandler[jobId: 2297514; id: 1; name: repeatmask_VCF (1); status: RUNNING; exit: -; error: -; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/50/98452e410be717b0d27a72b3705134 started: 1720609826603; exited: -; ]
Jul-10 21:17:29.919 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2297514; id: 1; name: repeatmask_VCF (1); status: COMPLETED; exit: 0; error: -; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/50/98452e410be717b0d27a72b3705134 started: 1720609826603; exited: 2024-07-10T19:17:28Z; ]
Jul-10 21:17:29.927 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'TaskFinalizer' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jul-10 21:17:30.511 [TaskFinalizer-1] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jul-10 21:17:31.302 [Task submitter] DEBUG nextflow.executor.GridTaskHandler - [SLURM] submitted process tsd_prep (1) > jobId: 2299973; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/3d/2b7f19fd8b29d774c934fbaa358251
Jul-10 21:17:31.304 [Task submitter] INFO nextflow.Session - [3d/2b7f19] Submitted process > tsd_prep (1)
Jul-10 21:18:04.894 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 2299973; id: 2; name: tsd_prep (1); status: COMPLETED; exit: 0; error: -; workDir: /lisc/scratch/botany/amin/te_detection/pME/test_run/work/3d/2b7f19fd8b29d774c934fbaa358251 started: 1720639059895; exited: 2024-07-10T19:18:01Z; ]
Jul-10 21:18:05.002 [main] DEBUG nextflow.Session - Session await > all processes finished
Jul-10 21:18:09.889 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: slurm) - terminating tasks monitor poll loop
Jul-10 21:18:09.891 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jul-10 21:18:09.908 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'TaskFinalizer' shutdown completed (hard=false)
Jul-10 21:18:09.925 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'PublishDir' shutdown completed (hard=false)
Jul-10 21:18:09.977 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=2; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=5d 9h 51m 38s; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=16; peakMemory=20 GB; ]
Jul-10 21:18:09.979 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Jul-10 21:18:19.223 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jul-10 21:18:19.489 [main] DEBUG nextflow.util.ThreadPoolManager - Thread pool 'FileTransfer' shutdown completed (hard=false)
Jul-10 21:18:19.510 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
Here is the .command.log in /work/3d:
##LiSC job info: the temporary directory of your job is also available read-only until 3 days after job end on the login nodes (login01/login02) under this path: /lisc/slurm/node-b07/tmp/slurm-2299973
##LiSC job info: Temporary folders of finished jobs are offline when their compute node went into power-saving sleep. For access to these folders, please contact the helpdesk.
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
INFO: gocryptfs not found, will not be able to use gocryptfs
extracting flanking...
sort: cannot create temporary file in '/tmp/slurm-2299973': No such file or directory
index file vieillardii1167c.asm.bp.p_ctg.fa.fai not found, generating...
extracting SVs' 5' and 3' ends...
sort: cannot create temporary file in '/tmp/slurm-2299973': No such file or directory
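The failing call is GNU sort, which writes its temporary files to $TMPDIR (falling back to /tmp, or to a directory given with -T) whenever the input exceeds its in-memory buffer; here $TMPDIR inside the container points at /tmp/slurm-2299973, which does not exist under the bound /tmp. A generic illustration, with a placeholder input file:

# GNU sort spills to $TMPDIR (or the -T directory) on large inputs;
# if that directory is missing, it fails exactly as in the log above
sort -T /lisc/scratch/botany/amin/te_detection/pME/test_run/temp_dir big_input.txt > sorted.txt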
I would be grateful if you could help me fix this issue.
Hello @amnghn! I'm really sorry you are stuck with this mktemp error.
I'm looking forward to hearing @cgroza's opinion. Do you have an empty VCF after the RepeatMasker process? Often mktemp fails at this stage, but the pipeline keeps going until the TSD process and then crashes.
Could you send us the complete .command.log and .command.err for the RepeatMasker and TSD processes?
Meanwhile, have you tried running with the standard Nextflow profile? Since the main task of your job is RepeatMasker, this shouldn't affect speed much.
Also, if you haven't, I'd check with your system admins whether the process.scratch variable carries over to the node where the process is dispatched. Perhaps it is interpreted on the shell/node where you run the main command, but not on the shell/node that runs the process. A quick way to test this is sketched below.
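A throwaway job script that just prints the variables as seen on the compute node would do it (a sketch, not part of GraffiTE):

#!/bin/bash
#SBATCH --job-name=tmpdir_check
# report what the scheduler actually exports on the compute node
echo "SLURM_TMPDIR=${SLURM_TMPDIR:-unset}"
echo "TMPDIR=${TMPDIR:-unset}"
ls -ld "${TMPDIR:-/tmp}"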
Thanks,
Clément
Hi @clemgoub,
Thanks a lot for your reply. I finally managed to fix this issue by changing process.scratch = '$SLURM_TMPDIR' to process.scratch = '$TMPDIR'. On our cluster, the SLURM_ prefix must be omitted (the working profile is shown below). I'm very glad that I got the final GraffiTE.merged.genotypes.vcf.gz and all the individual VCF files.
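For anyone landing on this thread, the cluster profile that worked here reads (identical to the config above except for process.scratch):

cluster {
    process.executor = 'slurm'
    process.container = '/lisc/scratch/botany/amin/te_detection/pME/graffite_latest.sif'
    process.scratch = '$TMPDIR'
}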
The VCF file generated by RepeatMasker was not empty, even when I had issues with the TSD processes.
Thanks a lot for developing this great pipeline. This was a test run (3 species, 48 samples); I'm planning to run it on 370 individuals of ca. 30 species.
Amazing! Thanks a lot for your kind words and for sharing your solution! I'm sure it'll help more users as well!
Cheers,
Clément
Hi~
Recently, I've been struggling with the problem that I have to set my own tmp directory while running GraffiTE, because of limited access permissions on our group server. I used the line 'export NXF_TEMP=' in my slurm script to set the tmp dir. However, squeue showed my job in a normal running state, yet the output dir contained nothing. I also tried the approach you mentioned in the 'important note' to revise the nextflow.config, but the slurm task errored the moment I sbatched my job. Any idea how to solve my problem?
Thank you so much!