Closed qizhengyang2017 closed 1 day ago
Please see
It means your HPC cluster isn't properly configured for Singularity. This error has nothing to do with GraffiTE or Nextflow.
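One thing that can be checked before contacting the cluster administrators is whether the kernel permits user namespaces at all. The sketch below assumes a Linux host that exposes /proc/sys/user/max_user_namespaces; the helper name userns_state is made up for illustration. A result of "enabled" does not prove that Singularity will work, since some distributions have additional switches, and compute nodes may be configured differently from the login node.

```shell
# userns_state [FILE]: report whether the kernel limit on user namespaces is non-zero.
# FILE defaults to /proc/sys/user/max_user_namespaces (assumption: Linux host).
# Hypothetical helper, not part of Singularity, Nextflow or GraffiTE.
userns_state() {
  limit_file=${1:-/proc/sys/user/max_user_namespaces}
  if [ ! -r "$limit_file" ]; then
    echo "unknown"
    return 0
  fi
  limit=$(cat "$limit_file")
  case $limit in
    ''|*[!0-9]*) echo "unknown" ;;   # empty or not a plain number
    0)           echo "disabled" ;;  # a limit of 0 means no user namespaces
    *)           echo "enabled" ;;
  esac
}

# Check the machine this is run on.
userns_state
```

If this prints "disabled", the fix is on the administrator's side, which matches the reply above.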
-------- Original Message -------- On 7/26/24 9:45 AM, qizhengyang2017 wrote:
Hello,
I ran the test data on the login node of an HPC cluster:
nextflow run /public/home/zyqi/pan-TE-analysis/GraffiTE/GraffiTE/main.nf \
--assemblies assemblies.csv \
--TE_library human_DFAM3.6.fasta \
--reference hs37d5.chr22.fa \
--reads reads.csv -with-singularity /public/home/zyqi/pan-TE-analysis/GraffiTE/graffite_latest.sif
It gave me an error in the first step, map_asm:
FATAL: while extracting /public/home/zyqi/pan-TE-analysis/GraffiTE/graffite_latest.sif: root filesystem extraction failed: extract command failed: ERROR : Failed to create user namespace: user namespace disabled : exit status 1
What can I do to solve the error?
N E X T F L O W ~ version 24.04.3
Launching `/public/home/zyqi/pan-TE-analysis/GraffiTE/GraffiTE/main.nf` [compassionate_hamilton] DSL2 - revision: 6d6ae7414d
[GraffiTE ASCII-art logo]
V . null
Find and Genotype Transposable Elements Insertion Polymorphisms in Genome Assemblies using a Pangenomic Approach
Authors: Cristian Groza and Clément Goubert
Bug/issues: https://github.com/cgroza/GraffiTE/issues
executor > local (1)
[d9/e7bbb9] map_asm (1) [100%] 1 of 1, failed: 1 ✘
[- ] svim_asm -
[- ] survivor_merge -
[- ] repeatmask_VCF -
[- ] tsd_prep -
[- ] tsd_search -
[- ] tsd_report -
[- ] pangenie -
[- ] merge_VCFs -
ERROR ~ Error executing process > 'map_asm (1)'
Caused by:
Process `map_asm (1)` terminated with an error exit status (255)
Command executed:
minimap2 -a -x asm5 --cs -r2k -t 1 -K 500M hs37d5.chr22.fa HG002.mat.cur.20211005_chr22.fasta.gz | samtools sort -m4G @.*** -o asm.sorted.bam -
Command exit status:
255
Command output:
(empty)
Command error:
INFO: Converting SIF file to temporary sandbox...
FATAL: while extracting /public/home/zyqi/pan-TE-analysis/GraffiTE/graffite_latest.sif: root filesystem extraction failed: extract command failed: ERROR : Failed to create user namespace: user namespace disabled : exit status 1
Work dir:
/public/home/zyqi/pan-TE-analysis/GraffiTE/GraffiTE/test/GraffiTE_testset/work/d9/e7bbb9304f96f7914424bc1d3e9d97
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Thank you for your reply.
I moved the data to a server. The test data ran successfully, but the run failed with my real data. The command I used is:
nextflow run /home/zhengqingyou/GraffiTE/GraffiTE/main.nf \
--assemblies assemblies.csv \
--TE_library Gh-families.fa \
--reference TM1.fa.gz \
--genotype false -with-singularity /home/zhengqingyou/GraffiTE/graffite_latest.sif
The error messages:
Command error:
INFO: Converting SIF file to temporary sandbox...
FATAL: while extracting /home/zhengqingyou/GraffiTE/graffite_latest.sif: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't >
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /etc/hosts [binds]: /etc/hosts doesn't exist in container
WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
WARNING: Skipping mount proc [kernel]: /proc doesn't exist in container
WARNING: Skipping mount /home/zhengqingyou/micromamba/envs/singularity/var/singularity/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /home/zhengqingyou/micromamba/envs/singularity/var/singularity/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
WARNING: Skipping mount /home/zhengqingyou/micromamba/envs/singularity/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in conta>
Parallel unsquashfs: Using 104 processors
44474 inodes (103641 blocks) to write
I found a similar issue to yours here: https://groups.google.com/a/lbl.gov/g/singularity/c/D0TA3H5jNw0
Hello,
I have asked my server administrator to install the latest singularity (v 4.1.4). The previous error appears to have been resolved, but a new error has occurred: "samtools sort: failed to read header from "-"."
executor > local (35)
[bc/c9c14a] map_asm (33) [ 14%] 1 of 7, failed: 1
[- ] svim_asm -
[- ] survivor_merge -
[- ] repeatmask_VCF -
[- ] tsd_prep -
[- ] tsd_search -
[- ] tsd_report -
ERROR ~ Error executing process > 'map_asm (19)'
Caused by:
Process `map_asm (19)` terminated with an error exit status (1)
Command executed:
minimap2 -a -x asm5 --cs -r2k -t 1 -K 500M TM1.fa.gz TW077.fa.gz | samtools sort -m4G -@4 -o asm.sorted.bam -
Command exit status:
1
Command output:
(empty)
Command error:
[M::mm_idx_gen::72.953*1.00] collected minimizers
[M::mm_idx_gen::91.895*1.00] sorted minimizers
[M::main::91.895*1.00] loaded/built the index for 26 target sequence(s)
[M::mm_mapopt_update::93.539*1.00] mid_occ = 423
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 26
[M::mm_idx_stat::94.464*1.00] distinct minimizers: 99257769 (75.70% are singletons); average occurrences: 2.191; average spacing: 10.275; total length: 2233991669
[W::hts_set_opt] Cannot change block size for this format
samtools sort: failed to read header from "-"
Work dir:
/home/zhengqingyou/GraffiTE/GraffiTE/cotton/work/a4/169f8498cc1566383f03f1f1f167de
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
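This happens when minimap2 runs out of memory: it is killed before it writes any output, so samtools sort reads an empty stream and cannot find a header. You need to make sure each step has enough resources, as described in the README file.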
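To see which side of a pipe actually failed, the producer can be run on its own and its exit status inspected. The following is a minimal sketch with a hypothetical helper, run_stage_checked, which is not part of GraffiTE or Nextflow. It runs one command, stores its output in a file, and reports the exit status and output size.

```shell
# run_stage_checked OUT CMD [ARGS...]
# Run CMD with stdout redirected to OUT, then report the exit status and
# how many bytes were written. Hypothetical helper for debugging only.
run_stage_checked() {
  stage_out=$1
  shift
  # Capture the exit status without aborting under `set -e`.
  "$@" > "$stage_out" && stage_rc=0 || stage_rc=$?
  stage_bytes=$(wc -c < "$stage_out" | tr -d ' ')
  if [ "$stage_rc" -ne 0 ] || [ "$stage_bytes" -eq 0 ]; then
    echo "producer failed: exit=$stage_rc bytes=$stage_bytes"
    return 1
  fi
  echo "producer ok: bytes=$stage_bytes"
}
```

From the process work dir, it could be called as `run_stage_checked asm.sam minimap2 -a -x asm5 --cs -r2k -t 1 -K 500M TM1.fa.gz TW077.fa.gz`, provided there is enough disk space for the uncompressed SAM file. An exit status of 137 (128 + SIGKILL) is a common sign that the kernel killed the process for using too much memory, although other causes are possible.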
Hello, I added the memory parameter to my command:
nextflow run /home/zhengqingyou/GraffiTE/GraffiTE/main.nf \
--assemblies assemblies.csv \
--TE_library Gh-families.fa \
--reference TM1.fa.gz \
--genotype false -with-singularity /home/zhengqingyou/GraffiTE/graffite_latest.sif \
--params.map_asm_memory '30G'
I am not familiar with Nextflow. Do you know if my command is right?
I ran it locally, but I noticed that params.map_asm_memory is a cluster-mode parameter, so maybe it won't work?
Remove "params." from "params.map_asm_memory", so that the flag reads --map_asm_memory. Also, 30G might not be enough, depending on the species you are working with. Otherwise your command looks almost right.
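For readers new to Nextflow: options with a single dash (such as -with-singularity or -profile) belong to Nextflow itself, while options with a double dash set pipeline parameters, which the pipeline script then reads as params.<name>. The sketch below only prints the corrected command so it can be reviewed; the function name graffite_cmd is made up for illustration, and the paths and the 30G value are the ones from this thread.

```shell
# graffite_cmd MEM: print (do not run) the corrected GraffiTE command.
# Remove the `echo` to actually launch the pipeline.
graffite_cmd() {
  mem=$1
  echo nextflow run /home/zhengqingyou/GraffiTE/GraffiTE/main.nf \
    --assemblies assemblies.csv \
    --TE_library Gh-families.fa \
    --reference TM1.fa.gz \
    --genotype false \
    --map_asm_memory "$mem" \
    -with-singularity /home/zhengqingyou/GraffiTE/graffite_latest.sif
}

graffite_cmd 30G
```

Keeping the memory value as an argument makes it easy to retry with a larger value if 30G turns out to be too small for the genome.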
Thank you very much! I'll give it a try.
I have another question. I used the Minigraph-Cactus workflow to construct a graph-pangenome and obtained a VCF file containing variant information for my 35 samples. The VCF looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HC04 HC15 HW03 HW05 HW06 HW07 P01 P02 P04 P19 P20 TW007 TW013 TW026 TW029 TW031 TW055 TW064 TW075 TW077 TW091 TW094 TW100 TW134 XJ74 XZ142 ZY006 ZY10 ZY184 ZY236 ZY238 ZY354 ZY381 ZY384 ZY46
chr01 119 >2495>2503 CTA CCC,TTA,CCA 60 . AC=1,1,1;AF=0.0526316,0.0526316,0.0526316;AN=19;AT=>2495>2496>2498>2500>2503,>2495>2496>2499>2502>2503,>2495>2497>2498>2500>2503,>2495>2496>2499>2500>2503;NS=19;LV=0 GT 1 . . . 0 0 . . 2 0 0 0 0 . 0 0 . . . 0 . . 0 . 0 0 0 0 3 0 0 . . ..
I want to extract SVs and annotate them with TE information, but this file is challenging for me to handle because each line contains multiple variants. I could use the longest allele to filter SVs, but it's still difficult for me to generate the indels.fa file needed for your workflow. Do you have any suggestions for SV filtering and TE annotation? Thank you very much!
We already ran a similar file and had to make several modifications. First, run vcfbub (https://github.com/pangenome/vcfbub) to pop multi-allelic sites into a top-level allele. Then rename the ID field so that every record has a unique name that does not include the ">" character. The ">" character is special in FASTA files, where it marks the start of a contig name, so it must not appear in the VCF ID field.
Then, you can pass the resulting VCF file to GraffiTE using the --vcf parameter.
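The ID renaming step could be done with a short awk filter, sketched here under the assumption that the VCF is tab-delimited text on standard input. The function name rename_vcf_ids and the CHROM_POS_N naming scheme are illustrative choices, not something GraffiTE prescribes. The running record number N keeps IDs unique even when two records share a position.

```shell
# rename_vcf_ids: read an uncompressed VCF on stdin and write it to stdout
# with the ID column (field 3) replaced by CHROM_POS_N, where N is the
# record number. Header lines (starting with '#') pass through unchanged.
# Hypothetical helper; the naming scheme is an assumption.
rename_vcf_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/  { print; next }
       { n++; $3 = $1 "_" $2 "_" n; print }'
}
```

A possible invocation, with an illustrative input file name, is `zcat input.vcf.gz | rename_vcf_ids | bgzip > cotton-pg.TM1.uniqID.vcf.gz`. If the records are split or merged later, the filter should be run again afterwards so that the IDs stay unique.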
Thank you very much for your detailed instructions! I greatly appreciate your help!
Hello, I tried to use GraffiTE to annotate the VCF file generated by Minigraph-Cactus. The VCF I used has gone through vcfbub to remove nested sites, as well as those greater than 100kb. I have changed the ID column to remove ">". Below is the command I used.
nextflow run ~/pan-TE-analysis/GraffiTE/GraffiTE/main.nf \
-profile cluster \
--TE_library Gh-families.fa \
--genotype false \
--reference TM1.fa.gz \
--vcf cotton-pg.TM1.uniqID.vcf.gz
The error message is
The name D13_63029835 is used more than once in the fasta file. Second and later occurrences are appended a _number
The name D13_63305987 is used more than once in the fasta file. Second and later occurrences are appended a _number
Work dir:
/public/home/zyqi/pan-TE-analysis/version2/TE_annot/work/59/e0106b8fa214f54cb5bb6b0f674fee
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
I checked the .nextflow.log. Below is the error message:
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Aug-10 05:50:39.234 [main] DEBUG nextflow.Session - Session await > all processes finished
Aug-10 05:50:39.246 [TaskFinalizer-1] DEBUG nextflow.Session - Session aborted -- Cause: Process `repeatmask_VCF (1)` terminated with an error exit status (140)
Aug-10 05:50:39.396 [main] DEBUG nextflow.Session - Session await > all barriers passed
Aug-10 05:50:39.396 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: lsf) - terminating tasks monitor poll loop
Aug-10 05:50:39.413 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=12h 2s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
Aug-10 05:50:39.565 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Aug-10 05:50:39.629 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
The sequences in indels.fa look like this. I think the error may be caused by indels.fa, because the sequences contain commas (,). What do I need to do with the VCF so that it can be used with GraffiTE?
>A01_55
CCCAC,AGATA,CTAAA,CTAAACCCTAAACCCAAAA,CTAAACCCATAAACCCTAAAA,CTAAACCCTAAACCCTAAACCCCTAAA,CCTAAAACCCTAAAACCCTAAA,CTAAAACCCTAAACCCTAACCCAAAA,CTAAAACCCTAAACCCTAAACCCCTAAA
>A01_87
CCC,CA,AAC,AAA
>A01_115
CC,ACA,CA
>A01_147
C,CAA,CCTAA
>A01_162
CCC,CTA
Your VCF must be biallelic. Run bcftools norm -m- file.vcf to make it so; this splits multi-allelic records and makes the commas go away. Then make sure the variant IDs are unique and that they are valid FASTA contig names.
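Before launching a run, the three conditions in this reply can be checked with a small awk filter. This is a sketch under stated assumptions: the function name vcf_problems is hypothetical, and "valid FASTA name" is interpreted here simply as an ID containing neither ">" nor a space.

```shell
# vcf_problems: read an uncompressed VCF on stdin and print one line per
# problem found, as "<problem>\t<ID>". Prints nothing for a clean file.
# Hypothetical helper; the validity rules are assumptions based on the reply.
vcf_problems() {
  awk 'BEGIN { FS = "\t" }
       /^#/        { next }
       $5 ~ /,/    { print "multiallelic\t" $3 }   # more than one ALT allele
       seen[$3]++  { print "duplicate_id\t" $3 }   # ID already used earlier
       $3 ~ /[> ]/ { print "bad_id_char\t" $3 }'   # unsafe in a FASTA name
}
```

For example, `bcftools norm -m- file.vcf | vcf_problems` would be expected to report duplicate_id lines when the split records inherit the ID of the original multi-allelic record, which is why the IDs need to be made unique after normalization and not before.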
We added these notes to the README file for future users. Thanks for highlighting this.