cgroza / GraffiTE

GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome-graphs.

ERROR : Failed to create user namespace: user namespace disabled #35

Open qizhengyang2017 opened 1 month ago

qizhengyang2017 commented 1 month ago

Hello,

I ran the test data on the login node of an HPC cluster:

nextflow run /public/home/zyqi/pan-TE-analysis/GraffiTE/GraffiTE/main.nf \
   --assemblies assemblies.csv \
   --TE_library human_DFAM3.6.fasta \
   --reference hs37d5.chr22.fa \
    --reads reads.csv -with-singularity /public/home/zyqi/pan-TE-analysis/GraffiTE/graffite_latest.sif

It gave me an error in the first step map_asm: FATAL: while extracting /public/home/zyqi/pan-TE-analysis/GraffiTE/graffite_latest.sif: root filesystem extraction failed: extract command failed: ERROR : Failed to create user namespace: user namespace disabled : exit status 1

What can I do to solve the error?

N E X T F L O W   ~  version 24.04.3

Launching `/public/home/zyqi/pan-TE-analysis/GraffiTE/GraffiTE/main.nf` [compassionate_hamilton] DSL2 - revision: 6d6ae7414d

▄████  ██▀███   ▄▄▄        █████▒ █████▒██▓▄▄▄█████▓▓█████
██▒ ▀█▒▓██ ▒ ██▒▒████▄    ▓██   ▒▓██           ██▒ ▓▒▓█   ▀
▒██░▄▄▄░▓██ ░▄█ ▒▒██  ▀█▄  ▒████ ░▒████ ░▒██▒▒ ▓██░ ▒░▒███
░▓█  ██▓▒██▀▀█▄  ░██▄▄▄▄██ ░▓█▒  ░░▓█▒  ░░██░░ ▓██▓ ░ ▒▓█  ▄
░▒▓███▀▒░██▓ ▒██▒  █   ▓██▒░▒█░   ░▒█░   ░██░  ▒██▒ ░ ░▒████▒
░▒   ▒ ░ ▒▓ ░▒▓░ ▒▒   ▓▒█░ ▒ ░    ▒ ░   ░▓    ▒ ░░   ░░ ▒░ ░
░   ░   ░▒ ░ ▒░  ▒   ▒▒ ░ ░      ░      ▒ ░    ░     ░ ░  ░
░ ░   ░   ░░   ░   ░   ▒    ░ ░    ░ ░    ▒ ░  ░         ░
░    ░           ░  ░               ░              ░  ░

V . null

Find and Genotype Transposable Elements Insertion Polymorphisms
in Genome Assemblies using a Pangenomic Approach

Authors: Cristian Groza and Clément Goubert
Bug/issues: https://github.com/cgroza/GraffiTE/issues

executor >  local (1)
executor >  local (1)
[d9/e7bbb9] map_asm (1)    [100%] 1 of 1, failed: 1 ✘
[-        ] svim_asm       -
[-        ] survivor_merge -
[-        ] repeatmask_VCF -
[-        ] tsd_prep       -
[-        ] tsd_search     -
[-        ] tsd_report     -
[-        ] pangenie       -
[-        ] merge_VCFs     -
ERROR ~ Error executing process > 'map_asm (1)'

Caused by:
  Process `map_asm (1)` terminated with an error exit status (255)

Command executed:

  minimap2 -a -x asm5 --cs -r2k -t 1 -K 500M hs37d5.chr22.fa HG002.mat.cur.20211005_chr22.fasta.gz | samtools sort -m4G -@4 -o asm.sorted.bam -

Command exit status:
  255

Command output:
  (empty)

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  FATAL:   while extracting /public/home/zyqi/pan-TE-analysis/GraffiTE/graffite_latest.sif: root filesystem extraction failed: extract command failed: ERROR  : Failed to create user namespace: user namespace disabled
  : exit status 1

Work dir:
  /public/home/zyqi/pan-TE-analysis/GraffiTE/GraffiTE/test/GraffiTE_testset/work/d9/e7bbb9304f96f7914424bc1d3e9d97

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

cgroza commented 1 month ago

Please see

https://stackoverflow.com/questions/73618551/error-failed-to-create-user-namespace-user-namespace-disabled-even-after-dis

It means your HPC cluster isn't properly configured for Singularity. This error has nothing to do with GraffiTE or Nextflow.
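A quick way to see whether the kernel allows user namespaces, which an unprivileged Singularity install typically needs when it converts a SIF file to a sandbox, is sketched below. It assumes a Linux system that exposes `/proc/sys/user/max_user_namespaces`; the function name `userns_status` is made up for illustration, and some distributions use additional switches that this check does not inspect.

```shell
# Report whether user namespaces look enabled, based on the kernel limit file.
# The optional argument (path to the limit file) exists only to make testing easy.
userns_status() {
    limit_file="${1:-/proc/sys/user/max_user_namespaces}"
    if [ ! -r "$limit_file" ]; then
        echo "unknown"
        return 0
    fi
    limit=$(cat "$limit_file")
    # A limit of 0 (or an unparsable value) is reported as disabled.
    if [ "$limit" -gt 0 ] 2>/dev/null; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

userns_status
```

If this prints `disabled`, the fix is usually on the system side (an administrator changing the kernel setting or the Singularity installation), not in the GraffiTE command.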

qizhengyang2017 commented 1 month ago

Thank you for your reply.

I moved the data to a server, where the test data ran successfully. But when I ran my real data, it failed. The command I used is:

nextflow run /home/zhengqingyou/GraffiTE/GraffiTE/main.nf \
   --assemblies assemblies.csv \
   --TE_library Gh-families.fa \
   --reference TM1.fa.gz \
   --genotype false -with-singularity /home/zhengqingyou/GraffiTE/graffite_latest.sif

The error messages:

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  FATAL:   while extracting /home/zhengqingyou/GraffiTE/graffite_latest.sif: root filesystem extraction failed: extract command failed: WARNING: passwd file doesn't >
  WARNING: group file doesn't exist in container, not updating
  WARNING: Skipping mount /etc/hosts [binds]: /etc/hosts doesn't exist in container
  WARNING: Skipping mount /etc/localtime [binds]: /etc/localtime doesn't exist in container
  WARNING: Skipping mount proc [kernel]: /proc doesn't exist in container
  WARNING: Skipping mount /home/zhengqingyou/micromamba/envs/singularity/var/singularity/mnt/session/tmp [tmp]: /tmp doesn't exist in container
  WARNING: Skipping mount /home/zhengqingyou/micromamba/envs/singularity/var/singularity/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
  WARNING: Skipping mount /home/zhengqingyou/micromamba/envs/singularity/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in conta>
  Parallel unsquashfs: Using 104 processors
  44474 inodes (103641 blocks) to write

cgroza commented 1 month ago

I found a similar issue to yours here: https://groups.google.com/a/lbl.gov/g/singularity/c/D0TA3H5jNw0

qizhengyang2017 commented 1 month ago

Hello,

I have asked my server administrator to install the latest singularity (v 4.1.4). The previous error appears to have been resolved, but a new error has occurred: "samtools sort: failed to read header from "-"."

executor >  local (35)
[bc/c9c14a] map_asm (33)   [ 14%] 1 of 7, failed: 1
[-        ] svim_asm       -
[-        ] survivor_merge -
[-        ] repeatmask_VCF -
[-        ] tsd_prep       -
[-        ] tsd_search     -
[-        ] tsd_report     -
ERROR ~ Error executing process > 'map_asm (19)'

Caused by:
  Process `map_asm (19)` terminated with an error exit status (1)

Command executed:

  minimap2 -a -x asm5 --cs -r2k -t 1 -K 500M TM1.fa.gz TW077.fa.gz | samtools sort -m4G -@4 -o asm.sorted.bam -

Command exit status:
  1

Command output:
  (empty)

Command error:
  [M::mm_idx_gen::72.953*1.00] collected minimizers
  [M::mm_idx_gen::91.895*1.00] sorted minimizers
  [M::main::91.895*1.00] loaded/built the index for 26 target sequence(s)
  [M::mm_mapopt_update::93.539*1.00] mid_occ = 423
  [M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 26
  [M::mm_idx_stat::94.464*1.00] distinct minimizers: 99257769 (75.70% are singletons); average occurrences: 2.191; average spacing: 10.275; total length: 2233991669
  [W::hts_set_opt] Cannot change block size for this format
  samtools sort: failed to read header from "-"

Work dir:
  /home/zhengqingyou/GraffiTE/GraffiTE/cotton/work/a4/169f8498cc1566383f03f1f1f167de

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

cgroza commented 1 month ago

This happens when minimap2 runs out of memory. Make sure each step has enough resources, as described in the README file.
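Before rerunning, it may help to compare the memory the machine has available with what the alignment step is likely to need. The sketch below reads `MemAvailable` from `/proc/meminfo` (Linux only). The function names `mem_available_gib` and `has_enough_mem` and the 30 GiB threshold are illustrative assumptions, not values taken from the GraffiTE documentation.

```shell
# Print available memory in whole GiB, read from a meminfo-style file.
# The optional argument (path) is only there so the function can be tested.
mem_available_gib() {
    awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Succeed if at least $1 GiB are available (optional $2: meminfo path).
has_enough_mem() {
    needed_gib="$1"
    avail_gib=$(mem_available_gib "${2:-/proc/meminfo}")
    [ "${avail_gib:-0}" -ge "$needed_gib" ]
}

# 30 is a hypothetical requirement; adjust it for your genome size.
if ! has_enough_mem 30; then
    echo "Less than 30 GiB available; minimap2 may be killed for lack of memory"
fi
```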

qizhengyang2017 commented 1 month ago

Hello, I added a parameter to my command.

nextflow run /home/zhengqingyou/GraffiTE/GraffiTE/main.nf \
   --assemblies assemblies.csv \
   --TE_library Gh-families.fa \
   --reference TM1.fa.gz \
   --genotype false -with-singularity /home/zhengqingyou/GraffiTE/graffite_latest.sif \
   --params.map_asm_memory '30G'

I am not familiar with Nextflow. Do you know if my command is right? I ran it locally, but I noticed that params.map_asm_memory is a cluster-mode parameter, so maybe it won't work?

cgroza commented 1 month ago

Remove "params." from "params.map_asm_memory". Also 30G might not be enough depending on the species you are working with. But your command looks almost right.
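With that change applied, the command could look like the sketch below. The paths and file names are the ones from the earlier post; the wrapper function `run_graffite`, the `DRY_RUN` switch, and the 60G value are hypothetical additions for illustration. Whether the memory setting takes effect with the local executor is not confirmed in this thread.

```shell
# Wrap the corrected command in a function.
# DRY_RUN=1 prints the command instead of running it.
run_graffite() {
    mem="${1:-30G}"   # value passed to --map_asm_memory
    ${DRY_RUN:+echo} nextflow run /home/zhengqingyou/GraffiTE/GraffiTE/main.nf \
        --assemblies assemblies.csv \
        --TE_library Gh-families.fa \
        --reference TM1.fa.gz \
        --genotype false \
        --map_asm_memory "$mem" \
        -with-singularity /home/zhengqingyou/GraffiTE/graffite_latest.sif
}

# Print the command with a larger (hypothetical) memory value, without running it.
DRY_RUN=1 run_graffite 60G
```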

qizhengyang2017 commented 1 month ago

Thank you very much! I'll give it a try.

I have another question. I used the Minigraph-Cactus workflow to construct a graph-pangenome and obtained a VCF file containing variant information for my 35 samples. The VCF looks like this:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HC04    HC15    HW03    HW05    HW06    HW07    P01 P02 P04 P19 P20 TW007 TW013 TW026   TW029   TW031   TW055   TW064   TW075   TW077   TW091   TW094   TW100   TW134   XJ74    XZ142   ZY006   ZY10    ZY184   ZY236   ZY238   ZY354   ZY381 ZY384 ZY46
chr01   119 >2495>2503  CTA CCC,TTA,CCA 60  .   AC=1,1,1;AF=0.0526316,0.0526316,0.0526316;AN=19;AT=>2495>2496>2498>2500>2503,>2495>2496>2499>2502>2503,>2495>2497>2498>2500>2503,>2495>2496>2499>2500>2503;NS=19;LV=0   GT  1   .   .   .   0   0   .   .   2   0   0   0 0 .   0   0   .   .   .   0   .   .   0   .   0   0   0   0   3   0   0   .   .   ..

I want to extract SVs and annotate them with TE information, but this file is challenging for me to handle because each line contains multiple variants. I could use the longest allele to filter SVs, but it's still difficult for me to generate the indels.fa file needed for your workflow. Do you have any suggestions for SV filtering and TE annotation? Thank you very much!

cgroza commented 1 month ago

We already ran a similar file and had to make several modifications. First, run vcfbub (https://github.com/pangenome/vcfbub) to pop multi-allelic sites into a top-level allele. Then rename the ID field to have unique names that do not include the ">" character. The ">" character is a special character in FASTA files denoting the contig names, so it must not be in the VCF ID field.

Then, you can pass the resulting VCF file to GraffiTE using the --vcf parameter.
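The ID renaming step can be sketched with awk as below. This is one possible approach, not necessarily what was done for the earlier file: it assumes an uncompressed, tab-delimited VCF on stdin, builds each ID as CHROM_POS, and appends a counter when the same CHROM_POS occurs again. The function name `rename_vcf_ids` is hypothetical.

```shell
# Rewrite the ID column (field 3) as CHROM_POS, adding _2, _3, ... to repeats
# so that every ID is unique and free of the ">" character.
rename_vcf_ids() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }          # header lines pass through unchanged
         {
             id = $1 "_" $2
             n = ++seen[id]            # how many times this CHROM_POS was seen
             if (n > 1) id = id "_" n
             $3 = id
             print
         }'
}

# Usage (file names are placeholders):
#   zcat input.vcf.gz | rename_vcf_ids > renamed.vcf
```

A counter suffix could in principle collide with another generated name, so it is worth confirming afterwards that the ID column has no duplicates.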

qizhengyang2017 commented 1 month ago

Thank you very much for your detailed instructions! I greatly appreciate your help!

qizhengyang2017 commented 4 weeks ago

Hello, I tried to use GraffiTE to annotate the VCF file generated by Minigraph-Cactus. The VCF I used has gone through vcfbub to remove nested sites, as well as those greater than 100 kb. I have changed the ID column to remove ">". Below is the command I used.

nextflow run ~/pan-TE-analysis/GraffiTE/GraffiTE/main.nf \
  -profile cluster \
  --TE_library Gh-families.fa \
  --genotype false \
  --reference TM1.fa.gz \
  --vcf cotton-pg.TM1.uniqID.vcf.gz

The error message is

  The name D13_63029835 is used more than once in the fasta file. Second and later occurrences are appended a _number
  The name D13_63305987 is used more than once in the fasta file. Second and later occurrences are appended a _number

Work dir:
  /public/home/zyqi/pan-TE-analysis/version2/TE_annot/work/59/e0106b8fa214f54cb5bb6b0f674fee

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

I checked the .nextflow.log. Below is the error message:

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
Aug-10 05:50:39.234 [main] DEBUG nextflow.Session - Session await > all processes finished
Aug-10 05:50:39.246 [TaskFinalizer-1] DEBUG nextflow.Session - Session aborted -- Cause: Process `repeatmask_VCF (1)` terminated with an error exit status (140)
Aug-10 05:50:39.396 [main] DEBUG nextflow.Session - Session await > all barriers passed
Aug-10 05:50:39.396 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: lsf) - terminating tasks monitor poll loop
Aug-10 05:50:39.413 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=12h 2s; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
Aug-10 05:50:39.565 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Aug-10 05:50:39.629 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

The sequences in indels.fa look like this. I think the error may be caused by indels.fa, because the sequences contain commas. What do I need to do with the VCF so that it can be used with GraffiTE?

>A01_55
CCCAC,AGATA,CTAAA,CTAAACCCTAAACCCAAAA,CTAAACCCATAAACCCTAAAA,CTAAACCCTAAACCCTAAACCCCTAAA,CCTAAAACCCTAAAACCCTAAA,CTAAAACCCTAAACCCTAACCCAAAA,CTAAAACCCTAAACCCTAAACCCCTAAA
>A01_87
CCC,CA,AAC,AAA
>A01_115
CC,ACA,CA
>A01_147
C,CAA,CCTAA
>A01_162
CCC,CTA

cgroza commented 4 weeks ago

Your VCF must be biallelic. Run bcftools norm -m- file.vcf to make it so; this will make the commas go away. Then make sure the variant IDs are unique and that they are valid FASTA contig names.
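A small pre-check along these lines can show whether a VCF still has the problems described above before it is passed to GraffiTE. It is only a sketch: it assumes an uncompressed, tab-delimited VCF on stdin, and the function name `vcf_precheck` and its output format are invented for illustration.

```shell
# Count records that would still cause trouble: multi-allelic ALT fields,
# duplicate IDs, and IDs containing ">". Reads an uncompressed VCF on stdin.
vcf_precheck() {
    awk -F'\t' '
        /^#/       { next }        # skip header lines
        $5 ~ /,/   { multi++ }     # ALT lists more than one allele
        $3 ~ />/   { badid++ }     # ID contains the FASTA header character
        seen[$3]++ { dup++ }       # ID already used by an earlier record
        END { printf "multiallelic=%d duplicate_ids=%d ids_with_gt=%d\n", multi, dup, badid }
    '
}

# Usage (file name is a placeholder):
#   zcat file.vcf.gz | vcf_precheck
```

Note that splitting with bcftools norm -m- copies the original ID onto each resulting record, so a VCF with unique IDs before splitting can report duplicates afterwards; that is why the uniqueness check belongs after the split.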

cgroza commented 3 weeks ago

We added these notes to the README file for future users. Thanks for highlighting this.