cgroza / GraffiTE

GraffiTE is a pipeline that finds polymorphic transposable elements in genome assemblies and/or long reads, and genotypes the discovered polymorphisms in read sets using genome graphs.

Long read mapping: Broken pipe #38

Status: Closed. TobyBaril closed this issue 2 months ago.

TobyBaril commented 2 months ago

Hi!

I'm making some progress with GraffiTE, but I'm running into this broken pipe during the long-read mapping. Could this be linked to my read inputs, or is there something I can change locally to overcome the error?

Thanks for your help so far!

My command is:

nextflow run /data/toby/troutGenomics/GraffiTE/main.nf \
    --assemblies /data/toby/troutGenomics/graffite_parameters/assemblies.csv \
    --TE_library /data/toby/troutGenomics/graffite_parameters/bob-families.fa.strained \
    --reference /data/toby/troutGenomics/graffite_parameters/bob.chr25.ref.fa \
    --reads /data/toby/troutGenomics/graffite_parameters/reads.csv \
    --longreads /data/toby/troutGenomics/graffite_parameters/reads.csv \
    --cores 4 \
    --graph_method graph aligner \
    -profile cluster \
    -with-singularity /data/toby/troutGenomics/graffite_latest.sif \
    -resume
executor >  slurm (3)
[80/52166a] process > map_longreads (1)        [100%] 1 of 1, failed: 1
[-        ] process > sniffles_sample_call     -
[-        ] process > sniffles_population_call -
[e0/fc48d1] process > map_asm (3)              [100%] 3 of 3, cached: 3 ✔
[b7/11a084] process > svim_asm (1)             [100%] 3 of 3, cached: 3 ✔
[d9/93fa76] process > survivor_merge           [100%] 1 of 1, cached: 1 ✔
[-        ] process > merge_svim_sniffles2     -
[-        ] process > repeatmask_VCF           -
[-        ] process > tsd_prep                 -
[-        ] process > tsd_search               -
[-        ] process > tsd_report               -
[-        ] process > make_graph               -
[-        ] process > graph_align_reads        -
[-        ] process > vg_call                  -
[-        ] process > merge_VCFs               -
ERROR ~ Error executing process > 'map_longreads (3)'

Caused by:
  Process `map_longreads (3)` terminated with an error exit status (137)

Command executed:

  minimap2 -t 4 -ax map-ont bob.chr25.ref.fa WEA02.fq.gz | samtools sort -m4G -@4 -o WEA02.bam  -

Command exit status:
  137

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
  [M::mm_idx_gen::1.710*0.91] collected minimizers
  [M::mm_idx_gen::1.971*1.31] sorted minimizers
  [M::main::1.971*1.31] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::2.051*1.30] mid_occ = 270
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::2.101*1.29] distinct minimizers: 5134107 (85.10% are singletons); average occurrences: 1.758; average spacing: 5.427; total length: 48970219
  [M::worker_pipeline::700.938*3.95] mapped 84047 sequences
  [M::worker_pipeline::1367.052*3.97] mapped 73931 sequences
  [M::worker_pipeline::2030.702*3.98] mapped 73144 sequences
  [M::worker_pipeline::2716.180*3.97] mapped 72473 sequences
  [M::worker_pipeline::3406.924*3.96] mapped 71104 sequences
  [M::worker_pipeline::4099.453*3.96] mapped 77239 sequences
  [M::worker_pipeline::4782.806*3.96] mapped 72970 sequences
  [M::worker_pipeline::5476.823*3.96] mapped 68308 sequences
  [M::worker_pipeline::6150.550*3.96] mapped 68108 sequences
  [M::worker_pipeline::6834.394*3.96] mapped 67409 sequences
  [M::worker_pipeline::7519.436*3.95] mapped 66681 sequences
  [M::worker_pipeline::8213.914*3.95] mapped 74633 sequences
  [M::worker_pipeline::8905.634*3.95] mapped 68000 sequences
  [M::worker_pipeline::9610.847*3.95] mapped 67061 sequences
  [M::worker_pipeline::10265.279*3.95] mapped 67719 sequences
  [M::worker_pipeline::14974.925*3.08] mapped 65613 sequences
  [M::worker_pipeline::15112.645*3.07] mapped 73625 sequences
  [M::worker_pipeline::15797.447*3.10] mapped 66651 sequences
  [M::worker_pipeline::16463.673*3.14] mapped 65961 sequences
  [M::worker_pipeline::17172.803*3.17] mapped 66110 sequences
  [M::worker_pipeline::17861.257*3.20] mapped 71617 sequences
  [M::worker_pipeline::18552.139*3.22] mapped 65841 sequences
  .command.sh: line 2:    16 Broken pipe             minimap2 -t 4 -ax map-ont bob.chr25.ref.fa WEA02.fq.gz
          17 Killed                  | samtools sort -m4G -@4 -o WEA02.bam -

Work dir:
  /data/toby/troutGenomics/work/66/03a73032dcb499fb54ffa0fe738e3a

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
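As the tip suggests, the task's work directory holds the pieces needed to diagnose a failure. A minimal bash sketch, assuming the standard files Nextflow writes there (`.command.sh`, `.exitcode`, `.command.err`); `inspect_task_dir` is a hypothetical helper, not part of Nextflow or GraffiTE:

```shell
#!/usr/bin/env bash
# Summarise a failed Nextflow task from the files in its work directory.
inspect_task_dir() {
    local dir=$1 lines=${2:-5}
    echo "== script (.command.sh) =="
    cat "$dir/.command.sh"
    echo "== exit code (.exitcode) =="
    # .exitcode may be missing if the task was killed before Nextflow's exit trap ran
    cat "$dir/.exitcode" 2>/dev/null || echo "(missing)"
    echo
    echo "== last ${lines} lines of stderr (.command.err) =="
    tail -n "$lines" "$dir/.command.err"
}

# Usage with the work dir from the log above:
# inspect_task_dir /data/toby/troutGenomics/work/66/03a73032dcb499fb54ffa0fe738e3a
```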
cgroza commented 2 months ago

The log mentions "Killed". I think you are running out of memory and samtools is being killed. Try increasing the memory for that process.

(Sent by email in reply to Tobias Baril's message of 8/23/24, 2:58 AM.)
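The exit status points the same way. By shell convention, a status above 128 means the process was terminated by signal (status - 128): 137 - 128 = 9, which is SIGKILL, the signal the Linux OOM killer sends and that schedulers typically rely on when a job exceeds its memory limit. The "Broken pipe" on minimap2 is then a side effect: once samtools sort, the reader at the end of the pipe, was killed, minimap2 could no longer write to it. A small bash sketch for decoding such statuses; `decode_exit_status` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Translate a shell/Nextflow exit status into a readable reason.
decode_exit_status() {
    local status=$1
    case $status in
        0)   echo "success" ;;
        126) echo "command found but not executable" ;;
        127) echo "command not found" ;;
        *)
            if (( status > 128 )); then
                # Convention: status above 128 means "terminated by signal (status - 128)"
                local sig=$(( status - 128 ))
                # In bash, `kill -l N` prints the name of signal N without the SIG prefix
                echo "terminated by signal ${sig} (SIG$(kill -l "$sig"))"
            else
                echo "failed with status ${status}"
            fi
            ;;
    esac
}

decode_exit_status 137   # prints: terminated by signal 9 (SIGKILL)
decode_exit_status 127   # prints: command not found
```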

clemgoub commented 2 months ago

Yes, I would have said the same! Nextflow will not necessarily tell you that you ran out of memory. This has happened to me before on an interactive SLURM session: I got a similar error from minimap2, and only after exiting the session could I see SLURM's "out of memory" error.
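For a batch (non-interactive) job, the same information is usually available from SLURM accounting. A sketch assuming `sacct` is installed and job accounting is enabled on the cluster; `was_oom_killed` is a hypothetical helper, and the task's SLURM job ID can usually be found in `.nextflow.log`:

```shell
#!/usr/bin/env bash
# Scan sacct output for job steps that SLURM marked OUT_OF_MEMORY.
# Expects lines in the format produced by:
#   sacct -j <jobid> --parsable2 --noheader --format=JobID,State,MaxRSS
# Returns 0 if at least one step was OOM-killed, 1 otherwise.
was_oom_killed() {
    local found=1 jobid state maxrss
    while IFS='|' read -r jobid state maxrss; do
        if [[ $state == OUT_OF_MEMORY* ]]; then
            echo "${jobid}: ${state} (MaxRSS=${maxrss:-n/a})"
            found=0
        fi
    done
    return "$found"
}

# Usage (123456 is a placeholder job ID):
# sacct -j 123456 --parsable2 --noheader --format=JobID,State,MaxRSS | was_oom_killed
```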

TobyBaril commented 2 months ago

Great! The read mapping succeeded with 80G of memory! The rest of the pipeline then ran successfully up to make_graph:
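The thread does not say how the 80G was requested; GraffiTE may provide its own per-process parameters for this. One generic Nextflow approach is a custom config with a `withName` process selector, passed with `-c`. A minimal sketch, assuming the process is named `map_longreads` as in the logs above; `write_memory_override` is a hypothetical helper. For scale, `samtools sort -m4G -@4` alone can use roughly 4 GB per sorting thread (about 16 GB) before minimap2's own memory is counted.

```shell
#!/usr/bin/env bash
# Write a Nextflow config that raises the memory request of a single process.
# Assumption: the process name matches the one shown in the Nextflow log.
write_memory_override() {
    local process=$1 memory=$2 out=${3:-custom.config}
    cat > "$out" <<EOF
process {
    withName: '${process}' {
        memory = '${memory}'
    }
}
EOF
    echo "wrote ${out}"
}

write_memory_override map_longreads '80 GB'
# Then add "-c custom.config" to the original nextflow run command.
```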

[ec/dbb962] process > map_longreads (2)            [100%] 3 of 3, cached: 3 ✔
[ed/1009ab] process > sniffles_sample_call (2)     [100%] 3 of 3, cached: 3 ✔
[55/b91ffc] process > sniffles_population_call (1) [100%] 1 of 1, cached: 1 ✔
[c7/1fedc9] process > map_asm (2)                  [100%] 3 of 3, cached: 3 ✔
[96/b4038e] process > svim_asm (3)                 [100%] 3 of 3, cached: 3 ✔
[47/fa2fa1] process > survivor_merge               [100%] 1 of 1, cached: 1 ✔
[7e/c304b9] process > merge_svim_sniffles2 (1)     [100%] 1 of 1, cached: 1 ✔
[c5/d98685] process > repeatmask_VCF (1)           [100%] 1 of 1, cached: 1 ✔
[bf/eb1752] process > tsd_prep (1)                 [100%] 1 of 1, cached: 1 ✔
[a3/763369] process > tsd_search (85)              [100%] 91 of 91, cached: 91 ✔
[03/9afd7c] process > tsd_report (1)               [100%] 1 of 1, cached: 1 ✔
[8b/1f1e1c] process > make_graph (1)               [100%] 1 of 1, failed: 1 ✘
[-        ] process > graph_align_reads            -
[-        ] process > vg_call                      -
[-        ] process > merge_VCFs                   -
WARN: Access to undefined parameter `make_graph_time` -- Initialise it to a default value eg. `params.make_graph_time = some_value`
ERROR ~ Error executing process > 'make_graph (1)'

Caused by:
  Process `make_graph (1)` terminated with an error exit status (127)

Command executed:

  null

Command exit status:
  127

Command output:
  (empty)

Command error:
  INFO:    /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  .command.sh: line 2: null: command not found

Work dir:
  /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

The pangenome.vcf and reference FASTA files look okay. However, here is what the .command files in the work directory contain:

.command.sh:

#!/bin/bash -ue
null

.command.run:

#!/bin/bash
### ---
### name: 'make_graph (1)'
### container: '/data/toby/troutGenomics/work/singularity/cgroza-collection-graffite-latest.img'
### outputs:
### - 'index'
### ...
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
NXF_ENTRY=${1:-nxf_main}

nxf_container_env() {
cat << EOF
export PATH="\$PATH:/data/toby/troutGenomics/GraffiTE/bin"
EOF
}

nxf_sleep() {
  sleep $1 2>/dev/null || sleep 1;
}

nxf_date() {
    local ts=$(date +%s%3N);
    if [[ ${#ts} == 10 ]]; then echo ${ts}000
    elif [[ $ts == *%3N ]]; then echo ${ts/\%3N/000}
    elif [[ $ts == *3N ]]; then echo ${ts/3N/000}
    elif [[ ${#ts} == 13 ]]; then echo $ts
    else echo "Unexpected timestamp value: $ts"; exit 1
    fi
}

nxf_env() {
    echo '============= task environment ============='
    env | sort | sed "s/\(.*\)AWS\(.*\)=\(.\{6\}\).*/\1AWS\2=\3xxxxxxxxxxxxx/"
    echo '============= task output =================='
}

nxf_kill() {
    declare -a children
    while read P PP;do
        children[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)

    kill_all() {
        [[ $1 != $$ ]] && kill $1 2>/dev/null || true
        for i in ${children[$1]:=}; do kill_all $i; done
    }

    kill_all $1
}

nxf_mktemp() {
    local base=${1:-/tmp}
    mkdir -p "$base"
    if [[ $(uname) = Darwin ]]; then mktemp -d $base/nxf.XXXXXXXXXX
    else TMPDIR="$base" mktemp -d -t nxf.XXXXXXXXXX
    fi
}

nxf_fs_copy() {
  local source=$1
  local target=$2
  local basedir=$(dirname $1)
  mkdir -p $target/$basedir
  cp -fRL $source $target/$basedir
}

nxf_fs_move() {
  local source=$1
  local target=$2
  local basedir=$(dirname $1)
  mkdir -p $target/$basedir
  mv -f $source $target/$basedir
}

nxf_fs_rsync() {
  rsync -rRl $1 $2
}

nxf_fs_rclone() {
  rclone copyto $1 $2/$1
}

nxf_fs_fcp() {
  fcp $1 $2/$1
}

on_exit() {
    exit_status=${nxf_main_ret:=$?}
    printf -- $exit_status > /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e/.exitcode
    set +u
    exit $exit_status
}

on_term() {
    set +e
    [[ "$pid" ]] && nxf_kill $pid
}

nxf_launch() {
    set +u; env - PATH="$PATH" ${TMP:+SINGULARITYENV_TMP="$TMP"} ${TMPDIR:+SINGULARITYENV_TMPDIR="$TMPDIR"} ${NXF_TASK_WORKDIR:+SINGULARITYENV_NXF_TASK_WORKDIR="$NXF_TASK_WORKDIR"} singularity exec --no-home --pid -B /data/toby/troutGenomics --contain --bind /data/toby/troutGenomics/tmp:/tmp /data/toby/troutGenomics/work/singularity/cgroza-collection-graffite-latest.img /bin/bash -c "cd $NXF_TASK_WORKDIR; eval $(nxf_container_env); /bin/bash -ue /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e/.command.sh"
}

nxf_stage() {
    true
    # stage input files
    rm -f bob.chr25.ref.fa
    rm -f pangenome.vcf
    ln -s /data/toby/troutGenomics/graffite_parameters/bob.chr25.ref.fa bob.chr25.ref.fa
    ln -s /data/toby/troutGenomics/work/03/9afd7c2cce4a025c274f87bb169570/pangenome.vcf pangenome.vcf
}

nxf_unstage() {
    true
    [[ ${nxf_main_ret:=0} != 0 ]] && return
}

nxf_main() {
    trap on_exit EXIT
    trap on_term TERM INT USR2
    trap '' USR1

    [[ "${NXF_CHDIR:-}" ]] && cd "$NXF_CHDIR"
    export NXF_BOXID="nxf-$(dd bs=18 count=1 if=/dev/urandom 2>/dev/null | base64 | tr +/ 0A | tr -d '\r\n')"
    NXF_SCRATCH=''
    [[ $NXF_DEBUG > 0 ]] && nxf_env
    touch /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e/.command.begin
    set +u
    set -u
    [[ $NXF_SCRATCH ]] && cd $NXF_SCRATCH
    export NXF_TASK_WORKDIR="$PWD"
    nxf_stage

    set +e
    (set -o pipefail; (nxf_launch | tee .command.out) 3>&1 1>&2 2>&3 | tee .command.err) &
    pid=$!
    wait $pid || nxf_main_ret=$?
    nxf_unstage
}

$NXF_ENTRY

Thanks for your help so far!
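A `.command.sh` whose body is just `null` means the process's script block produced no command, so bash tried to run the word `null` and exited with 127 (command not found). A sketch for spotting such tasks across a Nextflow work directory; `find_null_scripts` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# List Nextflow task scripts whose body (ignoring the shebang) is just "null",
# i.e. tasks whose script block produced no command.
find_null_scripts() {
    local workdir=${1:-work} script body
    while IFS= read -r -d '' script; do
        # Drop the shebang line and all whitespace before comparing
        body=$(grep -v '^#!' "$script" | tr -d '[:space:]')
        if [[ $body == "null" ]]; then
            echo "$script"
        fi
    done < <(find "$workdir" -type f -name .command.sh -print0)
}

# Usage with the work directory from this thread:
# find_null_scripts /data/toby/troutGenomics/work
```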

cgroza commented 2 months ago

You have a typo in your command:

    --graph_method graph aligner \

Should be:

    --graph_method graphaligner \

--graph_method can be pangenie, graphaligner, or giraffe.

I have added some new parameter-validity checks, so an invalid value like this will throw an error in future versions.
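In the meantime, a launcher script can catch this kind of mistake before the pipeline starts. Note that the unquoted `graph aligner` reaches Nextflow as two separate shell words, so the value it saw was most likely just `graph`. A bash sketch that checks a value against the three methods listed above; `check_graph_method` is hypothetical and not part of GraffiTE:

```shell
#!/usr/bin/env bash
# Reject unknown --graph_method values before launching the pipeline.
# Valid values as listed in this thread: pangenie, graphaligner, giraffe.
check_graph_method() {
    case $1 in
        pangenie|graphaligner|giraffe)
            return 0 ;;
        *)
            echo "invalid --graph_method '$1' (expected pangenie, graphaligner or giraffe)" >&2
            return 1 ;;
    esac
}

check_graph_method graphaligner && echo "ok"   # prints: ok
check_graph_method graph || echo "rejected"    # error on stderr, then: rejected
```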

TobyBaril commented 2 months ago

Awesome! Everything ran smoothly for me in the end - thanks for your help and for making such a cool tool!