The log mentions "Killed": I think you are running out of memory and samtools is being killed. Try increasing the memory for that process.
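To confirm on your side: exit status 137 is 128 + 9, i.e. the process was terminated by SIGKILL, which is what the kernel OOM killer or SLURM sends when a job exceeds its memory allocation. A couple of quick checks (`<jobid>` is a placeholder for the SLURM job ID the task ran under, and the `sacct` fields available may vary with your site's accounting setup):

```bash
# 137 - 128 = 9: name the signal behind the exit status
kill -l $((137 - 128))    # prints: KILL

# ask SLURM accounting for the job's state and memory usage
sacct -j <jobid> --format=JobID,State,ExitCode,ReqMem,MaxRSS
```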
-------- Original Message -------- On 8/23/24 2:58 AM, Tobias Baril wrote:
Hi!
I'm making some progress with GraffiTE, but I'm running into this broken pipe during the long-read mapping. I'm wondering if this could be linked to my read inputs, or whether there is something I can change locally to overcome the error?
Thanks for your help so far!
executor > slurm (3)
[80/52166a] process > map_longreads (1)        [100%] 1 of 1, failed: 1
[-        ] process > sniffles_sample_call     -
[-        ] process > sniffles_population_call -
[e0/fc48d1] process > map_asm (3)              [100%] 3 of 3, cached: 3 ✔
[b7/11a084] process > svim_asm (1)             [100%] 3 of 3, cached: 3 ✔
[d9/93fa76] process > survivor_merge           [100%] 1 of 1, cached: 1 ✔
[-        ] process > merge_svim_sniffles2     -
[-        ] process > repeatmask_VCF           -
[-        ] process > tsd_prep                 -
[-        ] process > tsd_search               -
[-        ] process > tsd_report               -
[-        ] process > make_graph               -
[-        ] process > graph_align_reads        -
[-        ] process > vg_call                  -
[-        ] process > merge_VCFs               -
ERROR ~ Error executing process > 'map_longreads (3)'

Caused by:
  Process `map_longreads (3)` terminated with an error exit status (137)

Command executed:

  minimap2 -t 4 -ax map-ont bob.chr25.ref.fa WEA02.fq.gz | samtools sort -m4G @.*** -o WEA02.bam -

Command exit status:

  137

Command output:

  (empty)

Command error:

  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
  [M::mm_idx_gen::1.710*0.91] collected minimizers
  [M::mm_idx_gen::1.971*1.31] sorted minimizers
  [M::main::1.971*1.31] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::2.051*1.30] mid_occ = 270
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::2.101*1.29] distinct minimizers: 5134107 (85.10% are singletons); average occurrences: 1.758; average spacing: 5.427; total length: 48970219
  [M::worker_pipeline::700.938*3.95] mapped 84047 sequences
  [M::worker_pipeline::1367.052*3.97] mapped 73931 sequences
  [M::worker_pipeline::2030.702*3.98] mapped 73144 sequences
  [M::worker_pipeline::2716.180*3.97] mapped 72473 sequences
  [M::worker_pipeline::3406.924*3.96] mapped 71104 sequences
  [M::worker_pipeline::4099.453*3.96] mapped 77239 sequences
  [M::worker_pipeline::4782.806*3.96] mapped 72970 sequences
  [M::worker_pipeline::5476.823*3.96] mapped 68308 sequences
  [M::worker_pipeline::6150.550*3.96] mapped 68108 sequences
  [M::worker_pipeline::6834.394*3.96] mapped 67409 sequences
  [M::worker_pipeline::7519.436*3.95] mapped 66681 sequences
  [M::worker_pipeline::8213.914*3.95] mapped 74633 sequences
  [M::worker_pipeline::8905.634*3.95] mapped 68000 sequences
  [M::worker_pipeline::9610.847*3.95] mapped 67061 sequences
  [M::worker_pipeline::10265.279*3.95] mapped 67719 sequences
  [M::worker_pipeline::14974.925*3.08] mapped 65613 sequences
  [M::worker_pipeline::15112.645*3.07] mapped 73625 sequences
  [M::worker_pipeline::15797.447*3.10] mapped 66651 sequences
  [M::worker_pipeline::16463.673*3.14] mapped 65961 sequences
  [M::worker_pipeline::17172.803*3.17] mapped 66110 sequences
  [M::worker_pipeline::17861.257*3.20] mapped 71617 sequences
  [M::worker_pipeline::18552.139*3.22] mapped 65841 sequences
  .command.sh: line 2:    16 Broken pipe    minimap2 -t 4 -ax map-ont bob.chr25.ref.fa WEA02.fq.gz
                          17 Killed         | samtools sort -m4G @.*** -o WEA02.bam -
Work dir: /data/toby/troutGenomics/work/66/03a73032dcb499fb54ffa0fe738e3a
Tip: you can replicate the issue by changing to the process work dir and entering the command
bash .command.run
-- Check '.nextflow.log' file for details
Yes, I would have said the same! Nextflow will not necessarily tell you that you ran out of memory. However, it happened to me before when using an interactive SLURM queue: I got a similar error from minimap2, but when exiting the session, I could see the "out of memory" error from SLURM.
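If it helps, you can give that process more memory without editing the pipeline by passing a small custom config with `-c`. This is just a sketch using standard Nextflow process selectors; the process name `map_longreads` is taken from the log above and the memory value is only an example to adjust for your data:

```groovy
// custom.config -- hypothetical override, not part of GraffiTE itself
process {
    withName: map_longreads {
        memory = '80 GB'   // pick a value comfortably above the task's peak usage
    }
}
```

Then add `-c custom.config` to your usual `nextflow run` invocation.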
Great! The read mapping was successful with 80G memory! The rest of the pipeline was successful up to `make_graph`:
[ec/dbb962] process > map_longreads (2) [100%] 3 of 3, cached: 3 ✔
[ed/1009ab] process > sniffles_sample_call (2) [100%] 3 of 3, cached: 3 ✔
[55/b91ffc] process > sniffles_population_call (1) [100%] 1 of 1, cached: 1 ✔
[c7/1fedc9] process > map_asm (2) [100%] 3 of 3, cached: 3 ✔
[96/b4038e] process > svim_asm (3) [100%] 3 of 3, cached: 3 ✔
[47/fa2fa1] process > survivor_merge [100%] 1 of 1, cached: 1 ✔
[7e/c304b9] process > merge_svim_sniffles2 (1) [100%] 1 of 1, cached: 1 ✔
[c5/d98685] process > repeatmask_VCF (1) [100%] 1 of 1, cached: 1 ✔
[bf/eb1752] process > tsd_prep (1) [100%] 1 of 1, cached: 1 ✔
[a3/763369] process > tsd_search (85) [100%] 91 of 91, cached: 91 ✔
[03/9afd7c] process > tsd_report (1) [100%] 1 of 1, cached: 1 ✔
[8b/1f1e1c] process > make_graph (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > graph_align_reads -
[- ] process > vg_call -
[- ] process > merge_VCFs -
WARN: Access to undefined parameter `make_graph_time` -- Initialise it to a default value eg. `params.make_graph_time = some_value`
ERROR ~ Error executing process > 'make_graph (1)'
Caused by:
Process `make_graph (1)` terminated with an error exit status (127)
Command executed:
null
Command exit status:
127
Command output:
(empty)
Command error:
INFO: /etc/singularity/ exists; cleanup by system administrator is not complete (see https://apptainer.org/docs/admin/latest/singularity_migration.html)
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
.command.sh: line 2: null: command not found
Work dir:
/data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
The `pangenome.vcf` and reference fasta file look okay. However, when I look at the `.command` files in the work directory:

`.command.sh`:
#!/bin/bash -ue
null
`.command.run`:
#!/bin/bash
### ---
### name: 'make_graph (1)'
### container: '/data/toby/troutGenomics/work/singularity/cgroza-collection-graffite-latest.img'
### outputs:
### - 'index'
### ...
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
NXF_ENTRY=${1:-nxf_main}
nxf_container_env() {
cat << EOF
export PATH="\$PATH:/data/toby/troutGenomics/GraffiTE/bin"
EOF
}
nxf_sleep() {
sleep $1 2>/dev/null || sleep 1;
}
nxf_date() {
local ts=$(date +%s%3N);
if [[ ${#ts} == 10 ]]; then echo ${ts}000
elif [[ $ts == *%3N ]]; then echo ${ts/\%3N/000}
elif [[ $ts == *3N ]]; then echo ${ts/3N/000}
elif [[ ${#ts} == 13 ]]; then echo $ts
else echo "Unexpected timestamp value: $ts"; exit 1
fi
}
nxf_env() {
echo '============= task environment ============='
env | sort | sed "s/\(.*\)AWS\(.*\)=\(.\{6\}\).*/\1AWS\2=\3xxxxxxxxxxxxx/"
echo '============= task output =================='
}
nxf_kill() {
declare -a children
while read P PP;do
children[$PP]+=" $P"
done < <(ps -e -o pid= -o ppid=)
kill_all() {
[[ $1 != $$ ]] && kill $1 2>/dev/null || true
for i in ${children[$1]:=}; do kill_all $i; done
}
kill_all $1
}
nxf_mktemp() {
local base=${1:-/tmp}
mkdir -p "$base"
if [[ $(uname) = Darwin ]]; then mktemp -d $base/nxf.XXXXXXXXXX
else TMPDIR="$base" mktemp -d -t nxf.XXXXXXXXXX
fi
}
nxf_fs_copy() {
local source=$1
local target=$2
local basedir=$(dirname $1)
mkdir -p $target/$basedir
cp -fRL $source $target/$basedir
}
nxf_fs_move() {
local source=$1
local target=$2
local basedir=$(dirname $1)
mkdir -p $target/$basedir
mv -f $source $target/$basedir
}
nxf_fs_rsync() {
rsync -rRl $1 $2
}
nxf_fs_rclone() {
rclone copyto $1 $2/$1
}
nxf_fs_fcp() {
fcp $1 $2/$1
}
on_exit() {
exit_status=${nxf_main_ret:=$?}
printf -- $exit_status > /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e/.exitcode
set +u
exit $exit_status
}
on_term() {
set +e
[[ "$pid" ]] && nxf_kill $pid
}
nxf_launch() {
set +u; env - PATH="$PATH" ${TMP:+SINGULARITYENV_TMP="$TMP"} ${TMPDIR:+SINGULARITYENV_TMPDIR="$TMPDIR"} ${NXF_TASK_WORKDIR:+SINGULARITYENV_NXF_TASK_WORKDIR="$NXF_TASK_WORKDIR"} singularity exec --no-home --pid -B /data/toby/troutGenomics --contain --bind /data/toby/troutGenomics/tmp:/tmp /data/toby/troutGenomics/work/singularity/cgroza-collection-graffite-latest.img /bin/bash -c "cd $NXF_TASK_WORKDIR; eval $(nxf_container_env); /bin/bash -ue /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e/.command.sh"
}
nxf_stage() {
true
# stage input files
rm -f bob.chr25.ref.fa
rm -f pangenome.vcf
ln -s /data/toby/troutGenomics/graffite_parameters/bob.chr25.ref.fa bob.chr25.ref.fa
ln -s /data/toby/troutGenomics/work/03/9afd7c2cce4a025c274f87bb169570/pangenome.vcf pangenome.vcf
}
nxf_unstage() {
true
[[ ${nxf_main_ret:=0} != 0 ]] && return
}
nxf_main() {
trap on_exit EXIT
trap on_term TERM INT USR2
trap '' USR1
[[ "${NXF_CHDIR:-}" ]] && cd "$NXF_CHDIR"
export NXF_BOXID="nxf-$(dd bs=18 count=1 if=/dev/urandom 2>/dev/null | base64 | tr +/ 0A | tr -d '\r\n')"
NXF_SCRATCH=''
[[ $NXF_DEBUG > 0 ]] && nxf_env
touch /data/toby/troutGenomics/work/8b/1f1e1c5d35dfb4a8ce90ccd9d7e16e/.command.begin
set +u
set -u
[[ $NXF_SCRATCH ]] && cd $NXF_SCRATCH
export NXF_TASK_WORKDIR="$PWD"
nxf_stage
set +e
(set -o pipefail; (nxf_launch | tee .command.out) 3>&1 1>&2 2>&3 | tee .command.err) &
pid=$!
wait $pid || nxf_main_ret=$?
nxf_unstage
}
$NXF_ENTRY
Thanks for your help so far!
You have a typo in your command:
--graph_method graph aligner \
Should be:
--graph_method graphaligner \
`--graph_method` can be `pangenie`, `graphaligner`, or `giraffe`.
I added some new checks for parameter validity that will throw an error in the future.
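For reference, the stray `null` in `.command.sh` is also why you saw exit status 127: with an unrecognised `--graph_method`, no graph-building command was selected, the script body resolved to `null`, and bash reported "command not found" (127) when it tried to run it. The new validation is essentially of this shape (a sketch, not the pipeline's exact code):

```groovy
// sketch of a --graph_method validity check; names here are illustrative
def valid_methods = ['pangenie', 'graphaligner', 'giraffe']
if (!valid_methods.contains(params.graph_method)) {
    error "Unknown --graph_method '${params.graph_method}'. Valid options: ${valid_methods.join(', ')}"
}
```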
Awesome! Everything ran smoothly for me in the end - thanks for your help and for making such a cool tool!