MHH-RCUG / nf_wochenende

A Nextflow version of the Wochenende reference metagenome binning and visualization pipeline
MIT License

growth rate error #66

Closed LisaHollstein closed 2 years ago

LisaHollstein commented 2 years ago

I have some data for which no growth rates are created by nf_wochenende. However, the original Wochenende calculates the growth rates.

I get no useful error message. Nextflow just outputs:

[e8/f85cf6] NOTE: Process `raspir (JUFO_103_S293_R1.trm.ns.fix.s.dup.mm.mq30.calmd.raspir.csv)` terminated with an error exit status (1) -- Error is ignored

And the logs are also almost empty.

.command.log and .command.out only contain:

INFO: Started bacterial growth rate analysis

So the analysis was not completed.

.command.sh contains:

#!/bin/bash -ue
cp -R /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/growth_rate/ .
cp -R /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/scripts/ .
cp scripts/*.sh .

echo "INFO: Started bacterial growth rate analysis"
cp growth_rate/* .

bash runbatch_bed_to_csv.sh  >/dev/null 2>&1

bash run_reproduction_determiner.sh  >/dev/null 2>&1

echo "INFO: Completed bacterial growth rate analysis, see growth_rate/fit_results/output for results"
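Both helper scripts are launched with >/dev/null 2>&1, which discards stdout and stderr entirely; that is presumably why .command.log stays almost empty. A minimal sketch of the effect, using a hypothetical failing_step.sh as a stand-in for runbatch_bed_to_csv.sh:

```shell
# Hypothetical stand-in for runbatch_bed_to_csv.sh: fails with a message on stderr.
printf '#!/bin/bash\necho "fatal: reference not found" >&2\nexit 1\n' > failing_step.sh

# As generated by the pipeline, all output is thrown away, so the step dies silently:
bash failing_step.sh >/dev/null 2>&1 || echo "exit=$?"

# Dropping the redirection (or sending stderr to a file) surfaces the real error:
bash failing_step.sh 2>error.log || true
cat error.log
```

Temporarily removing the >/dev/null 2>&1 redirections from .command.sh in the failing work directory and rerunning it by hand should reveal the actual error message.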

.command.run is quite long and therefore a bit confusing:

#!/bin/bash
#SBATCH -D /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15
#SBATCH -J nf-growth_rate_(JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam)
#SBATCH -o /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15/.command.log
#SBATCH --no-requeue
#SBATCH --signal B:USR2@30
#SBATCH --mem 8192M
# NEXTFLOW TASK: growth_rate (JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam)
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x
NXF_ENTRY=${1:-nxf_main}

nxf_tree() {
    local pid=$1

    declare -a ALL_CHILDREN
    while read P PP;do
        ALL_CHILDREN[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)

    pstat() {
        local x_pid=$1
        local STATUS=$(2> /dev/null < /proc/$1/status egrep 'Vm|ctxt')

        if [ $? = 0 ]; then
        local  x_vsz=$(echo "$STATUS" | grep VmSize | awk '{print $2}' || echo -n '0')
        local  x_rss=$(echo "$STATUS" | grep VmRSS | awk '{print $2}' || echo -n '0')
        local x_peak=$(echo "$STATUS" | egrep 'VmPeak|VmHWM' | sed 's/^.*:\s*//' | sed 's/[\sa-zA-Z]*$//' | tr '\n' ' ' || echo -n '0 0')
        local x_pmem=$(awk -v rss=$x_rss -v mem_tot=$mem_tot 'BEGIN {printf "%.0f", rss/mem_tot*100*10}' || echo -n '0')
        local vol_ctxt=$(echo "$STATUS" | grep '\bvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
        local inv_ctxt=$(echo "$STATUS" | grep '\bnonvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
        cpu_stat[x_pid]="$x_pid $x_pmem $x_vsz $x_rss $x_peak $vol_ctxt $inv_ctxt"
        fi
    }

    pwalk() {
        pstat $1
        for i in ${ALL_CHILDREN[$1]:=}; do pwalk $i; done
    }

    pwalk $1
}

nxf_stat() {
    cpu_stat=()
    nxf_tree $1

    declare -a sum=(0 0 0 0 0 0 0 0)
    local pid
    local i
    for pid in "${!cpu_stat[@]}"; do
        local row=(${cpu_stat[pid]})
        [ $NXF_DEBUG = 1 ] && echo "++ stat mem=${row[*]}"
        for i in "${!row[@]}"; do
        if [ $i != 0 ]; then
            sum[i]=$((sum[i]+row[i]))
        fi
        done
    done

    [ $NXF_DEBUG = 1 ] && echo -e "++ stat SUM=${sum[*]}"

    for i in {1..7}; do
        if [ ${sum[i]} -lt ${cpu_peak[i]} ]; then
            sum[i]=${cpu_peak[i]}
        else
            cpu_peak[i]=${sum[i]}
        fi
    done

    [ $NXF_DEBUG = 1 ] && echo -e "++ stat PEAK=${sum[*]}\n"
    nxf_stat_ret=(${sum[*]})
}

nxf_mem_watch() {
    set -o pipefail
    local pid=$1
    local trace_file=.command.trace
    local count=0;
    declare -a cpu_stat=(0 0 0 0 0 0 0 0)
    declare -a cpu_peak=(0 0 0 0 0 0 0 0)
    local mem_tot=$(< /proc/meminfo grep MemTotal | awk '{print $2}')
    local timeout
    local DONE
    local STOP=''

    [ $NXF_DEBUG = 1 ] && nxf_sleep 0.2 && ps fx

    while true; do
        nxf_stat $pid
        if [ $count -lt 10 ]; then timeout=1;
        elif [ $count -lt 120 ]; then timeout=5;
        else timeout=30;
        fi
        read -t $timeout -r DONE || true
        [[ $DONE ]] && break
        if [ ! -e /proc/$pid ]; then
            [ ! $STOP ] && STOP=$(nxf_date)
            [ $(($(nxf_date)-STOP)) -gt 10000 ] && break
        fi
        count=$((count+1))
    done

    echo "%mem=${nxf_stat_ret[1]}"      >> $trace_file
    echo "vmem=${nxf_stat_ret[2]}"      >> $trace_file
    echo "rss=${nxf_stat_ret[3]}"       >> $trace_file
    echo "peak_vmem=${nxf_stat_ret[4]}" >> $trace_file
    echo "peak_rss=${nxf_stat_ret[5]}"  >> $trace_file
    echo "vol_ctxt=${nxf_stat_ret[6]}"  >> $trace_file
    echo "inv_ctxt=${nxf_stat_ret[7]}"  >> $trace_file
}

nxf_write_trace() {
    echo "nextflow.trace/v2"           > $trace_file
    echo "realtime=$wall_time"         >> $trace_file
    echo "%cpu=$ucpu"                  >> $trace_file
    echo "rchar=${io_stat1[0]}"        >> $trace_file
    echo "wchar=${io_stat1[1]}"        >> $trace_file
    echo "syscr=${io_stat1[2]}"        >> $trace_file
    echo "syscw=${io_stat1[3]}"        >> $trace_file
    echo "read_bytes=${io_stat1[4]}"   >> $trace_file
    echo "write_bytes=${io_stat1[5]}"  >> $trace_file
}

nxf_trace_mac() {
    local start_millis=$(nxf_date)

    /bin/bash -ue /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15/.command.sh

    local end_millis=$(nxf_date)
    local wall_time=$((end_millis-start_millis))
    local ucpu=''
    local io_stat1=('' '' '' '' '' '')
    nxf_write_trace
}

nxf_fd() {
    local FD=11
    while [ -e /proc/$$/fd/$FD ]; do FD=$((FD+1)); done
    echo $FD
}

nxf_trace_linux() {
    local pid=$$
    command -v ps &>/dev/null || { >&2 echo "Command 'ps' required by nextflow to collect task metrics cannot be found"; exit 1; }
    local num_cpus=$(< /proc/cpuinfo grep '^processor' -c)
    local tot_time0=$(grep '^cpu ' /proc/stat | awk '{sum=$2+$3+$4+$5+$6+$7+$8+$9; printf "%.0f",sum}')
    local cpu_time0=$(2> /dev/null < /proc/$pid/stat awk '{printf "%.0f", ($16+$17)*10 }' || echo -n 'X')
    local io_stat0=($(2> /dev/null < /proc/$pid/io sed 's/^.*:\s*//' | head -n 6 | tr '\n' ' ' || echo -n '0 0 0 0 0 0'))
    local start_millis=$(nxf_date)
    trap 'kill $mem_proc' ERR

    /bin/bash -ue /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15/.command.sh &
    local task=$!

    mem_fd=$(nxf_fd)
    eval "exec $mem_fd> >(nxf_mem_watch $task)"
    local mem_proc=$!

    wait $task

    local end_millis=$(nxf_date)
    local tot_time1=$(grep '^cpu ' /proc/stat | awk '{sum=$2+$3+$4+$5+$6+$7+$8+$9; printf "%.0f",sum}')
    local cpu_time1=$(2> /dev/null < /proc/$pid/stat awk '{printf "%.0f", ($16+$17)*10 }' || echo -n 'X')
    local ucpu=$(awk -v p1=$cpu_time1 -v p0=$cpu_time0 -v t1=$tot_time1 -v t0=$tot_time0 -v n=$num_cpus 'BEGIN { pct=(p1-p0)/(t1-t0)*100*n; printf("%.0f", pct>0 ? pct : 0) }' )

    local io_stat1=($(2> /dev/null < /proc/$pid/io sed 's/^.*:\s*//' | head -n 6 | tr '\n' ' ' || echo -n '0 0 0 0 0 0'))
    local i
    for i in {0..5}; do
        io_stat1[i]=$((io_stat1[i]-io_stat0[i]))
    done

    local wall_time=$((end_millis-start_millis))
    [ $NXF_DEBUG = 1 ] && echo "+++ STATS %CPU=$ucpu TIME=$wall_time I/O=${io_stat1[*]}"

    echo "nextflow.trace/v2"           > $trace_file
    echo "realtime=$wall_time"         >> $trace_file
    echo "%cpu=$ucpu"                  >> $trace_file
    echo "rchar=${io_stat1[0]}"        >> $trace_file
    echo "wchar=${io_stat1[1]}"        >> $trace_file
    echo "syscr=${io_stat1[2]}"        >> $trace_file
    echo "syscw=${io_stat1[3]}"        >> $trace_file
    echo "read_bytes=${io_stat1[4]}"   >> $trace_file
    echo "write_bytes=${io_stat1[5]}"  >> $trace_file

    [ -e /proc/$mem_proc ] && eval "echo 'DONE' >&$mem_fd" || true
    wait $mem_proc 2>/dev/null || true
    while [ -e /proc/$mem_proc ]; do nxf_sleep 0.1; done
}

nxf_trace() {
    local trace_file=.command.trace
    touch $trace_file
    if [[ $(uname) = Darwin ]]; then
        nxf_trace_mac
    else
        nxf_trace_linux
    fi
}
# aws cli retry config
export AWS_RETRY_MODE=standard
export AWS_MAX_ATTEMPTS=5
# aws helper
nxf_s3_upload() {
    local name=$1
    local s3path=$2
    if [[ "$name" == - ]]; then
      aws s3 cp --only-show-errors --storage-class STANDARD - "$s3path"
    elif [[ -d "$name" ]]; then
      aws s3 cp --only-show-errors --recursive --storage-class STANDARD "$name" "$s3path/$name"
    else
      aws s3 cp --only-show-errors --storage-class STANDARD "$name" "$s3path/$name"
    fi
}

nxf_s3_download() {
    local source=$1
    local target=$2
    local file_name=$(basename $1)
    local is_dir=$(aws s3 ls $source | grep -F "PRE ${file_name}/" -c)
    if [[ $is_dir == 1 ]]; then
        aws s3 cp --only-show-errors --recursive "$source" "$target"
    else
        aws s3 cp --only-show-errors "$source" "$target"
    fi
}

nxf_sleep() {
  sleep $1 2>/dev/null || sleep 1;
}

nxf_date() {
    local ts=$(date +%s%3N);
    if [[ ${#ts} == 10 ]]; then echo ${ts}000
    elif [[ $ts == *%3N ]]; then echo ${ts/\%3N/000}
    elif [[ $ts == *3N ]]; then echo ${ts/3N/000}
    elif [[ ${#ts} == 13 ]]; then echo $ts
    else echo "Unexpected timestamp value: $ts"; exit 1
    fi
}

nxf_env() {
    echo '============= task environment ============='
    env | sort | sed "s/\(.*\)AWS\(.*\)=\(.\{6\}\).*/\1AWS\2=\3xxxxxxxxxxxxx/"
    echo '============= task output =================='
}

nxf_kill() {
    declare -a children
    while read P PP;do
        children[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)

    kill_all() {
        [[ $1 != $$ ]] && kill $1 2>/dev/null || true
        for i in ${children[$1]:=}; do kill_all $i; done
    }

    kill_all $1
}

nxf_mktemp() {
    local base=${1:-/tmp}
    if [[ $(uname) = Darwin ]]; then mktemp -d $base/nxf.XXXXXXXXXX
    else TMPDIR="$base" mktemp -d -t nxf.XXXXXXXXXX
    fi
}

nxf_fs_copy() {
  local source=$1
  local target=$2
  local basedir=$(dirname $1)
  mkdir -p $target/$basedir
  cp -fRL $source $target/$basedir
}

nxf_fs_move() {
  local source=$1
  local target=$2
  local basedir=$(dirname $1)
  mkdir -p $target/$basedir
  mv -f $source $target/$basedir
}

nxf_fs_rsync() {
  rsync -rRl $1 $2
}

on_exit() {
    exit_status=${nxf_main_ret:=$?}
    printf $exit_status > /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15/.exitcode
    set +u
    [[ "$tee1" ]] && kill $tee1 2>/dev/null
    [[ "$tee2" ]] && kill $tee2 2>/dev/null
    [[ "$ctmp" ]] && rm -rf $ctmp || true
    exit $exit_status
}

on_term() {
    set +e
    [[ "$pid" ]] && nxf_kill $pid
}

nxf_launch() {
    /bin/bash /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15/.command.run nxf_trace
}

nxf_stage() {
    true
    # stage input files
    rm -f JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam
    rm -f JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam.bai
    rm -f JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.bam.txt
    rm -f JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.bam.txt
    rm -f JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam.txt
    ln -s /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/e0/f9346c217638ff2888e1bf38577c4c/JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam
    ln -s /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/e0/f9346c217638ff2888e1bf38577c4c/JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam.bai JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam.bai
    ln -s /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/e0/f9346c217638ff2888e1bf38577c4c/JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.bam.txt JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.bam.txt
    ln -s /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/e0/f9346c217638ff2888e1bf38577c4c/JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.bam.txt JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.bam.txt
    ln -s /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/e0/f9346c217638ff2888e1bf38577c4c/JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam.txt JUFO_0_S351_R1.trm.ns.fix.s.dup.mm.mq30.calmd.bam.txt
}

nxf_unstage() {
    true
    [[ ${nxf_main_ret:=0} != 0 ]] && return
}

nxf_main() {
    trap on_exit EXIT
    trap on_term TERM INT USR2
    trap '' USR1

    [[ "${NXF_CHDIR:-}" ]] && cd "$NXF_CHDIR"
    NXF_SCRATCH=''
    [[ $NXF_DEBUG > 0 ]] && nxf_env
    touch /mnt/ngsnfs/gen/rcug_lw/Lisa/github/rcug_repositories/nf_wochenende/work/d8/b4e5a697c8cbaeb3d1dfbdfa64ac15/.command.begin
    set +u
    # conda environment
    source $(conda info --json | awk '/conda_prefix/ { gsub(/"|,/, "", $2); print $2 }')/bin/activate /mnt/ngsnfs/gen/rcug_lw/miniconda3/envs/nf_wochenende
    set -u
    [[ $NXF_SCRATCH ]] && echo "nxf-scratch-dir $HOSTNAME:$NXF_SCRATCH" && cd $NXF_SCRATCH
    nxf_stage

    set +e
    ctmp=$(set +u; nxf_mktemp /dev/shm 2>/dev/null || nxf_mktemp $TMPDIR)
    local cout=$ctmp/.command.out; mkfifo $cout
    local cerr=$ctmp/.command.err; mkfifo $cerr
    tee .command.out < $cout &
    tee1=$!
    tee .command.err < $cerr >&2 &
    tee2=$!
    ( nxf_launch ) >$cout 2>$cerr &
    pid=$!
    wait $pid || nxf_main_ret=$?
    wait $tee1 $tee2
    nxf_unstage
}

$NXF_ENTRY

.command.begin, .command.err and .command.trace are completely empty.

.exitcode is 1

colindaven commented 2 years ago

Thanks Lisa, it's probably not memory, but can you try again after editing the following in the growth_rate process of the .nf file ?

    memory { 32.GB * task.attempt }
or
    memory { 64.GB * task.attempt }

I'll try to find tests which work and fail in the next few days

colindaven commented 2 years ago

Also please retry with these data


curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -o SRR11207337_metagenome_mock_dna_1.fastq.gz

or

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -O SRR11207337_metagenome_mock_dna_1.fastq.gz

then gunzip that file

Then 
head - n 250000 x.fastq > mock_R1.fastq
and 
head - n 200000 x.fastq > mock_200k_R1.fastq

Now you have two files to test Wochenende, haybaler and growth rates
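Since a FASTQ record spans four lines, head -n counts lines rather than reads; 250000 lines correspond to 62500 reads. A quick sanity check that a subsample is record-aligned (tiny generated stand-in data, not the SRA file):

```shell
# Build a stand-in FASTQ with 8 records (4 lines each, 32 lines total):
for i in $(seq 1 8); do printf '@r%d\nACGT\n+\nFFFF\n' "$i"; done > x.fastq

# Take the first 16 lines, i.e. exactly 4 complete records:
head -n 16 x.fastq > mock_R1.fastq

# A valid subsample has a line count divisible by 4:
lines=$(wc -l < mock_R1.fastq)
echo "reads=$(( lines / 4 ))"
[ $(( lines % 4 )) -eq 0 ] && echo "record-aligned"
```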
LisaHollstein commented 2 years ago

Thanks Lisa, it's probably not memory, but can you try again after editing the following in the growth_rate process of the .nf file ?

  memory { 32.GB * task.attempt }
or
  memory { 64.GB * task.attempt }

I'll try to find tests which work and fail in the next few days

As expected, this did not solve the issue

LisaHollstein commented 2 years ago

Also please retry with these data


curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -o SRR11207337_metagenome_mock_dna_1.fastq.gz

or

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -O SRR11207337_metagenome_mock_dna_1.fastq.gz

then 
gunzip x.gz
Then 
head -n 250000 x.fastq > mock_R1.fastq
and 
head -n 200000 x.fastq > mock_200k_R1.fastq

Now you have two files to test Wochenende, haybaler and growth rates

What parameters and which reference do you use for this?

colindaven commented 2 years ago

All params taken directly from https://github.com/MHH-RCUG/nf_wochenende/blob/colin_dev/nextflow.config

There are problems with the growth_rate output though. It outputs a folder, which may overwrite other folders if they exist, so I'm trying to output just the .csv files, which should all have unique names.

colindaven commented 2 years ago

Using the csv-as-output approach, it looks a lot better.

I get, for example, the following in the output folder. Please update your branch from colin_dev and check? Thanks


nf_wochenende/output/growth_rate/fit_results/output$ ls -1
DRR_sm_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
ERR9809359_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv

edit: some files are empty, eg only headers, but all were run successfully, no errors in nextflow here

cat *
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE015929_1_Staphylococcus_epidermidis_ATCC_12228__complete_genome_BAC_pos,failed,2.06,1892,50,29,6.55,[-2]
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AP006716_1_Staphylococcus_haemolyticus_JCSC1435_DNA__complete_genome_BAC_pos,moderate,1.43,187512,50,47,1.99,[]
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AP010969_1_Streptococcus_intermedius_JTH08_DNA__complete_genome_BAC_pos,failed,1.72,1225,50,16,4.28,"[-2, -3]"
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP003800_1_Streptococcus_constellatus_subsp__pharyngis_C232__complete_genome_BAC_pos,failed,1.93,1994,50,21,10.81,"[-3, -5]"
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP003860_1_Streptococcus_anginosus_C1051__complete_genome_BAC_pos,moderate,1.50,18931,50,35,2.92,[]
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP020618_1_Staphylococcus_hominis_subsp__hominis_strain_K1_chromosome__complete_genome_BAC_pos,slow,1.19,297998,50,44,1.77,[]
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE015929_1_Staphylococcus_epidermidis_ATCC_12228__complete_genome_BAC_pos,moderate,1.79,12762,50,48,1.40,[]
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP002925_1_Streptococcus_pseudopneumoniae_IS7493__complete_genome_BAC_pos,failed,1.09,1606,50,29,11.11,[-5]
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP020618_1_Staphylococcus_hominis_subsp__hominis_strain_K1_chromosome__complete_genome_BAC_pos,slow,1.22,59204,50,44,2.18,[]
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_FN568063_1_Streptococcus_mitis_B6_complete_genome__strain_B6_BAC_pos,fast,2.79,3388,50,32,3.59,[]
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE004091_2_Pseudomonas_aeruginosa_PAO1__complete_genome_BAC_pos,no growth,1.10,8782,50,50,1.00,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE006468_2_Salmonella_enterica_subsp__enterica_serovar_Typhimurium_str__LT2__complete_genome_BAC_pos,no growth,1.00,5061,50,48,1.82,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE016830_1_Enterococcus_faecalis_V583_chromosome__complete_genome_BAC_pos,moderate,1.64,3239,50,36,2.76,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE017262_2_Listeria_monocytogenes_str__4b_F2365__complete_genome_BAC_pos,no growth,1.06,3727,50,50,2.32,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AJ938182_1_Staphylococcus_aureus_RF122_complete_genome_BAC_pos,no growth,1.04,3438,50,44,2.77,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP011051_1_Bacillus_intestinalis_strain_T30__complete_genome_BAC_pos,moderate,1.31,4804,50,50,1.31,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_U00096_3_Escherichia_coli_str__K_12_substr__MG1655__complete_genome_BAC_pos,slow,1.13,4703,50,49,2.10,[]
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE004091_2_Pseudomonas_aeruginosa_PAO1__complete_genome_BAC_pos,slow,1.10,11022,50,50,0.86,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE006468_2_Salmonella_enterica_subsp__enterica_serovar_Typhimurium_str__LT2__complete_genome_BAC_pos,no growth,1.00,6297,50,48,1.98,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE016830_1_Enterococcus_faecalis_V583_chromosome__complete_genome_BAC_pos,moderate,1.48,4026,50,37,3.10,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE017262_2_Listeria_monocytogenes_str__4b_F2365__complete_genome_BAC_pos,slow,1.18,4651,50,50,1.82,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AJ938182_1_Staphylococcus_aureus_RF122_complete_genome_BAC_pos,no growth,1.05,4264,50,45,2.81,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AP008937_1_Lactobacillus_fermentum_IFO_3956_DNA__complete_genome_BAC_pos,failed,1.08,1148,50,20,19.04,"[-2, -3, -5]"
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP011051_1_Bacillus_intestinalis_strain_T30__complete_genome_BAC_pos,moderate,1.36,5981,50,50,1.17,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_U00096_3_Escherichia_coli_str__K_12_substr__MG1655__complete_genome_BAC_pos,slow,1.14,5920,50,49,2.00,[]
LisaHollstein commented 2 years ago

All params taken directly from https://github.com/MHH-RCUG/nf_wochenende/blob/colin_dev/nextflow.config

There are problems with the growth_rate output though. It outputs a folder, which may overwrite other folders if they exist, so I'm trying to output just the .csv files, which should all have unique names.

The wochenende process doesn't work with this data... Maybe the file is corrupt? Or there are still problems with having both Wochenende and nf_wochenende installed

colindaven commented 2 years ago

Did you change all the settings on lines 20 and 38-43?

The file is not corrupt, if you mean the nextflow.config? Or which _R1.fastq input files are you using?

If the fastq file is corrupt, you should be able to see it using


head x_R1.fastq

tail x_R1.fastq

edit - maybe you forgot to gunzip the file before using head to take a small portion of it?
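If the suspicion is a truncated or corrupt download, gzip -t can verify the archive before any gunzip/head steps. A self-contained sketch (the tiny generated file stands in for the real SRA download):

```shell
# Stand-in for the downloaded archive:
printf '@r1\nACGT\n+\nFFFF\n' > x_R1.fastq
gzip -c x_R1.fastq > x_R1.fastq.gz

# gzip -t tests integrity without writing decompressed output;
# it exits non-zero on a damaged archive:
gzip -t x_R1.fastq.gz && echo "archive OK"

# Simulate a truncated download and re-test:
head -c 10 x_R1.fastq.gz > truncated.gz
gzip -t truncated.gz 2>/dev/null || echo "archive corrupt"
```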

LisaHollstein commented 2 years ago

head looks like this:

@SRR11207337.1 1/1
CTAATAGTTGATAACTAAATAGAAAATATTTACTCATGTTTCACCTCCTTTCAATTTGACAATTAGATCACCAAACAATTTCCATTCATTTGGCCCAGGTGGATTTTTCCAAATTACTTGCCGACATCTTATAC
+
AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJFJ<FFAJJFFAAJF<JFAFAJJAFJFFJ-FFFJFFAFFJJJAFFFF<F7A77FA
@SRR11207337.2 2/1
TAGACTGTTCTTATTGTTAACACAAGGGAGAAGAGATGATGCGCGTACTGGTTGTAGAGGATAATGCATTATTACGCCACCACCTGAAGGTTCAGCTCCAAGATTCAGGTCACCAGGTCGATGCCAC
+
AAFFJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJAJJFJJJFJJJJJJJJJJJFJJJFJJJJJFFFJJJJFJJJJAFJF7JJFJJ-FFFJJ-FJJFJJFFJJJJAA7-<AFAFAF<
@SRR11207337.3 3/1
GGTGAGGCGTCCTCTTTGGTTGACGAAAGGGCGCTGATCGCCCGGTTGAGCTGGTTTTGCCGGGAGTAGTAGCTACTCCCGACGGCGTAACCCCCGATCAAGACGACCGCCGCC

I am now trying it with the mock fastq

colindaven commented 2 years ago

Looks ok. How about the tail x.fastq? Did the alignment work now? I don't think there should be problems having both versions installed, since we overwrite the env variables (only for the bash shell which is created by nextflow and destroyed at the end of the nextflow wochenende process) at the start of each process, eg here

nf_wochenende.nf line 329

    export WOCHENENDE_DIR=${params.WOCHENENDE_DIR}
    export HAYBALER_DIR=${params.HAYBALER_DIR}
colindaven commented 2 years ago

I saw an error in the head commands above; it should be head -n 200000, not head - n 200000.

You probably corrected that already though.


head -n 250000 x.fastq > mock_R1.fastq
and 
head -n 200000 x.fastq > mock_200k_R1.fastq
LisaHollstein commented 2 years ago

Looks ok. How about the tail x.fastq ? Did the alignment work now ? I don't think there should be problems having both versions installed, since we overwrite the env variables (only for the bash shell which is created out of nextflow, and destroyed at the end of the nextflow wochenende process) at the start of each process, eg here

nf_wochenende.nf line 329

    export WOCHENENDE_DIR=${params.WOCHENENDE_DIR}
    export HAYBALER_DIR=${params.HAYBALER_DIR}

True, I am just a bit clueless as to why it won't work...

LisaHollstein commented 2 years ago
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAFFFF
@SRR11207337.49999 49999/1
ACATTATAGCACAGCTGATTTTAGATTGTAATACTAATTTGTATTATTTTAGCTGACTAATTATCTTTCAAGTGAATAATTGTTCATAATGCTTGTTTTTACGTCTTTAAAAAGTAGAAATTTATTTCACACGCCTTTCAATATACATACC
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@SRR11207337.50000 50000/1
AAACTGGCGGGCATTGACGATAAGATCGCGCGCCATGGTCAATGCACGTCGTTCACATCCTGCGCGAAGCTCTGTATTTTCAATCTGTTTCAGCTCGGCAACGCCAGCAAAACCAATGATCCGGCGATTCATTTCCATGCCGGCATTCACC
+
AAAFFJFAJJJJAJFFJJFJJJJJJJJJJFFFFJJJJJFJJJJFJFJJJJJJFJJFJJJJJJJJJJJJJF<FAFJJJJJJJJJJJJFJJJJJJJJJFJJFAJJJJJJJJJJFJJJJJJJJJJJJJJFJJFAJJFJAAJJJFFFJJJ777FF

Tail also looks okay

colindaven commented 2 years ago

Up until now, the problems with the Wochenende process were:

I think that was all of them, and most have since been improved.

The wochenende stage is working with other data, eg JuFo etc., right? Bit strange.

Maybe try redownloading the fastq, though it does look ok. Maybe there's file corruption on one line in the middle, but that's doubtful.

LisaHollstein commented 2 years ago

The wochenende stage is working with other data, but I also already redownloaded the fastq, so none of the listed problems seem plausible

LisaHollstein commented 2 years ago

The raspir and growth rate stages fail with data from Ilona as well...

colindaven commented 2 years ago

This is weird.

Perhaps your conda/mamba environments are now too old, if they were installed with the classic Wochenende (that would be strange though). The modern nf_wochenende uses a slimmed-down conda env, so you could just change the reference to it in the code.

However - did you get and test the fastq data from this repo? These are two small test fastqs from a mock community with even bacterial coverage, which has been working well for me for all stages.

https://github.com/colindaven/ref_testing

edit - are you using the main branch, or dev, or lisa_config? You can try colin_dev and just change the paths and cluster in the nextflow.config and config.yaml; maybe some errors crept in during the git process?

LisaHollstein commented 2 years ago

I have a separate conda env for nf_wochenende, so I don't use the old wochenende conda env.

(I still need to test with the mock community and I will also try with colin_dev)

LisaHollstein commented 2 years ago

Ilona tested nf_wochenende and all stages except raspir worked.

irosenboom commented 2 years ago

Growth rate worked for me, using the main branch and preterm sequencing data.

LisaHollstein commented 2 years ago

Running it on colin_dev doesn't change anything for me

LisaHollstein commented 2 years ago

I think the problem is that pandas is missing from the nf_wochenende conda env

colindaven commented 2 years ago

Hi @LisaHollstein, perhaps you need to update; pandas is listed here, and in the main branch too (just checked).

https://github.dev/MHH-RCUG/nf_wochenende/blob/colin_dev/env.wochenende.minimal.yml
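One quick way to confirm the diagnosis before rebuilding anything: check whether the active environment can actually see pandas. This assumes the nf_wochenende conda env is already activated:

```shell
# Exit 0 if pandas is importable in the current interpreter, 1 otherwise:
if python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec("pandas") else 1)'; then
    echo "pandas present"
else
    echo "pandas MISSING: update the env from env.wochenende.minimal.yml"
fi
```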

LisaHollstein commented 2 years ago

Yeah, what an easy solution... I wasted way too much time on this...