GooglingTheCancerGenome / sv-callers

Snakemake-based workflow for detecting structural variants in genomic data
https://research-software.nl/software/sv-callers
Apache License 2.0

empty output #62

Closed: xiaojingshi closed this issue 3 years ago

xiaojingshi commented 3 years ago

Hi,

I ran sv-callers and the output files contain only file paths, like this:

    $ cat all.txt
    data/bam/3/T3--N3/manta_out/survivor/manta.vcf
    data/bam/3/T3--N3/delly_out/survivor/delly.vcf
    data/bam/3/T3--N3/lumpy_out/survivor/lumpy.vcf

    $ cat data/bam/3/T3--N3/manta_out/survivor/manta.vcf
    data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai

Do you have any idea what went wrong? Thank you.

Best, Jing.

arnikz commented 3 years ago

This is expected given that you executed the workflow in the so-called vanilla mode (i.e., echo_run set to 1 [default]), which is used for testing. To perform the actual SV calling, make sure to set this parameter to 0 (see README).
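
For example, set it in analysis.yaml:

    echo_run: 0

or override it on the command line:

    snakemake -C echo_run=0 --use-conda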

xiaojingshi commented 3 years ago

Hi,

Thank you for your timely reply. However, I did set the parameter to 0. Please check the following information and let me know what went wrong. Thank you very much for your time.

$ cat analysis.yaml
# run SV calling (0) or vanilla jobs (1)
echo_run: 0

# select one or more callers
enable_callers:

# (s)ingle-sample or (p)aired-samples analysis
mode: s

# filepath of the reference genome in FASTA format
genome: data/fasta/Clarireedia_homoeocarpa_HRS10.fasta

# filepath of the exclusion list in BED format
exclusion_list: data/ENCFF001TDO.bed
exclude_regions: 0  # use the list (1) or don't (0)

# file extensions used by the workflow
file_exts:
  fasta: .fasta
  fasta_idx:

# CSV file with (paired) WGS samples for analysis
#   format: PATH,SAMPLE1,SAMPLE2
#   paired SAMPLE1(tumor)/SAMPLE2(normal) files used for somatic analysis while
#   single SAMPLE1 file used for germline or tumor-only analysis (Manta only)
samples: samples.csv

# settings or requirements per SV caller
callers:
  manta:
    threads: 24        # number of threads used
    memory: 16384      # allocated memory (MB)
    tmpspace: 0        # min. temporary disk space (MB); not in use by Manta
    outdir: manta_out  # output dir relative to PATH/SAMPLE/... (see above)
    tumor_only: 0      # germline (0) or tumor-only analysis (1)

  delly:
    threads: 2   # max. 2 for paired-samples otherwise defaults to 1
    memory: 8192
    tmpspace: 0  # not in use
    outdir: delly_out
    sv_types:

postproc:
  survivor:
    threads: 1
    memory: 1024
    tmpspace: 0     # not in use
    filter:         # filter SVs using a BED file (see 'exclusion_list')
      min_size: -1  # min. SV size (-1 to disable)
      max_size: -1  # max. SV size (-1 to disable)
      min_freq: 0   # min. allele frequency (0 or 1)
      min_sup: -1   # min. number of reads support: RE flag (-1 to disable)
    merge:             # merge callers' output into one (union) SV set
      infile: all.txt  # list of VCF files
      max_dist: 100    # max. distance (bp) between breakpoints
      min_sup: 1       # min. number of supporting callers
      use_type: 0      # take SV type into account (1=yes or 0=no)
      use_strand: 0    # take the strands of SVs into account (1=yes or 0=no)
      use_size: 0      # estimate distance based on SV size (1=yes or 0=no)
      min_size: 0      # min. SV size (bp)
      outfile: all.vcf

$ cat samples.csv
PATH,SAMPLE1
/home/xs17a/sv-callers/snakemake/data/bam/10ref,HRI11-10ref.sorted

$ ls /home/xs17a/sv-callers/snakemake/data/bam/10ref
HRI11-10ref.sorted  HRI11-10ref.sorted.bam  HRI11-10ref.sorted.bam.bai

(sv-callers_1.1.2) $ ls HRI11-10ref.sorted
all.txt  all.vcf  delly_out  gridss_out  lumpy_out  manta_out

(sv-callers_1.1.2) $ ls -l HRI11-10ref.sorted
-rw-rw-r-- 1 xs17a uma_geunhwa_jung 386 Mar 28 13:21 all.txt
-rw-rw-r-- 1 xs17a uma_geunhwa_jung 386 Mar 28 13:21 all.vcf
drwxrwxr-x 3 xs17a uma_geunhwa_jung 208 Mar 28 13:21 delly_out
drwxrwxr-x 3 xs17a uma_geunhwa_jung  54 Mar 28 13:21 gridss_out
drwxrwxr-x 3 xs17a uma_geunhwa_jung  53 Mar 28 13:21 lumpy_out
drwxrwxr-x 3 xs17a uma_geunhwa_jung  53 Mar 28 13:21 manta_out

(sv-callers_1.1.2) $ cat all.txt
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/manta_out/survivor/manta.vcf
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/delly_out/survivor/delly.vcf
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/lumpy_out/survivor/lumpy.vcf
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/gridss_out/survivor/gridss.vcf

Best, Xiaojing.


arnikz commented 3 years ago

This is a different run with different input; my earlier comment referred to your first example. My suggestion is to try the provided test data first. Make sure to delete the output dir data/bam/3/T3--N3 from your previous vanilla run.
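
For example:

    rm -r data/bam/3/T3--N3
    snakemake -C echo_run=0 --use-conda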

xiaojingshi commented 3 years ago

Hi,

Thank you for your reply. I did a second run with the test data. Here is the result I got:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       delly_merge
        5       delly_p
        1       gridss_p
        1       lumpy_p
        1       manta_p
        4       survivor_filter
        1       survivor_merge
        15

[Tue Mar 30 18:39:05 2021]
rule delly_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/delly_out/delly-INV.bcf
    jobid: 14
    wildcards: path=data/bam/3, tumor=T3, normal=N3, sv_type=INV
    threads: 2
    resources: mem_mb=8192, tmp_mb=0

[Tue Mar 30 18:39:05 2021]
rule gridss_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/fasta/chr22.fasta.bwt, data/fasta/chr22.fasta.amb, data/fasta/chr22.fasta.ann, data/fasta/chr22.fasta.pac, data/fasta/chr22.fasta.sa, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/gridss_out/gridss.vcf
    jobid: 9
    wildcards: path=data/bam/3, tumor=T3, normal=N3
    threads: 24
    resources: mem_mb=63488, tmp_mb=0

++ dirname data/bam/3/T3--N3/delly_out/delly-INV.bcf

[Tue Mar 30 18:39:05 2021]
rule delly_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/delly_out/delly-BND.bcf
    jobid: 10
    wildcards: path=data/bam/3, tumor=T3, normal=N3, sv_type=BND
    threads: 2
    resources: mem_mb=8192, tmp_mb=0

[Tue Mar 30 18:39:05 2021]
rule lumpy_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/lumpy_out/lumpy.vcf
    jobid: 8
    wildcards: path=data/bam/3, tumor=T3, normal=N3
    resources: mem_mb=32768, tmp_mb=0

++ LC_ALL=C
++ printf %.f 49.60

++ dirname data/bam/3/T3--N3/lumpy_out/lumpy.vcf

[Tue Mar 30 18:39:05 2021]
Error in rule delly_p:
    jobid: 14
    output: data/bam/3/T3--N3/delly_out/delly-INV.bcf
    shell:

    set -x

    OUTDIR="$(dirname "data/bam/3/T3--N3/delly_out/delly-INV.bcf")"
    PREFIX="$(basename "data/bam/3/T3--N3/delly_out/delly-INV.bcf" .bcf)"
    OUTFILE="${OUTDIR}/${PREFIX}.unfiltered.bcf"
    TSV="${OUTDIR}/sample_pairs.tsv"

    # fetch sample ID from a BAM file
    function get_samp_id() {
        echo "$(samtools view -H ${1} |                    perl -lne 'print ${1} if /\sSM:(\S+)/' |                    head -n 1)"
    }

    # run dummy or real job
    if [ "0" -eq "1" ]; then
        echo "data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai" > "data/bam/3/T3--N3/delly_out/delly-INV.bcf"
    else
        # use OpenMP for threaded jobs
        export OMP_NUM_THREADS=2
        #export OMP_PROC_BIND=true
        #export OMP_PLACES=threads

        # SV calling
        delly call \
            -t "INV" \
            -g "data/fasta/chr22.fasta" \
            -o "${OUTFILE}" \
            -q 1 `# min. paired-end mapping quality` \
            -s 9 `# insert size cutoff, DELs only` \
            -x "data/ENCFF001TDO.bed" \
            "data/bam/3/T3.bam" "data/bam/3/N3.bam" &&
        # somatic + SV quality filtering
        #   create sample list
        TID=$(get_samp_id "data/bam/3/T3.bam")
        CID=$(get_samp_id "data/bam/3/N3.bam")
        printf "${TID}  tumor

${CID} control " > ${TSV} && delly filter -f somatic -p -s "${TSV}" -o "data/bam/3/T3--N3/delly_out/delly-INV.bcf" "${OUTFILE}" fi

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

++ echo NA12878_2

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/xs17a/sv-sample/snakemake/.snakemake/log/2021-03-30T183905.553184.snakemake.log

Do you have any idea what went wrong? Thank you very much.

Best, Xiaojing.


arnikz commented 3 years ago

I can't reproduce your issue without additional details. All is running fine according to the CI tests here. Could you paste the command line used to execute the workflow (e.g., snakemake -C echo_run=0 --use-conda)? And are you using an HPC cluster?

xiaojingshi commented 3 years ago


Hi,

Sorry for the late response. Yes, I am using an HPC cluster, but I did not submit it as a job; I only ran the command snakemake, which gave the result above. Thanks.

arnikz commented 3 years ago

First, remove the output dir: rm -r data/bam/3/T3--N3

SV calling (without submitting jobs to a queue): snakemake -C echo_run=0 --use-conda

SV calling on a Slurm/GE-based cluster:

SCH=slurm   # or gridengine
snakemake -C echo_run=0 mode=p enable_callers="['manta','delly','lumpy','gridss']" --use-conda --latency-wait 30 --jobs 14 \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 15 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&

as described here.

xiaojingshi commented 3 years ago

Thanks for the reply.

sv-callers is installed as a conda environment. If I use a command containing '--use-conda', it looks like this:

(sv-callers_1.1.2) $ snakemake -C echo_run=0 --use-conda
Building DAG of jobs...
Creating conda environment environment.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/xs17a/sv-sample/snakemake/rules/../environment.yaml:
Collecting package metadata (repodata.json): ...working... failed

CondaMemoryError: The conda process ran out of memory. Increase system memory and/or try again.

  File "/share/pkg/condas/2018-05-11/envs/sv-callers_1.1.2/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 312, in create

Otherwise, if I run snakemake only, it shows the same result as before.

By the way, do you have any idea how to run it on an LSF cluster? Thank you.


arnikz commented 3 years ago

> By the way, do you have any idea how to run it on an LSF cluster? Thank you.

See #48.
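
For reference, a minimal LSF sketch along the lines of the Slurm/GE command above (untested; bsub's -J/-n/-M/-o/-e flags are standard, but whether -M is interpreted in MB or KB depends on the site's LSF configuration, and the job/log names are illustrative only):

    # hypothetical LSF variant of the xenon-based cluster command above
    snakemake -C echo_run=0 --use-conda --latency-wait 30 --jobs 14 \
    --cluster "bsub -J smk.{rule} -n {threads} -M {resources.mem_mb} \
    -o stdout-%J.log -e stderr-%J.log" &>smk.log&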

xiaojingshi commented 3 years ago

Thanks.