GooglingTheCancerGenome / sv-callers

Snakemake-based workflow for detecting structural variants in genomic data
https://research-software.nl/software/sv-callers
Apache License 2.0

empty output #62

Closed: xiaojingshi closed this issue 3 years ago

xiaojingshi commented 3 years ago

Hi,

I ran sv-callers and the output files contain only file paths, like this:

    $ cat all.txt
    data/bam/3/T3--N3/manta_out/survivor/manta.vcf
    data/bam/3/T3--N3/delly_out/survivor/delly.vcf
    data/bam/3/T3--N3/lumpy_out/survivor/lumpy.vcf

    $ cat data/bam/3/T3--N3/manta_out/survivor/manta.vcf
    data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai

Do you have any idea what went wrong? Thank you.

Best, Jing.

arnikz commented 3 years ago

This is expected given that you executed the workflow in the so-called vanilla mode (i.e., echo_run set to 1 [default]), which is used for testing. To perform the actual SV calling, make sure to set this parameter to 0 (see README).
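
For example, set it in analysis.yaml:

    echo_run: 0

or override it on the command line:

    snakemake -C echo_run=0 --use-conda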

xiaojingshi commented 3 years ago

Hi,

Thank you for your timely reply. However, I did set the parameter to 0. Please check the following information and let me know what went wrong. Thank you very much for your time.

$ cat analysis.yaml
# run SV calling (0) or vanilla jobs (1)
echo_run: 0

# select one or more callers
enable_callers:

# (s)ingle-sample or (p)aired-samples analysis
mode: s

# filepath of the reference genome in FASTA format
genome: data/fasta/Clarireedia_homoeocarpa_HRS10.fasta

# filepath of the exclusion list in BED format
exclusion_list: data/ENCFF001TDO.bed
exclude_regions: 0  # use the list (1) or don't (0)

# file extensions used by the workflow
file_exts:
  fasta: .fasta
  fasta_idx:

# CSV file with (paired) WGS samples for analysis
#   format: PATH,SAMPLE1,SAMPLE2
#   paired SAMPLE1(tumor)/SAMPLE2(normal) files used for somatic analysis while
#   single SAMPLE1 file used for germline or tumor-only analysis (Manta only)
samples: samples.csv

# settings or requirements per SV caller
callers:
  manta:
    threads: 24        # number of threads used
    memory: 16384      # allocated memory (MB)
    tmpspace: 0        # min. temporary disk space (MB); not in use by Manta
    outdir: manta_out  # output dir relative to PATH/SAMPLE/... (see above)
    tumor_only: 0      # germline (0) or tumor-only analysis (1)

  delly:
    threads: 2   # max. 2 for paired-samples otherwise defaults to 1
    memory: 8192
    tmpspace: 0  # not in use
    outdir: delly_out
    sv_types:

postproc:
  survivor:
    threads: 1
    memory: 1024
    tmpspace: 0     # not in use
    filter:         # filter SVs using a BED file (see 'exclusion_list')
      min_size: -1  # min. SV size (-1 to disable)
      max_size: -1  # max. SV size (-1 to disable)
      min_freq: 0   # min. allele frequency (0 or 1)
      min_sup: -1   # min. number of reads support: RE flag (-1 to disable)
    merge:             # merge callers' output into one (union) SV set
      infile: all.txt  # list of VCF files
      max_dist: 100    # max. distance (bp) between breakpoints
      min_sup: 1       # min. number of supporting callers
      use_type: 0      # take SV type into account (1=yes or 0=no)
      use_strand: 0    # take the strands of SVs into account (1=yes or 0=no)
      use_size: 0      # estimate distance based on SV size (1=yes or 0=no)
      min_size: 0      # min. SV size (bp)
      outfile: all.vcf

$ cat samples.csv
PATH,SAMPLE1
/home/xs17a/sv-callers/snakemake/data/bam/10ref,HRI11-10ref.sorted

$ ls /home/xs17a/sv-callers/snakemake/data/bam/10ref
HRI11-10ref.sorted  HRI11-10ref.sorted.bam  HRI11-10ref.sorted.bam.bai

(sv-callers_1.1.2) $ ls HRI11-10ref.sorted
all.txt  all.vcf  delly_out  gridss_out  lumpy_out  manta_out

(sv-callers_1.1.2) $ ls -l HRI11-10ref.sorted
-rw-rw-r-- 1 xs17a uma_geunhwa_jung 386 Mar 28 13:21 all.txt
-rw-rw-r-- 1 xs17a uma_geunhwa_jung 386 Mar 28 13:21 all.vcf
drwxrwxr-x 3 xs17a uma_geunhwa_jung 208 Mar 28 13:21 delly_out
drwxrwxr-x 3 xs17a uma_geunhwa_jung  54 Mar 28 13:21 gridss_out
drwxrwxr-x 3 xs17a uma_geunhwa_jung  53 Mar 28 13:21 lumpy_out
drwxrwxr-x 3 xs17a uma_geunhwa_jung  53 Mar 28 13:21 manta_out

(sv-callers_1.1.2) $ cat all.txt
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/manta_out/survivor/manta.vcf
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/delly_out/survivor/delly.vcf
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/lumpy_out/survivor/lumpy.vcf
/home/xs17a/sv-callers/snakemake/data/bam/10ref/HRI11-10ref.sorted/gridss_out/survivor/gridss.vcf

Best, Xiaojing.


arnikz commented 3 years ago

This is a different run with different input; my earlier comment referred to your first example. My suggestion is to try the provided test data first. Make sure to delete the output dir data/bam/3/T3--N3 from your previous vanilla run.
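
For example:

    rm -r data/bam/3/T3--N3
    snakemake -C echo_run=0 --use-conda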

xiaojingshi commented 3 years ago

Hi,

Thank you for your reply. I did a second run with the test data. Here is the result I got:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       delly_merge
        5       delly_p
        1       gridss_p
        1       lumpy_p
        1       manta_p
        4       survivor_filter
        1       survivor_merge
        15

[Tue Mar 30 18:39:05 2021]
rule delly_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/delly_out/delly-INV.bcf
    jobid: 14
    wildcards: path=data/bam/3, tumor=T3, normal=N3, sv_type=INV
    threads: 2
    resources: mem_mb=8192, tmp_mb=0

[Tue Mar 30 18:39:05 2021]
rule gridss_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/fasta/chr22.fasta.bwt, data/fasta/chr22.fasta.amb, data/fasta/chr22.fasta.ann, data/fasta/chr22.fasta.pac, data/fasta/chr22.fasta.sa, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/gridss_out/gridss.vcf
    jobid: 9
    wildcards: path=data/bam/3, tumor=T3, normal=N3
    threads: 24
    resources: mem_mb=63488, tmp_mb=0

++ dirname data/bam/3/T3--N3/delly_out/delly-INV.bcf

[Tue Mar 30 18:39:05 2021]
rule delly_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/delly_out/delly-BND.bcf
    jobid: 10
    wildcards: path=data/bam/3, tumor=T3, normal=N3, sv_type=BND
    threads: 2
    resources: mem_mb=8192, tmp_mb=0

[Tue Mar 30 18:39:05 2021]
rule lumpy_p:
    input: data/fasta/chr22.fasta, data/fasta/chr22.fasta.fai, data/bam/3/T3.bam, data/bam/3/T3.bam.bai, data/bam/3/N3.bam, data/bam/3/N3.bam.bai
    output: data/bam/3/T3--N3/lumpy_out/lumpy.vcf
    jobid: 8
    wildcards: path=data/bam/3, tumor=T3, normal=N3
    resources: mem_mb=32768, tmp_mb=0

++ LC_ALL=C
++ printf %.f 49.60

++ dirname data/bam/3/T3--N3/lumpy_out/lumpy.vcf

[Tue Mar 30 18:39:05 2021]
Error in rule delly_p:
    jobid: 14
    output: data/bam/3/T3--N3/delly_out/delly-INV.bcf
    shell:

    set -x

    OUTDIR="$(dirname "data/bam/3/T3--N3/delly_out/delly-INV.bcf")"
    PREFIX="$(basename "data/bam/3/T3--N3/delly_out/delly-INV.bcf" .bcf)"
    OUTFILE="${OUTDIR}/${PREFIX}.unfiltered.bcf"
    TSV="${OUTDIR}/sample_pairs.tsv"

    # fetch sample ID from a BAM file
    function get_samp_id() {
        echo "$(samtools view -H ${1} |                    perl -lne 'print ${1} if /\sSM:(\S+)/' |                    head -n 1)"
    }

    # run dummy or real job
    if [ "0" -eq "1" ]; then
        echo "data/fasta/chr22.fasta data/fasta/chr22.fasta.fai data/bam/3/T3.bam data/bam/3/T3.bam.bai data/bam/3/N3.bam data/bam/3/N3.bam.bai" > "data/bam/3/T3--N3/delly_out/delly-INV.bcf"
    else
        # use OpenMP for threaded jobs
        export OMP_NUM_THREADS=2
        #export OMP_PROC_BIND=true
        #export OMP_PLACES=threads

        # SV calling
        delly call \
            -t "INV" \
            -g "data/fasta/chr22.fasta" \
            -o "${OUTFILE}" \
            -q 1 `# min. paired-end mapping quality` \
            -s 9 `# insert size cutoff, DELs only` \
            -x "data/ENCFF001TDO.bed" \
            "data/bam/3/T3.bam" "data/bam/3/N3.bam" &&
        # somatic + SV quality filtering
        #   create sample list
        TID=$(get_samp_id "data/bam/3/T3.bam")
        CID=$(get_samp_id "data/bam/3/N3.bam")
        printf "${TID}  tumor

${CID} control " > ${TSV} && delly filter -f somatic -p -s "${TSV}" -o "data/bam/3/T3--N3/delly_out/delly-INV.bcf" "${OUTFILE}" fi

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

++ echo NA12878_2

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/xs17a/sv-sample/snakemake/.snakemake/log/2021-03-30T183905.553184.snakemake.log

Do you have any idea what went wrong? Thank you very much.

Best, Xiaojing.


arnikz commented 3 years ago

I can't reproduce your issue without additional details. All is running fine according to the CI tests here. Could you paste the command line used to execute the workflow (e.g., snakemake -C echo_run=0 --use-conda)? And are you using an HPC cluster?

xiaojingshi commented 3 years ago


Hi,

Sorry for the late response. Yes, I am using an HPC cluster, but I did not submit it as a job; I only ran the command snakemake, which gave the result above. Thanks.

arnikz commented 3 years ago

First, remove the output dir: rm -r data/bam/3/T3--N3

SV calling (without submitting jobs to a queue): snakemake -C echo_run=0 --use-conda

SV calling on a Slurm/GE-based cluster:

SCH=slurm   # or gridengine
snakemake -C echo_run=0 mode=p enable_callers="['manta','delly','lumpy','gridss']" --use-conda --latency-wait 30 --jobs 14 \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 15 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&

as described here.

xiaojingshi commented 3 years ago

Thanks for the reply.

sv-callers is installed as a conda environment. If I use a command containing '--use-conda', it looks like this:

(sv-callers_1.1.2) $ snakemake -C echo_run=0 --use-conda
Building DAG of jobs...
Creating conda environment environment.yaml...
Downloading and installing remote packages.
CreateCondaEnvironmentException:
Could not create conda environment from /home/xs17a/sv-sample/snakemake/rules/../environment.yaml:
Collecting package metadata (repodata.json): ...working... failed

CondaMemoryError: The conda process ran out of memory. Increase system memory and/or try again.

  File "/share/pkg/condas/2018-05-11/envs/sv-callers_1.1.2/lib/python3.9/site-packages/snakemake/deployment/conda.py", line 312, in create

Otherwise, if I run snakemake only, it shows the same result as before.

By the way, do you have any idea how to run it on an LSF cluster? Thank you.


arnikz commented 3 years ago

> By the way, do you have any idea how to run it on an LSF cluster? Thank you.

See #48.
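
For reference, a minimal LSF sketch along the lines of the Slurm/GE command above (untested; bsub's -J/-n/-M/-o/-e flags are standard, but whether -M is interpreted in MB or KB depends on the site's LSF configuration, and the job/log names are illustrative only):

    # hypothetical LSF variant of the xenon-based cluster command above
    snakemake -C echo_run=0 --use-conda --latency-wait 30 --jobs 14 \
    --cluster "bsub -J smk.{rule} -n {threads} -M {resources.mem_mb} \
    -o stdout-%J.log -e stderr-%J.log" &>smk.log&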

xiaojingshi commented 3 years ago

Thanks.