genome / analysis-workflows

smoove.cwl error: "Failed to open -: unknown file type" #837

Open: tatahin opened this issue 4 years ago

tatahin commented 4 years ago

Good day! I have been trying to run germline_wgs.cwl with the BAMs and references from example_data/exome_workflow in this repository (I had to add some files and values), but it fails with this error:

 [job run_smoove] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
INFO [job run_smoove] /home/bio/serge/analysis-workflows_TATAHIN_FORK/results/05122019_3/Temp/l3aef3eb$ docker \
    run \
    -i \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/results/05122019_3/Temp/l3aef3eb:/RJsOtd:rw \
    --volume=/tmp/qcvuiv0a:/tmp:rw \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/results/05122019_3/Temp/27qxdb0a/final.bam:/var/lib/cwl/stga84ed033-02ba-4316-9104-711b03107983/final.bam:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/results/05122019_3/Temp/5qcp_6ca/final.bam.bai:/var/lib/cwl/stga84ed033-02ba-4316-9104-711b03107983/final.bam.bai:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/results/05122019_3/Temp/5qcp_6ca/final.bai:/var/lib/cwl/stga84ed033-02ba-4316-9104-711b03107983/final.bai:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.fai:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.fai:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.amb:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.amb:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.ann:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.ann:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.bwt:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.bwt:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.pac:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.pac:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.sa:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.sa:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.fa.index:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa.index:ro \
    --volume=/home/bio/serge/analysis-workflows_TATAHIN_FORK/example_data/exome_workflow/chr17_test.dict:/var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.dict:ro \
    --workdir=/RJsOtd \
    --read-only=true \
    --user=1000:1000 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/RJsOtd \
    --cidfile=/tmp/g060o6to/20191205182725-999418.cid \
    'brentp/smoove@sha256:9d5098d3882df1443aae36922d36b5188af1cb9b8f83dab4a9ed041a6a8019cc' \
    /usr/local/bin/smoove \
    call \
    --processes \
    4 \
    -F \
    --genotype \
    /var/lib/cwl/stga84ed033-02ba-4316-9104-711b03107983/final.bam \
    --name \
    SV \
    --fasta \
    /var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa
[smoove] 2019/12/05 15:27:26 starting with version 0.1.6
[smoove] 2019/12/05 15:27:26 calculating bam stats for 1 bams
2019/12/05 15:27:26 covmed: not enough reads to sample for bam stats
[smoove] 2019/12/05 15:27:26 done calculating bam stats
[smoove]: 2019/12/05 15:27:26 finished process: lumpy-filter (lumpy_filter -f /var/lib/cwl/stg8e064cc4-5d8a-48e7-ba03-9ebda8c10492/chr17_test.fa /var/lib/cwl/stga) in user-time:17.425ms system-time:2.354ms
[smoove] 2019/12/05 15:27:26 removed 0 alignments out of 6 (0.00%) with depth > 800 or from excluded chroms from H_NJ-HCC1395-HCC1395.disc.bam in 0 seconds
[smoove] 2019/12/05 15:27:26 removed 0 alignments out of 6 (0.00%) that were bad interchromosomals or flanked-splitters from H_NJ-HCC1395-HCC1395.disc.bam
[smoove] 2019/12/05 15:27:26 removed 0 singletons out of 6 reads (0.00%) from H_NJ-HCC1395-HCC1395.disc.bam in 0 seconds
[smoove] 2019/12/05 15:27:26 removed 0 alignments out of 0 (NaN%) with depth > 800 or from excluded chroms from H_NJ-HCC1395-HCC1395.split.bam in 0 seconds
[smoove] 2019/12/05 15:27:26 removed 0 alignments out of 0 (NaN%) that were bad interchromosomals or flanked-splitters from H_NJ-HCC1395-HCC1395.split.bam
[smoove] 2019/12/05 15:27:26 removed 0 singletons out of 0 reads (NaN%) from H_NJ-HCC1395-HCC1395.split.bam in 0 seconds
[smoove] 2019/12/05 15:27:26 starting lumpy
[smoove] 2019/12/05 15:27:26 wrote lumpy command to .//SV-lumpy-cmd.sh
[smoove] 2019/12/05 15:27:26 writing sorted, indexed file to SV-smoove.genotyped.vcf.gz
[smoove] 2019/12/05 15:27:26 excluding variants with all unknown or homozygous reference genotypes
[smoove] 2019/12/05 15:27:26 > gsort version 0.0.6
[smoove] 2019/12/05 15:27:26 missing pair end parameters:mean stdev read_length min_non_overlap

Program: ********** (v 0.2.13)
Author:  Ryan Layer (rl6sf@virginia.edu)
[smoove] 2019/12/05 15:27:26
Summary: Find structural variations in various signals.

Usage:   ********** [OPTIONS]

Options:
        -g      Genome file (defines chromosome order)
        -e      Show evidence for each call
        -w      File read windows size (default 1000000)
        -mw     minimum weight for a call
        -msw    minimum per-sample weight for a call
        -tt     trim threshold
        -x      exclude file bed file
        -t      temp file prefix, must be to a writeable directory
        -P      output probability curve for each variant
        -b      output BEDPE instead of VCF
        -sr     bam_file:<file name>,
                id:<sample name>,
                back_distance:<distance>,
                min_mapping_threshold:<mapping quality>,
                weight:<sample weight>,
                min_clip:<minimum clip length>,
                read_group:<string>
[smoove] 2019/12/05 15:27:26
        -pe     bam_file:<file name>,
                id:<sample name>,
                histo_file:<file name>,
                mean:<value>,
                stdev:<value>,
                read_length:<length>,
[smoove] 2019/12/05 15:27:26
                min_non_overlap:<length>,
                discordant_z:<z value>,
                back_distance:<distance>,
                min_mapping_threshold:<mapping quality>,
                weight:<sample weight>,
                read_group:<string>
[smoove] 2019/12/05 15:27:26
        -bedpe  bedpe_file:<bedpe file>,
                id:<sample name>,
                weight:<sample weight>

[smoove] 2019/12/05 15:27:26 2019/12/05 15:27:26 EOF
[smoove] 2019/12/05 15:27:26 Failed to open -: unknown file type
[smoove] 2019/12/05 15:27:26 wrote sorted, indexed file to SV-smoove.genotyped.vcf.gz
panic: exit status 255

First, we ran the baseCommand from the tool definition (smoove.cwl) by hand inside the smoove Docker container, using final.bam from the temporary files, and got the same error. According to the --help output of smoove call, the argument order differs from the command that smoove.cwl generates; the BAMs are expected last, after all options:

Usage: smoove --name NAME --fasta FASTA [--exclude EXCLUDE] [--excludechroms EXCLUDECHROMS] [--processes PROCESSES] [--outdir OUTDIR] [--noextrafilters] [--support SUPPORT] [--genotype] [--duphold] [--removepr] BAMS [BAMS ...]
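
To make the ordering concrete, here is a minimal sketch that assembles the argument list with every option first and the BAMs as trailing positionals, as in the usage line above. The function name build_smoove_call and its parameters are hypothetical helpers for illustration, not part of smoove or of smoove.cwl; only flags that appear in the usage line are used (the -F from the workflow command is left out because the usage line does not list it).

```python
from typing import List, Sequence


def build_smoove_call(name: str, fasta: str, bams: Sequence[str],
                      processes: int = 4, genotype: bool = True) -> List[str]:
    """Return an argv list for `smoove call` with the BAMs placed last.

    Hypothetical helper: it only orders arguments, it does not run smoove.
    """
    if not bams:
        raise ValueError("at least one BAM is required")
    argv = ["smoove", "call", "--processes", str(processes)]
    if genotype:
        argv.append("--genotype")
    argv += ["--name", name, "--fasta", fasta]
    # Positional BAMS go after all options, matching the usage line.
    argv += list(bams)
    return argv
```

Calling build_smoove_call("SV", "chr17_test.fa", ["final.bam"]) gives a list that could be passed to subprocess.run; in the CWL tool itself the equivalent change would presumably be made through the position of the input bindings.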

When we changed the order of the arguments, the error output changed (note that this run used smoove 0.2.5, while the workflow run above used 0.1.6):

 root@b86cdc266e41:/inputs/Temp_bam1#     /usr/bin/smoove     call     --processes     4     -F     --genotype     --name     SV     --fasta     /inputs/example_data/exome_workflow/chr17_test.fa    /inputs/Temp_bam1/final.bam
[smoove] 2019/12/05 17:32:41 starting with version 0.2.5
[smoove] 2019/12/05 17:32:41 calculating bam stats for 1 bams
2019/12/05 17:32:41 covmed: not enough reads to sample for bam stats
[smoove] 2019/12/05 17:32:41 done calculating bam stats
[smoove] 2019/12/05 17:32:41 removed 0 alignments out of 0 (NaN%) with low mapq, depth > 1000, or from excluded chroms from H_NJ-HCC1395-HCC1395.split.bam in 0 seconds
[smoove] 2019/12/05 17:32:41 removed 0 alignments out of 0 (NaN%) that were bad interchromosomals or flanked-splitters from H_NJ-HCC1395-HCC1395.split.bam
[smoove] 2019/12/05 17:32:41 removed 0 alignments out of 6 (0.00%) with low mapq, depth > 1000, or from excluded chroms from H_NJ-HCC1395-HCC1395.disc.bam in 0 seconds
[smoove] 2019/12/05 17:32:41 removed 0 alignments out of 6 (0.00%) that were bad interchromosomals or flanked-splitters from H_NJ-HCC1395-HCC1395.disc.bam
[smoove] 2019/12/05 17:32:41 kept 0 putative orphans
[smoove] 2019/12/05 17:32:41 removed 0 split orphans in 0 seconds
[smoove] 2019/12/05 17:32:41 kept 0 putative orphans
[smoove] 2019/12/05 17:32:41 removed 0 discordant orphans in 0 seconds
[smoove] 2019/12/05 17:32:41 removed 0 singletons of 0 reads (NaN%) from H_NJ-HCC1395-HCC1395.split.bam in 0 seconds
[smoove] 2019/12/05 17:32:41 0 reads (NaN%) of the original 0 remain from H_NJ-HCC1395-HCC1395.split.bam
[smoove] 2019/12/05 17:32:41 removed 0 singletons and isolated interchromosomals of 6 reads (0.00%) from H_NJ-HCC1395-HCC1395.disc.bam in 0 seconds
[smoove] 2019/12/05 17:32:41 6 reads (100.00%) of the original 6 remain from H_NJ-HCC1395-HCC1395.disc.bam
[smoove] 2019/12/05 17:32:41 starting lumpy
[smoove] 2019/12/05 17:32:41 wrote lumpy command to .//SV-lumpy-cmd.sh
[smoove] 2019/12/05 17:32:41 writing sorted, indexed file to SV-smoove.genotyped.vcf.gz
[smoove] 2019/12/05 17:32:41 excluding variants with all unknown or homozygous reference genotypes
[smoove] 2019/12/05 17:32:41 > gsort version 0.0.6
[smoove] 2019/12/05 17:32:41 388
[smoove] 2019/12/05 17:32:41 0
[smoove] 2019/12/05 17:32:41 chr17      1000000
chr17   2000000
chr17   4000000
chr17   8000000
chr17   32000000
chr17   64000000
[smoove] 2019/12/05 17:32:41 2019/12/05 17:32:41 EOF
[smoove] 2019/12/05 17:32:41 Failed to open -: unknown file type
panic: exit status 255

goroutine 1 [running]:
github.com/brentp/smoove/svtyper.check(...)
        /home/brentp/go/go/src/github.com/brentp/smoove/svtyper/svtyper.go:33
github.com/brentp/smoove/svtyper.Svtyper(0xbd75a0, 0xc00000e0d0, 0x7ffe1734bd7c, 0x31, 0xc00014d5a0, 0x1, 0x1, 0xaf76c4, 0x2, 0x7ffe1734bd71, ...)
        /home/brentp/go/go/src/github.com/brentp/smoove/svtyper/svtyper.go:226 +0x17f8
github.com/brentp/smoove/lumpy.Main()
        /home/brentp/go/go/src/github.com/brentp/smoove/lumpy/lumpy.go:347 +0x44f
main.main()
        /home/brentp/go/go/src/github.com/brentp/smoove/cmd/smoove/smoove.go:124 +0x1ce

So changing the order helps, but it is not enough. Maybe the error is related to the "-" in the temporary file names (H_NJ-HCC1395-HCC1395). These are the output files smoove produced from the test final.bam from example_data/exome_workflow:

H_NJ-HCC1395-HCC1395.disc.bam
H_NJ-HCC1395-HCC1395.disc.bam.bai
H_NJ-HCC1395-HCC1395.disc.bam.orig.bam
H_NJ-HCC1395-HCC1395.histo
H_NJ-HCC1395-HCC1395.split.bam
H_NJ-HCC1395-HCC1395.split.bam.bai
H_NJ-HCC1395-HCC1395.split.bam.orig.bam
SV-lumpy-cmd.sh

tatahin commented 4 years ago

I replaced the "-" with "_" in the headers and read groups of the input BAM files:

@RG     ID:2895499223   CN:WUGSC        LB:H_NJ_HCC1395_HCC1395-lg24-lib1       PL:Illumina     PU:H7HY2CCXX.3.ATCACGGT SM:H_NJ_HCC1395_HCC1395 
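
For anyone who wants to repeat this rename, a rough sketch of the header edit follows. It works on header text only, for example text dumped with samtools view -H and written back with samtools reheader; that workflow is an assumption, not necessarily how the header above was produced. The function names and the default of touching only the SM tag are hypothetical choices for illustration.

```python
from typing import Iterable, List


def sanitize_rg_line(line: str, tags: Iterable[str] = ("SM",)) -> str:
    """Replace '-' with '_' in the chosen tag values of one @RG header line."""
    if not line.startswith("@RG\t"):
        return line  # every other header line is returned unchanged
    wanted = set(tags)
    fields = line.rstrip("\n").split("\t")
    out: List[str] = [fields[0]]
    for field in fields[1:]:
        # SAM header fields look like TAG:VALUE; split on the first colon only.
        tag, sep, value = field.partition(":")
        if sep and tag in wanted:
            value = value.replace("-", "_")
        out.append(tag + sep + value)
    return "\t".join(out)


def sanitize_header(header_text: str, tags: Iterable[str] = ("SM",)) -> str:
    """Apply sanitize_rg_line to every line of a SAM header."""
    selected = tuple(tags)
    return "\n".join(sanitize_rg_line(line, selected)
                     for line in header_text.splitlines())
```

Reads refer to their read group by ID rather than by SM, so changing SM in the header does not require rewriting the alignment records.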

But the same error occurred again:

[smoove] 2019/12/09 10:10:23 Failed to open -: unknown file type

So I am still trying to find the real cause of the error. If you have any ideas, please write them in the comments.

tatahin commented 4 years ago

I tested smoove with our own aligned BAM file, the matching reference file, and the baseCommand from smoove.cwl, and it finished successfully. So I can only conclude that the example BAMs (2895499223.bam, 2895499237.bam) are not suitable for testing germline_wgs.cwl.
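
The exact cause was not pinned down in this thread, but the logs above do show very little evidence going into lumpy: 6 discordant reads, 0 split reads, and the "covmed: not enough reads to sample for bam stats" message. Here is a small sketch for pulling those numbers out of a saved smoove log. The function names are hypothetical and the regular expression is only matched against the log lines quoted in this issue; other smoove versions may word them differently.

```python
import re
from typing import Dict

# Matches lines such as:
#   removed 0 alignments out of 6 (0.00%) ... from SAMPLE.disc.bam in 0 seconds
_COUNT = re.compile(r"out of (\d+)\b.*\bfrom (\S+\.(?:disc|split)\.bam)\b")


def evidence_counts(log_text: str) -> Dict[str, int]:
    """Map each .disc.bam / .split.bam file to the first read total reported for it."""
    counts: Dict[str, int] = {}
    for line in log_text.splitlines():
        match = _COUNT.search(line)
        if match:
            # Keep the first total seen per file (the count before filtering).
            counts.setdefault(match.group(2), int(match.group(1)))
    return counts


def too_few_reads(log_text: str) -> bool:
    """True if the log contains the covmed 'not enough reads' message."""
    return "not enough reads to sample for bam stats" in log_text
```

A file that reports zero or only a handful of reads, together with the covmed message, would be a hint that the input BAM is too small for a meaningful test run.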

golubnikova commented 4 years ago

Any news here? :)

jasonwalker80 commented 4 years ago

@apaul7 and @johnegarza, can we sort out a solution for hosting the new example BAM file? The original question was posed by @apaul7 in his PR: https://github.com/genome/analysis-workflows/pull/885#issue-376771911

I believe the hosting of the "fixed" BAM file input was lost in code review. If it's small enough, we should host it directly in the repo itself. I know we've moved many inputs to a cloud bucket, but TTBOMK those are not publicly accessible yet.

serge2016 commented 4 years ago

I would appreciate it very much too!