bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio-prioritize error: java.lang.RuntimeException: EOF while reading string #3454

Open: chatchawit opened this issue 3 years ago

chatchawit commented 3 years ago

Version info

To Reproduce

Exact bcbio command you have used:

bcbio-prioritize -Xms750m -Xmx25480m known -i /san/test/wes-tn/work/structural/LU147-01/lumpy/LU147-11-LU147-svs-smoove.genotyped-duphold-filter-effects.vcf.gz -o /san/tmp/LU147-11-lumpy-prioritize.vcf.gz -k /san/bcbio/genomes/Hsapiens/hg38/coverage/prioritize/cancer/civic-2018-12-27.bed.gz

Your sample configuration file:

details:
- algorithm:
    aligner: bwa
    coverage_interval: regional
    exclude_regions: [lcr]
    hlacaller: optitype
    mark_duplicates: true
    min_allele_fraction: 2
    nomap_split_targets: 100
    remove_lcr: true
    recalibrate: false
    save_diskspace: true
    svcaller: [gatk-cnv, lumpy, manta]
    svprioritize: cancer/civic
    svvalidate:
      DEL: /san/data/dream-syn3-crossmap/truth_DEL.bed
      DUP: /san/data/dream-syn3-crossmap/truth_DUP.bed
      INV: /san/data/dream-syn3-crossmap/truth_INV.bed
    validate: /san/data/dream-syn3-crossmap/truth_small_variants.vcf.gz
    validate_regions: /san/data/dream-syn3-crossmap/truth_regions.bed
    variantcaller:
      somatic: [mutect2]
      germline: [gatk-haplotype]
    variant_regions: /san/data/lib/S07604624_Covered.bed
  analysis: variant2
  description: LU147-11
  files:
    - /san/data/fq/LU147-11_L4_1.fq.gz
    - /san/data/fq/LU147-11_L4_2.fq.gz
  genome_build: hg38
  metadata:
    batch: LU147
    phenotype: normal
- algorithm:
    aligner: bwa
    coverage_interval: regional
    exclude_regions: [lcr]
    hlacaller: optitype
    mark_duplicates: true
    min_allele_fraction: 2
    nomap_split_targets: 100
    remove_lcr: true
    recalibrate: false
    save_diskspace: true
    svcaller: [gatk-cnv, lumpy, manta]
    svprioritize: cancer/civic
    svvalidate:
      DEL: /san/data/dream-syn3-crossmap/truth_DEL.bed
      DUP: /san/data/dream-syn3-crossmap/truth_DUP.bed
      INV: /san/data/dream-syn3-crossmap/truth_INV.bed
    validate: /san/data/dream-syn3-crossmap/truth_small_variants.vcf.gz
    validate_regions: /san/data/dream-syn3-crossmap/truth_regions.bed
    variantcaller:
      somatic: [mutect2]
      germline: [gatk-haplotype]
    variant_regions: /san/data/lib/S07604624_Covered.bed
  analysis: variant2
  description: LU147-01
  files:
    - /san/data/fq/LU147-01_L1_1.fq.gz
    - /san/data/fq/LU147-01_L1_2.fq.gz
  genome_build: hg38
  metadata:
    batch: LU147
    phenotype: tumor
upload:
  dir: /san/test/wes-tn/final

Observed behavior

Error message or bcbio output:

2021-04-02 14:07:03 bioinformatics ERROR [bcbio.prioritize.main] -
                   bcbio.prioritize.main.main
                                          ...
                  bcbio.prioritize.main/-main       main.clj:   35
               bcbio.prioritize.main/-main/fn       main.clj:   36
                           clojure.core/apply       core.clj:  630
                                          ...
                 bcbio.prioritize.known/-main      known.clj:  236
            bcbio.prioritize.known/prioritize      known.clj:  206
                                          ...
                    bcbio.prioritize.known/fn      known.clj:  188
      bcbio.prioritize.known/parse-intersects      known.clj:  161
                            clojure.core/into       core.clj: 6600
                          clojure.core/reduce       core.clj: 6519
                  clojure.core.protocols/fn/G  protocols.clj:   13
                    clojure.core.protocols/fn  protocols.clj:  101
            clojure.core.protocols/seq-reduce  protocols.clj:   30
                             clojure.core/seq       core.clj:  137
                                          ...
                          clojure.core/map/fn       core.clj: 2624
          bcbio.prioritize.known/combine-hits      known.clj:  123
                          clojure.core/reduce       core.clj: 6519
                  clojure.core.protocols/fn/G  protocols.clj:   13
                    clojure.core.protocols/fn  protocols.clj:  101
            clojure.core.protocols/seq-reduce  protocols.clj:   30
                             clojure.core/seq       core.clj:  137
                                          ...
                          clojure.core/map/fn       core.clj: 2616
                             clojure.core/seq       core.clj:  137
                                          ...
                          clojure.core/map/fn       core.clj: 2624
       bcbio.prioritize.known/combine-hits/fn      known.clj:  121
bcbio.prioritize.known/combine-hits/parse-hit      known.clj:  101
                      clojure.edn/read-string        edn.clj:   45
                      clojure.edn/read-string        edn.clj:   46
                                          ...
java.lang.RuntimeException: EOF while reading string

Expected behavior

My aim is to run Workflow 1 (tumor/normal). bcbio-prioritize is a fairly simple command-line tool, so it should work. I have provided the input files for debugging.

Input files

LU147-11-LU147-svs-smoove.genotyped-duphold-filter-effects.vcf.gz https://drive.google.com/file/d/1Xmpw9ZbxGKCn6nQdlWNdz2K6EwGhi2qf/view?usp=sharing

civic-2018-12-27.bed.gz https://drive.google.com/file/d/1fcBu5P3el-cPM9u0eCPwrfL_MjkUiSla/view?usp=sharing

Log files

Please attach (10MB max): bcbio-nextgen.log, bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log.

Additional context

I have installed this version of Java (/usr/bin/java -version):

openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

However, it should not interfere with bcbio's own Java (/san/bcbio/anaconda/bin/java -version):

openjdk version "1.8.0_265"
OpenJDK Runtime Environment (Zulu 8.48.0.53-CA-linux64) (build 1.8.0_265-b11)
OpenJDK 64-Bit Server VM (Zulu 8.48.0.53-CA-linux64) (build 25.265-b11, mixed mode)

Attachments: bcbio-nextgen-debug.log, bcbio-nextgen.log, bcbio-nextgen-commands.log

naumenko-sa commented 3 years ago

Hi @chatchawit !

Thanks for the detailed report! There is almost no chance that I would fix the Clojure issue, but please send LU147-11-LU147-svs-smoove.genotyped-duphold-filter-effects.vcf.gz; both links above point to the same BED file.

I suspect that lumpy generated a VCF record that breaks bcbio-prioritize. A simple workaround would be to remove lumpy from svcaller.

Sergey

chatchawit commented 3 years ago

Here are the files.

civic-2018-12-27.bed.gz https://drive.google.com/file/d/1Xmpw9ZbxGKCn6nQdlWNdz2K6EwGhi2qf/view?usp=sharing

LU147-11-LU147-svs-smoove.genotyped-duphold-filter-effects.vcf https://drive.google.com/file/d/1fcBu5P3el-cPM9u0eCPwrfL_MjkUiSla/view?usp=sharing

I can now run the workflow as T/N pair (variant calling) -> PON (GATK CNV) -> structural variants (gatk-cnv, lumpy, manta), split into 3 steps (3 YAML files) rather than the single YAML shown above. However, I had to remove "svprioritize". Could the error result from a "Clojure" setting? I found that "bcbio-prioritize" fails with the same error in every version; I downloaded the binaries of all versions from GitHub. For the latest version, I cannot compile the source with "make".

Note that I run gatk-cnv, lumpy, and manta together, and they work fine together; lumpy and manta themselves are OK. The problem is with "svprioritize".

Thank you.

naumenko-sa commented 3 years ago

Hi @chatchawit !

This command runs OK for me and produces a prioritized vcf file.

$ bcbio-prioritize version
bcbio.prioritize 0.0.8
$ java -Xmx1g -version
openjdk version "1.8.0_192"
OpenJDK Runtime Environment (Zulu 8.33.0.1-linux64) (build 1.8.0_192-b01)
OpenJDK 64-Bit Server VM (Zulu 8.33.0.1-linux64) (build 25.192-b01, mixed mode)
$ which java
/n/app/bcbio/dev/anaconda/bin/java

Could you make sure that your java is bcbio's java?
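A minimal Python sketch of that check, mirroring `which java`: it reports where `java`, `bcftools`, and `bedtools` resolve on PATH and whether each one sits under the bcbio prefix. The prefix `/san/bcbio/anaconda` is taken from the paths in this report and is an assumption for any other installation; `resolve_tool` and `is_under` are hypothetical helper names, not part of bcbio.

```python
import os
import shutil
from typing import Optional

# Assumption: bcbio's bundled tools live under this prefix (taken from the
# paths in this issue); adjust it for your own installation.
BCBIO_PREFIX = "/san/bcbio/anaconda"


def resolve_tool(name: str, search_path: Optional[str] = None) -> Optional[str]:
    """Return the executable that would run for `name`, like `which name`.

    Returns None when nothing executable with that name is found on PATH
    (or on `search_path`, if given).
    """
    return shutil.which(name, path=search_path)


def is_under(path: str, prefix: str) -> bool:
    """True if `path` lies inside directory `prefix`.

    Paths are compared as written (made absolute, symlinks NOT followed),
    component by component, so /x/anaconda2 is not treated as inside /x/anaconda.
    """
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


if __name__ == "__main__":
    for tool in ("java", "bcftools", "bedtools"):
        exe = resolve_tool(tool)
        if exe is None:
            print(f"{tool}: not found on PATH")
        else:
            where = "bcbio" if is_under(exe, BCBIO_PREFIX) else "outside bcbio"
            print(f"{tool}: {exe} ({where})")
```

Run it from the same shell environment that launches bcbio, since the PATH there decides which binaries are actually used. Because `is_under` does not follow symlinks, a bcbio path that is itself a symlink to `/usr/bin` is still reported as bcbio.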

Sergey

chatchawit commented 3 years ago

I've found the cause. Please read below.

I copied the "bcbio-prioritize" executable to another machine (a VM running the same Ubuntu version), and it ran successfully there. Because bcbio is not installed on that machine, I installed bcftools and bedtools manually:

sudo apt install bcftools
sudo apt install bedtools

The machine where "bcbio-prioritize" failed:

$ java -version
openjdk version "1.8.0_265"
OpenJDK Runtime Environment (Zulu 8.48.0.53-CA-linux64) (build 1.8.0_265-b11)
OpenJDK 64-Bit Server VM (Zulu 8.48.0.53-CA-linux64) (build 25.265-b11, mixed mode)

$ bcftools -v
bcftools 1.9
Using htslib 1.9

$ bedtools -version
bedtools v2.30.0

The machine where "bcbio-prioritize" succeeded:

$ java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-8u282-b08-0ubuntu1~20.04-b08)
OpenJDK 64-Bit Server VM (build 25.282-b08, mixed mode)

$ bcftools -v
bcftools 1.10.2
Using htslib 1.10.2-3

$ bedtools -version
bedtools v2.27.1

I went back to the machine where "bcbio-prioritize" failed and ran "sudo apt install bedtools". After that, "bcbio-prioritize" worked. I also tried installing and removing bcftools ("sudo apt install/remove bcftools"), but bcftools is not the cause of the error. Different Java versions can run "bcbio-prioritize" without problems.

I replaced /san/bcbio/anaconda/bin/bedtools (v2.30.0) with a symbolic link to /usr/bin/bedtools (v2.27.1). With that change, I successfully ran the structural variant workflow (gatk-cnv, lumpy, manta) with "svprioritize".
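To confirm which bedtools is actually first on PATH before and after a swap like this, a minimal Python sketch can parse the `bedtools -version` banner shown above (`bedtools v2.30.0`). The helper names and the `KNOWN_BAD_BEDTOOLS` set are hypothetical; the set holds only the one version reported as failing in this issue, and nothing here establishes which other bedtools versions are affected.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Only data points from this issue: v2.30.0 broke bcbio-prioritize and
# v2.27.1 worked. Versions in between were not tested here.
KNOWN_BAD_BEDTOOLS = {(2, 30, 0)}


def parse_bedtools_version(banner: str) -> Optional[Version]:
    """Turn a banner such as 'bedtools v2.30.0' into (2, 30, 0), else None."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_bedtools_version(exe: str = "bedtools") -> Optional[Version]:
    """Run `<exe> -version` for the first match on PATH and parse its banner."""
    path = shutil.which(exe)
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    # Check both streams in case the banner is not printed on stdout.
    return parse_bedtools_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_bedtools_version()
    if version is None:
        print("bedtools not found, or its version banner was not recognised")
    elif version in KNOWN_BAD_BEDTOOLS:
        print(f"bedtools {version} is the version reported failing in issue #3454")
    else:
        print(f"bedtools {version} is not the version reported failing here")
```

The swap itself amounts to something like `ln -sf /usr/bin/bedtools /san/bcbio/anaconda/bin/bedtools`; keeping a copy of the original binary first makes it easy to revert.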