bcbio / bcbio.variation.recall

Parallel merging, squaring off and ensemble calling for genomic variants

ensemble error #17

Open bioinfo-dirty-jobs opened 7 years ago

bioinfo-dirty-jobs commented 7 years ago

I am running on RHEL 5.5 and I get this error:

I reset the path, so the right bcftools is now picked up, but I now get this error:

 ~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms1g -Xmx10g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble numpass= 1 --names mutect1,varscan2,vardict,mutect2   output.vcf /illumina/software/database/database_2016/hg19_primary.fa 411.mutect1.pass.vcf.gz,411.varscan.pass.vcf.gz,411.vardict.pass.vcf.gz,411.mutect2.pass.vcf.gz 

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
    at java.lang.StringCoding.encode(StringCoding.java:344)
    at java.lang.String.getBytes(String.java:918)
    at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
    at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
    at java.io.File.exists(File.java:819)
    at me.raynes.fs$exists_QMARK_.invoke(fs.clj:103)
    at bcbio.run.clhelp$exists_or_gz_QMARK_.invoke(clhelp.clj:28)
    at bcbio.run.clhelp$get_vcf_bam_flex.invoke(clhelp.clj:34)
    at clojure.core$map$fn__4553.invoke(core.clj:2624)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.Cons.next(Cons.java:39)
    at clojure.lang.RT.next(RT.java:674)
    at clojure.core$next__4112.invoke(core.clj:64)
    at clojure.core$concat$cat__4217$fn__4218.invoke(core.clj:707)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.ChunkedCons.chunkedNext(ChunkedCons.java:59)
    at clojure.lang.ChunkedCons.next(ChunkedCons.java:43)
    at clojure.lang.PersistentVector.create(PersistentVector.java:74)
    at clojure.lang.LazilyPersistentVector.create(LazilyPersistentVector.java:30)
    at clojure.core$vec.invoke(core.clj:361)
    at bcbio.run.clhelp$get_vcf_bam_flex.invoke(clhelp.clj:39)
    at clojure.core$map$fn__4553.invoke(core.clj:2622)
    at clojure.lang.LazySeq.sval(LazySeq.java:40)
    at clojure.lang.LazySeq.seq(LazySeq.java:49)
    at clojure.lang.RT.seq(RT.java:507)
    at clojure.core$seq__4128.invoke(core.clj:137)
    at clojure.core$apply.invoke(core.clj:630)
    at clojure.core$mapcat.doInvoke(core.clj:2660)
    at clojure.lang.RestFn.invoke(RestFn.java:423)
chapmanb commented 7 years ago

Thanks for the report. This is a Java memory error. It looks like you specified -Xmx10g, but it might need more memory to handle your inputs. Hopefully running on a machine with more memory and increasing this value will help avoid the problem.

bioinfo-dirty-jobs commented 7 years ago

I tried reducing the file size and increasing the memory. However, I still have this problem:

`~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms20g -Xmx45g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble numpass= 1 --names mutect2,varscan2  output.vcf /illumina/software/database/database_2016/hg19_primary.fa 411.mutect2.pass.vcf.gz,411.varscan.pass.vcf.gz

Exception in thread "main" java.lang.OutOfMemoryError
    at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161)
    at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at clojure.string$join.invoke(string.clj:189)
    at bcbio.run.clhelp$error_msg.invoke(clhelp.clj:56)
    at bcbio.variation.ensemble.intersect$_main.doInvoke(intersect.clj:136)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:630)
    at bcbio.variation.recall.main$_main$fn__2484.invoke(main.clj:34)
    at bcbio.variation.recall.main$_main.doInvoke(main.clj:33)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at bcbio.variation.recall.main.main(Unknown Source`
chapmanb commented 7 years ago

Thanks for the follow up. That's a pretty similar memory error, so it likely has the same underlying cause. Does the machine have 45Gb of memory to allocate? If the machine doesn't have that much memory, it might be running out of memory to allocate to Java, leading to this error. Hope this helps.

bioinfo-dirty-jobs commented 7 years ago

I have 47G of RAM:

[sbsuser@compute-00-01 411]$ free -g
             total       used       free     shared    buffers     cached
Mem:            47         19         27          0          0         18
-/+ buffers/cache:          0         46
Swap:            1          0          1
[sbsuser@compute-00-01 411]$ ~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms27g -Xmx27g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble numpass= 1 --names mutect2,varscan2  output.vcf /illumina/software/database/database_2016/hg19_primary.fa 411.mutect2.pass.vcf.gz,411.varscan.pass.vcf.gz
chapmanb commented 7 years ago

Thanks for the comment. I'm not sure how to follow up: did specifying your available memory, 27Gb, work to resolve the issue? As you noted, if you're using the system memory for other processes it won't be available to Java, so you could run into memory problems. Hope this resolves it for you.
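
One way to pick a realistic -Xmx value is to estimate how much memory is free or reclaimable rather than looking at the total. The sketch below is only an illustration, assuming a Linux `/proc/meminfo` with `MemFree`, `Buffers` and `Cached` fields; the function names `parse_meminfo` and `reclaimable_gb` are hypothetical. It mirrors the idea behind the `-/+ buffers/cache` row of `free`.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field name to kB."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines that look like "Name:   12345 kB".
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def reclaimable_gb(fields: Dict[str, int]) -> float:
    """Estimate memory usable by a new process: free + buffers + page cache."""
    kb = fields.get("MemFree", 0) + fields.get("Buffers", 0) + fields.get("Cached", 0)
    return kb / (1024 * 1024)
```

On a Linux machine this could be called as `reclaimable_gb(parse_meminfo(open("/proc/meminfo").read()))`; keeping -Xmx comfortably below the result leaves headroom for the JVM's non-heap memory and for other processes.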

bioinfo-dirty-jobs commented 7 years ago

Maybe the command I am using is wrong. Could you please check? I tried to create a small sample dataset with only a few variants.

~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms1g -Xmx10g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble --help
Ensemble calling for samples: combine multiple VCF caller outputs into a single callset.

Usage: bcbio-variation-recall ensemble [options] out-file ref-file [<vcf-files or list-files>]

   out-file:   bgzipped VCF file to write merged output to
   ref-file:   FASTA format genome reference file
  <remaining>: VCF files to include for building a final ensemble callset.
               Specify on the command line or as text files containing paths to files.
               VCFs can be single or multi-sample.
               The input order of VCFs determines extraction preference in the final ensemble output.

Options:
  -c, --cores CORES      1  Number of cores to use
  -n, --numpass NUMPASS  2  Number of callers a variant should be present in to pass
      --names NAMES         Comma separated list of names corresponding to VCFs for annotating output
      --nofiltered          Remove filtered variants before performing ensemble calls

The options are not the same as the ones I have used up to now. I tried this command and got this error:

~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms1g -Xmx10g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble -n 1   output.vcf.gz /illumina/software/database/database_2016/hg19_primary.fa  411_mutect2.somatic.vcf,411.merge.varscan2.somatic.vcf 
The following errors occurred while parsing your command:
Input files not found:
411_mutect2.somatic.vcf,411.merge.varscan2.somatic.vcf
chapmanb commented 7 years ago

The files should be specified with spaces, not commas, so you want:

~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms1g -Xmx10g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble -n 1   output.vcf.gz /illumina/software/database/database_2016/hg19_primary.fa  411_mutect2.somatic.vcf 411.merge.varscan2.somatic.vcf 

It doesn't split on commas, so it treated that as a single file name, which it didn't find. Hope this helps.
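
If the file list comes from another script as a comma-joined string, it can be split and checked before building the command line. This is a minimal sketch, not part of bcbio.variation.recall; the helper names `split_vcf_args` and `missing_inputs` are hypothetical.

```python
import os
from typing import List


def split_vcf_args(args: List[str]) -> List[str]:
    """Split comma-joined arguments into individual file paths."""
    paths: List[str] = []
    for arg in args:
        # Drop empty pieces left by stray or trailing commas.
        paths.extend(piece for piece in arg.split(",") if piece)
    return paths


def missing_inputs(paths: List[str]) -> List[str]:
    """Return the paths that do not exist on disk."""
    return [path for path in paths if not os.path.exists(path)]
```

For example, `" ".join(split_vcf_args(["411_mutect2.somatic.vcf,411.merge.varscan2.somatic.vcf"]))` gives the space-separated form to append to the command, and `missing_inputs` reports anything that still cannot be found.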

bioinfo-dirty-jobs commented 7 years ago

Thanks so much!! It now works for mutect2 and varscan, but if I also use the vardict output I get this error:

~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms1g -Xmx10g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble -n 1   ensemble.vcf.gz /illumina/software/database/database_2016/hg19_primary.fa  411_mutect2.somatic.vcf 411.merge.varscan2.somatic.vcf 411_tumor_412_normal_merge.somatic.vcf

2017-05-22 17:56:47 compute-00-01.ilmn INFO [bcbio.run.itx] - [E::hts_idx_push] chromosome blocks not continuous
2017-05-22 17:56:47 compute-00-01.ilmn INFO [bcbio.run.itx] - tbx_index_build failed: Is the file bgzip-compressed? Was wrong -p [type] option used?
2017-05-22 17:56:47 compute-00-01.ilmn ERROR [bcbio.run.itx] - 
java.lang.Exception: Shell command failed: bcftools tabix -p vcf /illumina/runs/FASTQ/Analisi_15novembre2016/Prova3/JOIN_ALL_MAY2017/411/SOMATIC/txtmp8724917206979063881/411_tumor_412_normal_merge.somatic.vcf.gz
                     tbx_index_build failed: Is the file bgzip-compressed? Was wrong -p [type] option used?
                     [E::hts_idx_push] chromosome blocks not continuous
                            bcbio.run.itx/check-run        itx.clj:  168
                            bcbio.run.itx/check-run        itx.clj:  172
   bcbio.variation.ensemble.prep/tabix-index-vcf/fn       prep.clj:   33
      bcbio.variation.ensemble.prep/tabix-index-vcf       prep.clj:   28
      bcbio.variation.ensemble.prep/bgzip-index-vcf       prep.clj:   51
                         clojure.lang.RestFn.invoke    RestFn.java:  410
                                clojure.core/map/fn       core.clj: 2622
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                                clojure.lang.RT.seq        RT.java:  507
                                   clojure.core/seq       core.clj:  137
                                clojure.core/map/fn       core.clj: 2616
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                         clojure.lang.LazySeq.first   LazySeq.java:   71
                              clojure.lang.RT.first        RT.java:  653
                                 clojure.core/first       core.clj:   55
bcbio.variation.ensemble.vcfsample/consistent-order  vcfsample.clj:   60
   bcbio.variation.ensemble.intersect/ensemble-vcfs  intersect.clj:   82
           bcbio.variation.ensemble.intersect/-main  intersect.clj:  140
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                                 clojure.core/apply       core.clj:  630
               bcbio.variation.recall.main/-main/fn       main.clj:   34
                  bcbio.variation.recall.main/-main       main.clj:   33
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                   bcbio.variation.recall.main.main               :     

2017-05-22 17:56:47 compute-00-01.ilmn ERROR [bcbio.variation.recall.main] - 
java.lang.Exception: Shell command failed: bcftools tabix -p vcf /illumina/runs/FASTQ/Analisi_15novembre2016/Prova3/JOIN_ALL_MAY2017/411/SOMATIC/txtmp8724917206979063881/411_tumor_412_normal_merge.somatic.vcf.gz
                     tbx_index_build failed: Is the file bgzip-compressed? Was wrong -p [type] option used?
                     [E::hts_idx_push] chromosome blocks not continuous
                            bcbio.run.itx/check-run        itx.clj:  168
                            bcbio.run.itx/check-run        itx.clj:  172
   bcbio.variation.ensemble.prep/tabix-index-vcf/fn       prep.clj:   33
      bcbio.variation.ensemble.prep/tabix-index-vcf       prep.clj:   28
      bcbio.variation.ensemble.prep/bgzip-index-vcf       prep.clj:   51
                         clojure.lang.RestFn.invoke    RestFn.java:  410
                                clojure.core/map/fn       core.clj: 2622
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                                clojure.lang.RT.seq        RT.java:  507
                                   clojure.core/seq       core.clj:  137
                                clojure.core/map/fn       core.clj: 2616
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                         clojure.lang.LazySeq.first   LazySeq.java:   71
                              clojure.lang.RT.first        RT.java:  653
                                 clojure.core/first       core.clj:   55
bcbio.variation.ensemble.vcfsample/consistent-order  vcfsample.clj:   60
   bcbio.variation.ensemble.intersect/ensemble-vcfs  intersect.clj:   82
           bcbio.variation.ensemble.intersect/-main  intersect.clj:  140
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                                 clojure.core/apply       core.clj:  630
               bcbio.variation.recall.main/-main/fn       main.clj:   34
                  bcbio.variation.recall.main/-main       main.clj:   33
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                   bcbio.variation.recall.main.main               :     

[sbsuser@compute-00-01 SOMATIC]$ clear

[sbsuser@compute-00-01 SOMATIC]$ ~/jdk1.8.0_121/bin/java -XX:+UseSerialGC -Xms1g -Xmx10g -jar /illumina/software/PROG2/bcbio-variation-recall-0.1.7  ensemble -n 1   ensemble.vcf.gz /illumina/software/database/database_2016/hg19_primary.fa  411_mutect2.somatic.vcf 411.merge.varscan2.somatic.vcf 411_tumor_412_normal_merge.somatic.vcf
2017-05-22 17:58:43 compute-00-01.ilmn INFO [bcbio.run.itx] - [E::hts_idx_push] chromosome blocks not continuous
2017-05-22 17:58:43 compute-00-01.ilmn INFO [bcbio.run.itx] - tbx_index_build failed: Is the file bgzip-compressed? Was wrong -p [type] option used?
2017-05-22 17:58:43 compute-00-01.ilmn ERROR [bcbio.run.itx] - 
java.lang.Exception: Shell command failed: bcftools tabix -p vcf /illumina/runs/FASTQ/Analisi_15novembre2016/Prova3/JOIN_ALL_MAY2017/411/SOMATIC/txtmp4487097885323418261/411_tumor_412_normal_merge.somatic.vcf.gz
                     tbx_index_build failed: Is the file bgzip-compressed? Was wrong -p [type] option used?
                     [E::hts_idx_push] chromosome blocks not continuous
                            bcbio.run.itx/check-run        itx.clj:  168
                            bcbio.run.itx/check-run        itx.clj:  172
   bcbio.variation.ensemble.prep/tabix-index-vcf/fn       prep.clj:   33
      bcbio.variation.ensemble.prep/tabix-index-vcf       prep.clj:   28
      bcbio.variation.ensemble.prep/bgzip-index-vcf       prep.clj:   51
                         clojure.lang.RestFn.invoke    RestFn.java:  410
                                clojure.core/map/fn       core.clj: 2622
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                                clojure.lang.RT.seq        RT.java:  507
                                   clojure.core/seq       core.clj:  137
                                clojure.core/map/fn       core.clj: 2616
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                         clojure.lang.LazySeq.first   LazySeq.java:   71
                              clojure.lang.RT.first        RT.java:  653
                                 clojure.core/first       core.clj:   55
bcbio.variation.ensemble.vcfsample/consistent-order  vcfsample.clj:   60
   bcbio.variation.ensemble.intersect/ensemble-vcfs  intersect.clj:   82
           bcbio.variation.ensemble.intersect/-main  intersect.clj:  140
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                                 clojure.core/apply       core.clj:  630
               bcbio.variation.recall.main/-main/fn       main.clj:   34
                  bcbio.variation.recall.main/-main       main.clj:   33
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                   bcbio.variation.recall.main.main               :     

2017-05-22 17:58:43 compute-00-01.ilmn ERROR [bcbio.variation.recall.main] - 
java.lang.Exception: Shell command failed: bcftools tabix -p vcf /illumina/runs/FASTQ/Analisi_15novembre2016/Prova3/JOIN_ALL_MAY2017/411/SOMATIC/txtmp4487097885323418261/411_tumor_412_normal_merge.somatic.vcf.gz
                     tbx_index_build failed: Is the file bgzip-compressed? Was wrong -p [type] option used?
                     [E::hts_idx_push] chromosome blocks not continuous
                            bcbio.run.itx/check-run        itx.clj:  168
                            bcbio.run.itx/check-run        itx.clj:  172
   bcbio.variation.ensemble.prep/tabix-index-vcf/fn       prep.clj:   33
      bcbio.variation.ensemble.prep/tabix-index-vcf       prep.clj:   28
      bcbio.variation.ensemble.prep/bgzip-index-vcf       prep.clj:   51
                         clojure.lang.RestFn.invoke    RestFn.java:  410
                                clojure.core/map/fn       core.clj: 2622
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                                clojure.lang.RT.seq        RT.java:  507
                                   clojure.core/seq       core.clj:  137
                                clojure.core/map/fn       core.clj: 2616
                          clojure.lang.LazySeq.sval   LazySeq.java:   40
                           clojure.lang.LazySeq.seq   LazySeq.java:   49
                         clojure.lang.LazySeq.first   LazySeq.java:   71
                              clojure.lang.RT.first        RT.java:  653
                                 clojure.core/first       core.clj:   55
bcbio.variation.ensemble.vcfsample/consistent-order  vcfsample.clj:   60
   bcbio.variation.ensemble.intersect/ensemble-vcfs  intersect.clj:   82
           bcbio.variation.ensemble.intersect/-main  intersect.clj:  140
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                                 clojure.core/apply       core.clj:  630
               bcbio.variation.recall.main/-main/fn       main.clj:   34
                  bcbio.variation.recall.main/-main       main.clj:   33
                        clojure.lang.RestFn.applyTo    RestFn.java:  137
                   bcbio.variation.recall.main.main               :     
chapmanb commented 7 years ago

This error typically indicates the input VCF is not ordered correctly and there are some out of order blocks. This code doesn't handle any of this sorting, since that's done in the full bcbio pipeline if you need this kind of prep. Practically, I'd suggest using vt sort -m full to correctly sort the input VCF, and hopefully that will fix the problem. Hope this helps.
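
To confirm this before re-running, one option is to scan the CHROM column and report any chromosome whose records are split across more than one block. The sketch below is only an illustration, assuming a tab-delimited VCF that is either plain text or gzip/bgzip-compressed; the function names `open_vcf` and `noncontiguous_chroms` are hypothetical and not part of bcbio.variation.recall or bcftools. It does not check position order within a chromosome.

```python
import gzip
from typing import Iterable, List


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def noncontiguous_chroms(lines: Iterable[str]) -> List[str]:
    """Return chromosomes whose records appear in more than one block."""
    seen = set()
    repeated: List[str] = []
    current = None
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if chrom != current:
            # A chromosome that starts a new block after being seen earlier
            # is the situation that breaks tabix indexing.
            if chrom in seen and chrom not in repeated:
                repeated.append(chrom)
            seen.add(chrom)
            current = chrom
    return repeated
```

For example, a non-empty result from `noncontiguous_chroms(open_vcf("411_tumor_412_normal_merge.somatic.vcf"))` would suggest the file needs sorting before it is passed to ensemble.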

kkleinoros commented 7 years ago

I am getting the following error when trying to run ensemble, any insight?

2017-Jun-27 15:54:34 -0400 d1p-hydrars04.ldi.lan ERROR [bcbio.variation.recall.main] -
java.lang.NullPointerException:
    clojure.lang.Reflector.invokeInstanceMethod Reflector.java: 26
    bcbio.run.clhelp/is-vcf? clhelp.clj: 11
    bcbio.run.clhelp/get-ftype clhelp.clj: 23
    bcbio.run.clhelp/get-vcf-bam-flex clhelp.clj: 36
    clojure.core/map/fn core.clj: 2622
    clojure.lang.LazySeq.sval LazySeq.java: 40
    clojure.lang.LazySeq.seq LazySeq.java: 49
    clojure.lang.RT.seq RT.java: 507
    clojure.core/seq core.clj: 137
    clojure.core/apply core.clj: 630
    clojure.core/mapcat core.clj: 2660
    clojure.lang.RestFn.invoke RestFn.java: 423
    bcbio.run.clhelp/vcf-bam-args clhelp.clj: 52
    bcbio.variation.ensemble.intersect/-main intersect.clj: 133
    clojure.lang.RestFn.applyTo RestFn.java: 137
    clojure.core/apply core.clj: 630
    bcbio.variation.recall.main/-main/fn main.clj: 34
    bcbio.variation.recall.main/-main main.clj: 33
    clojure.lang.RestFn.applyTo RestFn.java: 137
    bcbio.variation.recall.main.main :

chapmanb commented 7 years ago

Kathleen; Thanks for the report and sorry about the problem. Looking at the code I believe this could happen if one of the input files to ensemble calling is completely empty. Is that possible? If so, excluding that one (or fixing the file itself) should hopefully get things working cleanly.
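
A quick pre-flight check for this case could look like the sketch below. It assumes "completely empty" means a zero-byte file on disk; the helper name `empty_inputs` is hypothetical and not part of bcbio.variation.recall.

```python
import os
from typing import List


def empty_inputs(paths: List[str]) -> List[str]:
    """Return the input paths that exist but contain zero bytes."""
    return [
        path
        for path in paths
        if os.path.exists(path) and os.path.getsize(path) == 0
    ]
```

Any path it returns can be dropped from the ensemble command or regenerated before re-running.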

kkleinoros commented 7 years ago

Thank you for your reply, I will check my files. Kathleen
