bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

bcbio v1.2.0 issues with vcfanno input files #3171

Closed Fazulur closed 4 years ago

Fazulur commented 4 years ago

Dear Bcbio team,

We installed the new bcbio version v1.2.0 and tried to run the gatk-variant pipeline. It gives the error below. We tried adding tools_on: gemini to the YAML file, but it still fails.

[2020-04-07T11:23Z] nsnode44: Not running gemini, not configured in tools_on: brain
[2020-04-07T11:23Z] nsnode44: Unexpected error
Traceback (most recent call last):
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 54, in _setup_logging
    yield config
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 463, in prep_gemini_db
    return ipython.zip_args(apply(population.prep_gemini_db, args))
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/distributed/ipythontasks.py", line 82, in apply
    return object(args, kwargs)
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/variation/population.py", line 42, in prep_gemini_db
    ann_vcf = run_vcfanno(gemini_vcf, data, decomposed)
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/variation/population.py", line 121, in run_vcfanno
    decomposed=decomposed)
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/variation/vcfanno.py", line 35, in run
    conffn = _combine_files(conf_fns, out_file, data, basepath is None)
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/variation/vcfanno.py", line 63, in _combine_files
    line = _fill_file_path(line, data)
  File "BCBIO/v1.2.0_updated/anaconda/lib/python3.6/site-packages/bcbio/variation/vcfanno.py", line 88, in _fill_file_path
    assert full_file, "Did not find vcfanno input file %s" % (orig_file)
AssertionError: Did not find vcfanno input file variation/dbsnp-151.vcf.gz

And our sample configuration yaml file is below

details:

Could you please let us know how to proceed further.

Thanks In Advance Fazulur Rehaman

naumenko-sa commented 4 years ago

Hi Fazulur @Fazulur !

Sorry about the issue. I think we have just recently fixed it here: https://github.com/bcbio/bcbio-nextgen/issues/3160

Please upgrade with bcbio_nextgen.py upgrade -u skip --genomes hg38 and try again.

Sergey

Fazulur commented 4 years ago

Hi Sergey,

Thanks a lot for your quick response. I upgraded bcbio using the above command, but it gives the same error again.

Could you please suggest how I can proceed?

Thanks In Advance Fazulur Rehaman

naumenko-sa commented 4 years ago

Hi @Fazulur !

Check your data/genomes/Hsapiens/hg38/seq/hg38-resources.yaml. If the update was successful, it should have a line:
dbsnp: ../variation/dbsnp-153.vcf.gz
Also check whether the dbSNP file is installed in:
genomes/Hsapiens/hg38/variation/dbsnp-153.vcf.gz
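For anyone doing this check by hand, here is a minimal Python sketch of it. The helper names are hypothetical (not part of bcbio), and it assumes the dbsnp entry is a path relative to the directory that holds hg38-resources.yaml, which matches the two paths quoted above but is not confirmed in this thread.

```python
import os
import re


def find_dbsnp_entry(resources_text):
    """Return the path on the first 'dbsnp:' line of the resources file text, or None."""
    for line in resources_text.splitlines():
        match = re.match(r"\s*dbsnp:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


def resolve_resource(resources_yaml_path, relative_path):
    """Resolve a resource path relative to the resources file's directory (assumption)."""
    base_dir = os.path.dirname(os.path.abspath(resources_yaml_path))
    return os.path.normpath(os.path.join(base_dir, relative_path))


def check_dbsnp(resources_yaml_path):
    """Return (resolved_path, exists) for the dbsnp entry, or (None, False) if absent."""
    with open(resources_yaml_path) as handle:
        entry = find_dbsnp_entry(handle.read())
    if entry is None:
        return None, False
    resolved = resolve_resource(resources_yaml_path, entry)
    return resolved, os.path.exists(resolved)
```

Calling check_dbsnp on the hg38-resources.yaml path returns the resolved dbSNP location and whether that file is actually present on disk.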

Update bcbio to 1.2.3 just in case: bcbio_nextgen.py upgrade -u stable --tools

Sergey

erinijapranckeviciene commented 4 years ago

Hi @Fazulur @naumenko-sa

I had a similar problem. I found that in my v1.2.0 --genomes hg38 install, genomes/config/vcfanno/gemini.conf references dbsnp-151.vcf.gz, but the dbSNP file in the genomes/variation folder is version 153: genomes/variation/dbsnp-153.vcf.gz.

I modified gemini.conf, changing dbsnp-151 to dbsnp-153, and the problem seems to have gone away.
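To spot this kind of mismatch before a run, a rough Python sketch like the following can list the files a vcfanno conf refers to and report the ones that are missing. It assumes the conf uses lines of the form file="variation/dbsnp-151.vcf.gz" and that those paths are relative to the genome build directory; both are assumptions based on the error message above, and the function names are made up for illustration.

```python
import os
import re

# Matches lines such as: file="variation/dbsnp-151.vcf.gz"
FILE_ENTRY = re.compile(r'^\s*file\s*=\s*"([^"]+)"')


def referenced_files(conf_text):
    """Return every path named in a file="..." line of the conf text, in order."""
    paths = []
    for line in conf_text.splitlines():
        match = FILE_ENTRY.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def missing_files(conf_text, genome_dir):
    """Return the referenced paths that do not exist under genome_dir."""
    return [
        path
        for path in referenced_files(conf_text)
        if not os.path.exists(os.path.join(genome_dir, path))
    ]
```

Reading gemini.conf into a string and passing it to missing_files together with the genome build directory would have reported variation/dbsnp-151.vcf.gz in the situation described here.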

Fazulur commented 4 years ago

Hi @erinijapranckeviciene

Thanks a lot. It worked.

Thanks & Regards Fazulur Rehaman

Fazulur commented 4 years ago

Hi @Sergey,

I modified gemini.conf as suggested by @erinijapranckeviciene. I tested the whole exome and RNA bcbio pipelines and they are working fine.

When I tested one 30X whole genome sample with the gatk-variant pipeline, it gave the error below at the HaplotypeCaller step:

java.lang.ArrayIndexOutOfBoundsException: 3
  at org.broadinstitute.hellbender.utils.GenotypeUtils.computeDiploidGenotypeCounts(GenotypeUtils.java:70)
  at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.calculateEH(ExcessHet.java:86)
  at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.annotate(ExcessHet.java:74)
  at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:293)
  at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.makeAnnotatedCall(HaplotypeCallerGenotypingEngine.java:365)
  at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerGenotypingEngine.assignGenotypeLikelihoods(HaplotypeCallerGenotypingEngine.java:189)
  at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(HaplotypeCallerEngine.java:608)
  at org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller.apply(HaplotypeCaller.java:212)
  at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(AssemblyRegionWalker.java:200)
  at org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse(AssemblyRegionWalker.java:173)
  at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
  at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
  at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
  at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
  at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
  at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
  at org.broadinstitute.hellbender.Main.main(Main.java:292)
Using GATK jar BCBIO/v1.2.0_updated/anaconda/share/gatk4-4.1.6.0-0/gatk-package-4.1.6.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms4g -Xmx13g -XX:+UseSerialGC -Djava.io.tmpdir=/gpfs/ngsdata/scratch/fazulur/testcases/bcbio/test-bcbio-1.2.0-wgs/scratch/bcbiotx/tmpmalwhvqr -jar BCBIO/v1.2.0_updated/anaconda/share/gatk4-4.1.6.0-0/gatk-package-4.1.6.0-local.jar HaplotypeCaller -R BCBIO/v1.2.0_updated/genomes/Hsapiens/hg38/seq/hg38.fa --annotation MappingQualityRankSumTest --annotation MappingQualityZero --annotation QualByDepth --annotation ReadPosRankSumTest --annotation RMSMappingQuality --annotation BaseQualityRankSumTest --annotation FisherStrand --annotation MappingQuality --annotation DepthPerAlleleBySample --annotation Coverage -Iest-bcbio-1.2.0-wgs/scratch/align/0200233341_S5_L008/0200233341_S5_L008-sort-recal.bam -Lest-bcbio-1.2.0-wgs/scratch/gatk-haplotype/chrY/0200233341_S5_L008-joint-chrY_0_16127313-regions.bed --interval-set-rule INTERSECTION --annotation ClippingRankSumTest --annotation DepthPerSampleHC --native-pair-hmm-threads 1 --emit-ref-confidence GVCF -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 60 -GQB 80 -ploidy 1 --outputest-bcbio-1.2.0-wgs/scratch/bcbiotx/tmpmalwhvqr/0200233341_S5_L008-joint-chrY_0_16127313.vcf.gz ' returned non-zero exit status 3.

Could you please help me resolve this error?

Thanks In Advance Fazulur Rehaman

naumenko-sa commented 4 years ago

Hi @Fazulur !

It looks like an exception from GATK HaplotypeCaller, triggered by the inputs 0200233341_S5_L008/0200233341_S5_L008-sort-recal.bam and 0200233341_S5_L008-joint-chrY_0_16127313-regions.bed.

Can you run this last command outside of bcbio, to check whether it is a bcbio error or a GATK error?

Sergey

Fazulur commented 4 years ago

Hi @Sergey,

You are right. The error comes from the new GATK version, 4.1.6.0.

Do we need to downgrade GATK to proceed? Please suggest how we can proceed further.

Thanks In Advance Fazulur Rehaman

naumenko-sa commented 4 years ago

Hi @Fazulur !

If you can reproduce the issue outside of bcbio, could you please raise it with the GATK team and provide them with the two files needed to reproduce it? https://github.com/broadinstitute/gatk/issues

It is possible to downgrade GATK with conda install -c bioconda --force-reinstall gatk4=[version], but we have already aligned the bcbio wrapper to support the latest GATK, and it is better to solve the issue for everyone, since you were the first to identify it.

Sergey

naumenko-sa commented 4 years ago

Thanks @Fazulur ! I see you have raised the issue: https://github.com/broadinstitute/gatk/issues/6552. Linking it here for tracking. SN

naumenko-sa commented 4 years ago

upd: modified gemini.conf here: https://github.com/bcbio/bcbio-nextgen/blob/master/config/vcfanno/hg38-gemini.conf

naumenko-sa commented 4 years ago

upd: fixed in gatk repo, waiting for new release

Fazulur commented 4 years ago

Dear @Sergey,

Thanks a lot for update.

Thanks & Regards Fazulur Rehaman

Fazulur commented 4 years ago

Dear @Sergey,

GATK has released 4.1.7.0 with a fix for this issue: https://github.com/broadinstitute/gatk/releases

Could you please let us know when bcbio will be ready for GATK 4.1.7.0?

Thanks In Advance Fazulur Rehaman

naumenko-sa commented 4 years ago

Thanks @Fazulur !

We are not pinning gatk4.

Just update with bcbio_nextgen.py upgrade -u skip --tools and run again. It seems to work for me.
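After updating, one way to confirm that the installed GATK includes the fix is to compare the version in the jar name that bcbio logs ("Using GATK jar ...") against 4.1.7.0. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the jar is named gatk-package-<version>-local.jar as in the log earlier in this thread.

```python
import re


def gatk_version_from_jar(jar_path):
    """Extract the version from a jar path as a tuple of ints, or None if not found."""
    match = re.search(r"gatk-package-(\d+(?:\.\d+)*)-local\.jar", jar_path)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_excesshet_fix(jar_path, fixed_in=(4, 1, 7, 0)):
    """True if the jar's version is at least the release that shipped the fix."""
    version = gatk_version_from_jar(jar_path)
    return version is not None and version >= fixed_in
```

Comparing tuples of integers avoids the string-comparison trap where "4.1.10.0" would sort before "4.1.7.0".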

Let us know if you see any other bcbio issues!

Sergey

Fazulur commented 4 years ago

Dear @Sergey,

Thanks a lot. It is working now without any issues.

Thanks & Regards Fazulur Rehaman