broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

java.lang.IllegalArgumentException: the number of genotypes is too large for ploidy 8 and 55 alleles: approx. 3381098545 #8842

Open ChenDepp opened 4 months ago

ChenDepp commented 4 months ago

bug reports

Hi all,

When I run GATK (version 4.5.0.0) CombineGVCFs to combine the gVCFs of 240 samples with ploidy 8, it fails with the error quoted in the issue title.

How can I solve it? Should I replace CombineGVCFs with GenomicsDBImport? Even if I do get a merged gVCF, I suspect the same error will be reported when I run GenotypeGVCFs. I look forward to your suggestions. Have a good day!

gokalpcelik commented 4 months ago

GenomicsDBImport is definitely the way to go for this kind of operation. That said, STRs are quite prone to errors, especially when higher ploidies are involved. You may wish to reduce them, or even drop them completely if they are not of interest to you.
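For context on where the number in the error message comes from: the count of unordered genotypes for ploidy P and A alleles is the multiset coefficient C(P + A - 1, P). The short Python sketch below (the function name `genotype_count` is just illustrative, and `math.comb` needs Python 3.8+) reproduces the figure from the issue title and shows how quickly it shrinks when a site carries fewer alleles. The comparison against the signed 32-bit integer range is an assumption about why GATK rejects the site, not something confirmed in this thread.

```python
import math

def genotype_count(ploidy: int, alleles: int) -> int:
    """Number of unordered genotypes: ways to choose `ploidy` alleles
    from `alleles` distinct alleles with repetition allowed."""
    return math.comb(ploidy + alleles - 1, ploidy)

# Ploidy 8 with 55 alleles, as in the error message.
n = genotype_count(8, 55)
print(n)  # 3381098545

# Assumption: the failure is related to the count not fitting in a
# signed 32-bit integer; this only checks the arithmetic.
print(n > 2**31 - 1)  # True

# The same ploidy with 7 alleles (reference + 6 alternates) is tiny.
print(genotype_count(8, 7))  # 3003
```

So a site with many alternate alleles (typical for STRs in a large cohort) is what blows up the count, which is why limiting or dropping such sites avoids the error.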

ChenDepp commented 4 months ago

Hi @gokalpcelik, I replaced CombineGVCFs with GenomicsDBImport, but now there is a new problem: GenotypeGVCFs on the GenomicsDB workspace is very slow and only produces the VCF for a 900K interval in 9 hours. How can I speed it up? Waiting for your reply. Have a good day!

gokalpcelik commented 4 months ago

Hi again. You should be able to split your variants into multiple intervals and import each interval in parallel in its own GenomicsDBImport instance. Those workspaces can then be genotyped in parallel, and the results finally combined into a single callset. This way you get your variants faster. This method is called scatter-gather, and it is what we do and suggest.
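A minimal Python sketch of that scatter-gather layout, in case it helps. Everything specific here is an assumption for illustration: the helper names, the workspace and output file names, the fixed-size chunking, and the choice of GatherVcfs for the final merge. It only builds the command lines; `scatter_gather` would actually launch them and is not called here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def split_contig(contig, length, chunk):
    """Split a contig into 1-based, inclusive intervals of at most `chunk` bases."""
    return [f"{contig}:{start}-{min(start + chunk - 1, length)}"
            for start in range(1, length + 1, chunk)]

def shard_commands(interval, index, sample_map, reference):
    """Build the import and genotyping commands for one interval (one shard)."""
    workspace = f"gdb_shard_{index:04d}"   # hypothetical workspace name
    out_vcf = f"shard_{index:04d}.vcf.gz"  # hypothetical output name
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--sample-name-map", sample_map,
                  "--genomicsdb-workspace-path", workspace,
                  "-L", interval]
    genotype_cmd = ["gatk", "GenotypeGVCFs",
                    "-R", reference,
                    "-V", f"gendb://{workspace}",
                    "-L", interval,
                    "-O", out_vcf]
    return import_cmd, genotype_cmd, out_vcf

def gather_command(shard_vcfs, merged):
    """Build the final merge command; shard VCFs must be in genomic order."""
    cmd = ["gatk", "GatherVcfs"]
    for vcf in shard_vcfs:
        cmd += ["-I", vcf]
    return cmd + ["-O", merged]

def run_shard(shard):
    """Import, then genotype, a single shard; returns its VCF path."""
    import_cmd, genotype_cmd, out_vcf = shard
    subprocess.run(import_cmd, check=True)
    subprocess.run(genotype_cmd, check=True)
    return out_vcf

def scatter_gather(contig, length, chunk, sample_map, reference, merged, workers=4):
    """Scatter over intervals, run shards in parallel, then gather."""
    intervals = split_contig(contig, length, chunk)
    shards = [shard_commands(iv, i, sample_map, reference)
              for i, iv in enumerate(intervals)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vcfs = list(pool.map(run_shard, shards))  # map preserves shard order
    subprocess.run(gather_command(vcfs, merged), check=True)
```

On a cluster you would normally submit each shard as its own job (or use a workflow engine) rather than threads on one machine; the structure stays the same.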

I hope this helps. Regards.