broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.68k stars 589 forks source link

I believe `sites` and `variants` should be equivalent. I see that there is a warning emitted by GATK4 at sites that are not het, but it shouldn't be an error. If you're getting an error can you please post the full message in this thread? #7747

Open Erich-Hartm opened 2 years ago

Erich-Hartm commented 2 years ago

I believe sites and variants should be equivalent. I see that there is a warning emitted by GATK4 at sites that are not het, but it shouldn't be an error. If you're getting an error can you please post the full message in this thread?

ASEReadCounter will only process het sites in both GATK3 and GATK4, but it's possible that GATK3 silently skipped non-het sites rather than emitting a warning.

Originally posted by @meganshand in https://github.com/broadinstitute/gatk/issues/7712#issuecomment-1084552468

Erich-Hartm commented 2 years ago

Thank you, sir. In fact , firstly I called SNP from both parents respectively, and then combined both parents vcf files to a integrate vcf file, I want to get maternal different homozygous SNP compare to reference genome and paternal different homozygous SNP compare to reference genome, and combined both parents SNPs to a integrate biSNP.vcf file ,and then filtered heterozygote sites. Finally I submited this integrate biSNP.vcf as a input file of argument "sites" of GATK-3.8 ASEReaderCounter, l got normal output file of ASE reads count. But if I used argument "variants" of GATK-4.0 to replace "sites" argument of GATK-3.8, other files were not changed, it reminded me that "the SNP site of inupt file is not het" , so I got null result file in the end,because l filtered integrate vcf file, all SNP sites of different parents are homozygote compare to reference genome. So l want to know what kind of vcf file can be used as input file of "variants" argument of GATK-4.0 ASEReadCounter ? both parents g.vcf files ? Thank you!

meganshand commented 2 years ago

Can you please provide a small snipit of the biSNP.vcf file? A few sites that work in GATK3 should be enough.

Erich-Hartm commented 2 years ago

This is part of my biSNP.vcf file, include the annotation and reference genome name. thanks! example.vcf.gz

meganshand commented 2 years ago

@Erich-Hartm Thanks for that example, that's helpful. I think what's happening is that in GATK3 the documentation says that the tool should only work on heterozygous sites, but this isn't actually enforced in the code. It is enforced in GATK4 so the tool skips all the sites in your VCF since they aren't heterozygous.

I could add a mode to ASEReadCounter in GATK4 that would allow for the same behavior as GATK3, but I also want to ask, do you have a VCF of the child in your trio? Those sites that are homozygous in the parent samples should be heterozygous in the child, correct? Is there a reason you can't use the the heterozygous sites from the child as the input to ASEReadCounter?

Erich-Hartm commented 2 years ago

@meganshand Thank you very much, sir, you mean that if l want to use ASEReadCounter of GATK4, I should use F1 hybrid's whole genome sequence to produce VCF file. Usually, we always resequence genome sequence of parents and F2 group when we make F2 group QTL mapping, rather than resequencing F1 hybrid genome. So I don't have F1 genome sequence, when I make F1 hybrid's ASE count, I have to make a biVCF file by parent's genome,not by F1 hybrid's genome. That is reason why I get the result that the site is not het by GATK4 ASEReadCounter, thank you! But I think this is a little flaw of GATK4, maybe I should resequence F1 hybrid's genome. Your reply is very helpful to me, thanks again!