broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.68k stars 589 forks source link

Why we use MIN_DP for DP and then AD of the ref allele at Hom-Ref block derived genotypes. #7185

Open vruano opened 3 years ago

vruano commented 3 years ago

Question

Why we use MIN_DP over DP for synthetic Ref allele depth for genotypes derived from hom-ref blocks? Would it make more sense to keep and use the average or median?


## GenotypeGVCFsEngine.java:176 (about)
...
if (result.isPolymorphicInSamples()) {
            // For polymorphic sites we need to make sure e.g. the SB tag is sent to the annotation engine and then removed later.
            final VariantContext reannotated = annotationEngine.annotateContext(result, features, ref, null, a -> true);
              return new VariantContextBuilder(reannotated).genotypes(
==!==>                   cleanupGenotypeAnnotations(reannotated, false)).make();
        } else if (includeNonVariants) {
...

## Same file ln 436, method cleanupGenotypeAnnotations:
...
// move the MIN_DP to DP
            if ( oldGT.hasExtendedAttribute(GATKVCFConstants.MIN_DP_FORMAT_KEY) ) {
                depth = parseInt(oldGT.getAnyAttribute(GATKVCFConstants.MIN_DP_FORMAT_KEY));
                builder.DP(depth);
                attrs.remove(GATKVCFConstants.MIN_DP_FORMAT_KEY);
            }
...

Tool(s) or class(es) involved

GenotypeGVCFs

ldgauthier commented 3 years ago

A user brought this up recently, and I think it's a good point. I wouldn't want to change the default behavior, but I would be open to adding a new argument to GenotypeGVCFs to use median instead of min. What do you think @droazen ? Or maybe this is a question for @vdauwera and @eitanbanks as GATK product owners.

droazen commented 3 years ago

@ldgauthier defer to you on this, but agree that it seems confusing/misleading, especially in the case of large ref blocks with highly variable depth

vdauwera commented 3 years ago

Agree with adding an option to use median -- we can change the default behavior in GATK 5 ;)

ldgauthier commented 3 years ago

@vruano can you take on adding the arg and alternate behavior? Should be pretty quick.

vruano commented 3 years ago

Ok.