Where to run in GATK workflow

sowalsky commented 3 years ago

The documentation does not indicate when FADE should be run in alignment workflows such as GATK. Should it run immediately after alignment and qname sort, or after recalibration/mark duplicates? Also, how does FADE behave with GATK 3 (with local indel realignment) compared to GATK 4 (no realignment)?

charlesgregory commented 3 years ago

In our own workflows, we usually run fade as the last step of BAM processing before variant calling. For us, this is after duplicate removal, indel realignment, and base quality score recalibration. As for the differences between GATK3 and GATK4, I have not tested fade in use cases where indel realignment is not performed. However, in figure 3 of our publication, we show that fade seems to have a negligible effect on the ability to call INDEL variants. I suspect this is due to the algorithm's requirement to find a nearby match to the opposite strand. I think it is safe to assume that, whether or not INDEL realignment is done, fade should still be able to distinguish artifacts from real INDELs.
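
As a rough sketch of the ordering described above: the step names below are illustrative labels only (assumptions for this example, not real commands or flags for GATK or fade), and the helper `fade_is_last_before_calling` is hypothetical. The GATK4-style list simply drops indel realignment, which, as noted, is a configuration that has not been tested with fade.

```python
# Illustrative labels only; these are NOT real command names or flags.
PIPELINE = [
    "align",
    "duplicate_removal",
    "indel_realignment",  # GATK3-style workflows only
    "base_quality_score_recalibration",
    "fade",
    "variant_calling",
]


def fade_is_last_before_calling(steps):
    """Return True if 'fade' comes immediately before 'variant_calling'."""
    if "fade" not in steps or "variant_calling" not in steps:
        return False
    return steps.index("fade") + 1 == steps.index("variant_calling")


# Hypothetical GATK4-style ordering: same steps without indel realignment.
gatk4_style = [s for s in PIPELINE if s != "indel_realignment"]

print(fade_is_last_before_calling(PIPELINE))
print(fade_is_last_before_calling(gatk4_style))
```

In both orderings fade stays the final BAM-processing step, directly ahead of variant calling.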

jblachly commented 3 years ago

@sowalsky Closing the issue, but by all means please reach out if your issue is not resolved or something else comes up. Kind regards.

blachlylab/fade #15