broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Varying Mutect2 results based on apt and samtools over time #7269

Open migbro opened 3 years ago

migbro commented 3 years ago

Bug Report

Affected tool(s) or class(es)

Mutect2

Affected version(s)

GATK 4.1.1.0

Description

We have run Mutect2 on the same sample using the same input CRAMs, references, and intervals. The only discernible difference is the Docker image we built and used. The first image was built without samtools and the second with samtools. The third is identical to the second except that it was rebuilt a year or more later. Here are the PASS variant counts from each run:

Run type                               PASS variant count
Docker, no samtools                    8265
Docker, with samtools                  8283
Docker, with samtools (rebuilt)        8273
Docker, no samtools (recently built)   8271

Out of curiosity, we built the image again without samtools (the fourth row), so in theory the only possible change is whatever apt update pulls in each time an image is built. The differences are small, but is that expected? Could Mutect2 be influenced by whether samtools is installed, or by changes in apt packages?
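A total PASS count can hide offsetting differences: two runs could gain and lose different calls and still land on similar totals. The following is a minimal Python sketch, not part of the original report, that lists which PASS calls differ between two runs' VCFs. It assumes the counts come from the VCF FILTER column and keys each call on CHROM, POS, REF and ALT exactly as written, so multi-allelic records are compared as a whole and duplicate keys collapse into one. The function names are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    # gzip.open reads multi-member (BGZF) files as one continuous stream.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def pass_sites(lines):
    """Return the set of (CHROM, POS, REF, ALT) keys whose FILTER is PASS."""
    sites = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta-information lines and the #CHROM header
        f = line.rstrip("\n").split("\t")
        if len(f) > 6 and f[6] == "PASS":  # FILTER is the 7th VCF column
            sites.add((f[0], int(f[1]), f[3], f[4]))
    return sites


def compare_runs(lines_a, lines_b):
    """Summarise how the PASS call sets of two runs overlap."""
    a, b = pass_sites(lines_a), pass_sites(lines_b)
    return {
        "pass_a": len(a),  # distinct PASS keys in run A
        "pass_b": len(b),  # distinct PASS keys in run B
        "shared": len(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }
```

For example, opening two outputs with `with open_vcf("run_a.vcf.gz") as a, open_vcf("run_b.vcf.gz") as b:` and calling `compare_runs(a, b)` returns the per-run counts plus the discordant sites; both file names are placeholders.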

Steps to reproduce

I can't replicate scenario 1 in the table because that image was built in 2019, when the apt packages were different. The other images can be pulled as follows:

Scenario 2: docker pull kfdrc/gatk:4.1.1.0
Scenario 3: docker pull pgc-images.sbgenomics.com/d3b-bixu/gatk:4.1.1.0
Scenario 4: docker pull migbro/gatk:4.1.1.0L

No samtools Dockerfile:

FROM ubuntu:18.04
LABEL maintainer="Miguel Brown (brownm28@email.chop.edu)"

ENV GATK4_VERSION 4.1.1.0

RUN apt update && apt install -y openjdk-8-jdk python wget unzip libgomp1; \
wget -q https://github.com/broadinstitute/gatk/releases/download/${GATK4_VERSION}/gatk-${GATK4_VERSION}.zip; \
unzip gatk-${GATK4_VERSION}.zip; \
mv gatk-${GATK4_VERSION}/gatk* . && rm -rf gatk-${GATK4_VERSION}*; \
apt remove -y  wget

Yes samtools Dockerfile:

FROM ubuntu:18.04
LABEL maintainer="Miguel Brown (brownm28@email.chop.edu)"

ENV GATK4_VERSION 4.1.1.0

RUN apt update && apt install -y openjdk-8-jdk python wget unzip libgomp1 tabix samtools; \
wget -q https://github.com/broadinstitute/gatk/releases/download/${GATK4_VERSION}/gatk-${GATK4_VERSION}.zip; \
unzip gatk-${GATK4_VERSION}.zip; \
mv gatk-${GATK4_VERSION}/gatk* . && rm -rf gatk-${GATK4_VERSION}*; \
apt remove -y wget
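Because the two Dockerfiles differ only by tabix and samtools, and a rebuilt image differs only in what apt resolved at build time, one way to see what actually changed is to dump the installed package versions from each image and diff them. Assuming neither image sets an ENTRYPOINT (the Dockerfiles above do not), a command such as docker run --rm kfdrc/gatk:4.1.1.0 dpkg-query -W > scenario2.txt writes one tab-separated package name and version per line (the output file name is a placeholder). The hypothetical Python helper below is a sketch, not GATK tooling, that compares two such listings.

```python
def parse_dpkg(text):
    """Parse 'package<TAB>version' lines (dpkg-query -W output) into a dict."""
    pkgs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, version = line.partition("\t")
        pkgs[name] = version
    return pkgs


def diff_packages(old_text, new_text):
    """Return (added, removed, changed) package lists between two images."""
    old, new = parse_dpkg(old_text), parse_dpkg(new_text)
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Packages present in both images whose version string differs.
    changed = sorted(
        (name, old[name], new[name])
        for name in set(old) & set(new)
        if old[name] != new[name]
    )
    return added, removed, changed
```

A changed openjdk or libgomp1 version between two builds would at least narrow down where to look; an identical listing would point away from the apt packages.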

Expected behavior

All runs should produce the same number of PASS variants.

Actual behavior

The PASS variant counts vary slightly depending on whether samtools is installed and on when the Docker image was built.

Thank you for your time!

droazen commented 3 years ago

@migbro Are you just running Mutect2, or are you running a pipeline consisting of multiple tools? If the latter, which specific tools are you running?

Also, you are running a very old release of GATK. Some of these old releases are known to have some rare sources of non-determinism in Mutect2 which could cause results to vary slightly across runs. We recommend upgrading to the latest release (4.2.0.0) if you can, and using the official GATK docker images in dockerhub (https://hub.docker.com/r/broadinstitute/gatk/).

migbro commented 3 years ago

Hi @droazen, thanks for your quick response! Depending on what you mean by "multiple tools," I'd say just Mutect2: we run it as part of a CWL workflow, so Mutect2 runs on its own instance (virtual machine) with the specified Docker image. When checked, all of the surrounding tools produced the expected outputs; only Mutect2 did not. That's a good point about the release version. We happen to be considering an upgrade, but we have to be picky about timing given the size and scope of our operation :) Your suggestion of using the official GATK images might also help, since I'm not sure I can rule out subtle changes from apt update over time, even within the same Ubuntu version.