VJalili opened this issue 3 years ago.
Also, svtk's pe-test and sr-test tools do not set PRNG seeds.
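A minimal sketch of one way to seed such a tool. It assumes the tools are Python command-line programs (svtk is a Python package) and only illustrates the pattern: expose a seed, build one seeded generator, and pass it to every function that draws random numbers. The --seed flag, DEFAULT_SEED, and sample_regions are hypothetical and are not existing pe-test/sr-test options or functions. If a tool also draws from NumPy, that generator (e.g., numpy.random.default_rng(seed)) would need the same treatment.

```python
import argparse
import random

# Hypothetical default seed; svtk does not define this constant.
DEFAULT_SEED = 0


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a seeded test tool")
    # Hypothetical --seed flag: not an existing pe-test/sr-test option.
    parser.add_argument("--seed", type=int, default=DEFAULT_SEED,
                        help="seed for the pseudo-random number generator")
    return parser.parse_args(argv)


def sample_regions(rng, regions, k):
    """Hypothetical stand-in for whatever random draw a test performs."""
    return rng.sample(regions, k)


def main(argv=None):
    args = parse_args(argv)
    # Create one explicitly seeded generator and pass it down, rather than
    # relying on the unseeded module-level state of the random module.
    rng = random.Random(args.seed)
    regions = [f"chr1:{start}-{start + 1000}" for start in range(0, 100000, 1000)]
    return sample_regions(rng, regions, 5)
```

Passing a random.Random instance keeps the tool's random stream isolated: calling random.seed() globally would also work, but any other code drawing from the global generator in between would shift the sequence.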
The subsampling function in svtk adjudicate is also missing a random seed: https://github.com/broadinstitute/gatk-sv/blob/main/src/svtk/svtk/adjudicate/random_forest.py#L82 (and line 86).
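A minimal sketch of seeded subsampling in plain Python, assuming the code at lines 82 and 86 draws a fixed-size random subset of training records; the exact call at those lines is not reproduced here, and subsample is a hypothetical helper. If the actual code uses pandas, DataFrame.sample(n=..., random_state=seed) accepts a seed directly; with NumPy, numpy.random.default_rng(seed).choice(...) serves the same purpose.

```python
import random


def subsample(records, max_size, seed):
    """Return at most max_size records, chosen reproducibly for a given seed.

    Hypothetical helper; it is not the function in random_forest.py.
    """
    records = list(records)
    if len(records) <= max_size:
        # Nothing to drop, so no randomness is involved.
        return records
    rng = random.Random(seed)
    # Sample indices and sort them so the kept records stay in input order.
    keep = sorted(rng.sample(range(len(records)), max_size))
    return [records[i] for i in keep]
```

Because the generator is created from the seed inside the call, two calls with the same input and seed always return the same subset, independent of any other random draws in the program.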
The pipeline's irreproducible Wham-related output stems from the Wham source code, which uses a random number generator with a floating (unfixed) seed. The steps we took to fix this issue are:
1. Wham's make process includes building bamtools from an older version, which fails to compile on Ubuntu 18 (and all newer versions). gcc …) that are not included in the Ubuntu 18 Docker image. I explored options for installing older compiler versions, specifically trying gcc-4.x and gcc-5, but bamtools failed to compile with these versions.
2. Installed bamtools from apt (apt-get install -y libbamtools-dev) to avoid the build-related issues, and removed bamtools from Wham's make process. Wham failed to compile with this approach.
3. Built bamtools (fetched using git clone --recursive) with (a) the current HEAD and (b) older commits (e.g., the HEAD of the master branch in 2019). With this approach, bamtools compiles successfully, but Wham's make continues to fail.
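Independent of the build issues above, the underlying problem is the floating seed. Wham is written in C++, so the following is not Wham code; it is a toy Python illustration of why a floating seed breaks reproducibility. For illustration it assumes a clock-derived seed; Wham's actual seeding mechanism is not shown, and noisy_tool is hypothetical.

```python
import random
import time


def noisy_tool(seed=None):
    """Toy stand-in for a tool whose output depends on random draws.

    With seed=None the seed "floats": it is taken from the clock, so each
    run starts from a different generator state and output can differ.
    With a fixed seed, every run replays the same sequence of draws.
    """
    if seed is None:
        seed = time.time_ns()
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(20)]
```

Calling noisy_tool(123) twice returns identical lists, whereas two calls to noisy_tool() are seeded from different clock readings and need not agree.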
There is a small discrepancy between multiple executions of GATKSVPipelineSingleSample on the same input with the same parameters. The discrepancy is observed in the following files:

It seems the discrepancy originates from the Whamg workflow. The workflow uses whamg (whamg source code), which appears to involve some randomness (https://github.com/zeeev/wham/issues/51).

I ran the GATKSVPipelineSingleSample workflow multiple times without call-caching (i.e., setting "read_from_cache": false and "write_to_cache": false in the options files), and ran svtest vcf on the final VCFs of the separate executions, comparing each to a common baseline. If the pipeline were deterministic, I would expect every execution to show the same differences from (and similarities to) the baseline; however, the executions differ slightly from each other relative to the baseline. Some of the output metrics of svtest vcf comparing a test run with a given UUID vs. the baseline are as follows.
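To automate the comparison described in this comment, a sketch like the following could flag metrics that vary between runs. It assumes the svtest vcf metrics of each run have already been parsed into a dictionary keyed by run UUID; the parsing step, the function names, and the metric names are hypothetical. It also emits the workflow options JSON with the two call-caching settings mentioned above (read_from_cache and write_to_cache are Cromwell workflow options).

```python
import json


def no_cache_options():
    """Cromwell workflow options that disable call-caching reads and writes."""
    return json.dumps({"read_from_cache": False, "write_to_cache": False}, indent=2)


def differing_metrics(metrics_by_run):
    """Return metrics whose values are not identical across all runs.

    metrics_by_run maps a run UUID to {metric name: value}, e.g. values
    parsed from `svtest vcf` output against a common baseline (parsing is
    assumed and not shown). The result maps each differing metric name to
    {run UUID: value}; a metric missing from a run appears as None.
    """
    names = set()
    for metrics in metrics_by_run.values():
        names.update(metrics)
    differing = {}
    for name in sorted(names):
        values = {uuid: metrics.get(name) for uuid, metrics in metrics_by_run.items()}
        observed = list(values.values())
        if any(v != observed[0] for v in observed[1:]):
            differing[name] = values
    return differing


def is_reproducible(metrics_by_run):
    """True when every run compares to the baseline in exactly the same way."""
    return not differing_metrics(metrics_by_run)
```

Comparing the runs with each other, rather than only each run with the baseline, isolates run-to-run variation: a metric that differs from the baseline by the same amount in every run points to a genuine difference, while one that varies across runs points to nondeterminism such as whamg's.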