Closed: isthisthat closed this issue 3 years ago.
This is with the latest sambamba (0.7.1) and the latest samtools (1.10).
This behaviour is apparently documented in the samtools view manual: https://www.htslib.org/doc/samtools-view.html#OPTIONS
Yeah. Sambamba mirrors samtools here.
-s FLOAT
Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate.
The integer and fractional parts of the -s INT.FRAC option are used separately: the part after the decimal point sets the fraction of templates/pairs to be kept, while the integer part is used as a seed that influences which subset of reads is kept.
When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.
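To make the INT.FRAC split and the per-template decision concrete, here is a minimal Python sketch. `parse_subsample_arg` and `keep_template` are hypothetical names, and SHA-256 over the seed and read name is an assumed stand-in for whatever hash samtools actually uses. The only point it illustrates is that the keep/drop decision is a deterministic function of (read name, seed, fraction), which is why a read and its mate are never separated.

```python
import hashlib


def parse_subsample_arg(arg: str) -> tuple[int, float]:
    """Split an INT.FRAC argument such as "42.1" into (seed, fraction).

    Sketch of the split the manual describes; not samtools' own parser.
    """
    int_part, _, frac_part = arg.partition(".")
    seed = int(int_part) if int_part else 0
    fraction = float("0." + frac_part) if frac_part else 0.0
    return seed, fraction


def keep_template(read_name: str, seed: int, fraction: float) -> bool:
    """Return True if every record named read_name should be kept.

    Assumed stand-in hash: SHA-256 of "seed:read_name", mapped to [0, 1).
    Mates share a read name, so a pair is always kept or dropped together.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


seed, fraction = parse_subsample_arg("42.1")
print(seed, fraction)  # 42 0.1
```

So `-s 42.1` would mean "seed 42, keep about 10% of templates" under this model.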
This is the culmination of weeks of hunting this bug down. In the following sequence, the second command does not subsample; instead it outputs the input BAM unchanged (apart from an additional header tag):
The second command will work if the seed is changed, e.g.
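The reported behaviour follows from the manual's warning about reusing a seed. Assuming the keep/drop decision is a deterministic function of read name, seed and fraction (modelled here with a hypothetical SHA-256-based `keep_template`, not samtools' real hash), this Python sketch subsamples 10,000 made-up read names and then subsamples the survivors again with the same seed and with a different one.

```python
import hashlib


def keep_template(read_name: str, seed: int, fraction: float) -> bool:
    # Hypothetical name+seed hash (SHA-256), standing in for the real one.
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


def subsample(names: list[str], seed: int, fraction: float) -> list[str]:
    """Keep the templates whose hashed name falls below the fraction."""
    return [n for n in names if keep_template(n, seed, fraction)]


names = [f"read{i}" for i in range(10_000)]

first = subsample(names, seed=42, fraction=0.5)
# Same seed and fraction: every survivor already passed this exact test,
# so the second pass removes nothing (output == input).
same_seed = subsample(first, seed=42, fraction=0.5)
# Different seed: an independent ~50% draw from the survivors.
new_seed = subsample(first, seed=7, fraction=0.5)
# Same seed, smaller fraction: keeps about 0.25 / 0.5 = half of the
# survivors, i.e. ~25% of the original input rather than 25% of `first`.
smaller = subsample(first, seed=42, fraction=0.25)

print(len(first), len(same_seed), len(new_seed), len(smaller))
```

In this model, reusing a seed means the second pass only ever compares the same per-read values against a new threshold, so it can never remove reads that a larger-or-equal threshold already kept.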
The exact same bug exists in samtools as well, so I'm not sure where this is coming from!?