`fade` can be cumbersome and unintuitive when it comes to sorting.
Ideally, the input bam to `fade out` is queryname sorted. This allows `fade out` to eject an entire mate pair when there is evidence that either the read or its mate is an enzymatic fragmentation artifact. The reasoning is that we cannot trust the mate of a read that contains an artifact, because the whole insert is expected to be artifactual.
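The pair-level logic can be sketched as a single pass over queryname-sorted SAM text, where all records sharing a read name arrive next to each other. This is an illustration, not fade's implementation: `eject_artifact_pairs` is a hypothetical name, and the sketch assumes artifact reads carry a made-up `ZA:i:1` tag rather than whatever `fade annotate` actually writes.

```bash
# Hypothetical sketch of pair-level ejection on queryname-sorted SAM text.
# ASSUMPTION: artifact reads carry the made-up tag "ZA:i:1".
eject_artifact_pairs() {
  awk -F'\t' '
    function flush(   j) {                # print the buffered group only if clean
      if (!bad) for (j = 1; j <= n; j++) print group[j]
      n = 0; bad = 0
    }
    /^@/ { print; next }                  # header lines pass through unchanged
    $1 != qname { flush(); qname = $1 }   # new read name: emit the previous group
    {
      group[++n] = $0                     # buffer every record with this name
      for (i = 12; i <= NF; i++)          # optional tags start at column 12
        if ($i == "ZA:i:1") bad = 1       # one artifact taints the whole group
    }
    END { flush() }
  '
}
```

For example, `samtools view -h input.anno.qns.bam | eject_artifact_pairs` streams the result as SAM. If the input is not queryname sorted, mates are no longer adjacent, so each read forms its own group and only the artifact read itself is dropped.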
If the `-c` flag is provided to only clip artifact reads, nothing is done to the mate of an artifact read.
If the bam is not queryname sorted, `fade out` will only eject reads individually and will not consider them as pairs.
Additionally, because `fade annotate` works in parallel, its output bam will NOT be sorted, even if the input bam is coordinate or queryname sorted.
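For comparison, here is a minimal sketch of the sorting a user currently has to do by hand around fade. It assumes `fade` and `samtools` are on `PATH` and reuses the `fade` invocations from this issue; `fade_pipeline` and the file-name suffixes are hypothetical.

```bash
# Hypothetical wrapper; assumes fade and samtools are on PATH.
# Usage: fade_pipeline <input.bam> <ref.fa> <output-prefix>
fade_pipeline() {
  local in_bam=$1 ref=$2 prefix=$3
  # fade annotate works in parallel, so its output is unsorted
  fade annotate -b "$in_bam" "$ref" > "$prefix.anno.bam" || return
  # queryname sort so both mates of a pair reach fade out together
  samtools sort -n -o "$prefix.anno.qns.bam" "$prefix.anno.bam" || return
  # without -c, fade out can now eject whole mate pairs
  fade out -b "$prefix.anno.qns.bam" > "$prefix.out.bam" || return
  # fade out output is not coordinate sorted, so sort again for downstream tools
  samtools sort -o "$prefix.out.sorted.bam" "$prefix.out.bam"
}
```

Running `fade_pipeline input.bam ref.fa sample` needs two extra sorts that internal resorting would absorb.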
`fade out` may also NOT produce a coordinate-sorted bam, even if `samtools sort` is run after `fade annotate`. When the `-c` flag is provided, `fade out` hard-clips the artifact regions of reads, and hard-clipping changes the alignment position of those reads, so a previously sorted file can fall out of order.
It may be a significant bit of work, but we should potentially support resorting input and output bam files internally. In the end, we may be looking at fade's overall execution for a pipeline as:
```bash
# output is either coordinate sorted by fade
# or retains the original sort order
fade annotate -b input.bam ref.fa > input.anno.bam
# fade internally resorts the input by queryname
# if the -c flag is not used;
# output is always resorted by coordinate
fade out -b input.anno.bam > input.out.bam
```
Alternatively, we could warn the user when their bam file appears unsorted and tell them that fade's output is not sorted.
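A minimal sketch of such a check, written as two hypothetical shell helpers over SAM text. The first reads the `SO` field of the `@HD` header line, which may be missing or stale. The second applies a heuristic to the records themselves: each reference must form one contiguous run with non-decreasing `POS`. It does not compare reference order against the `@SQ` lines.

```bash
# Hypothetical helpers for detecting unsorted SAM text.

# Print the SO value of the @HD header line, or "unknown" if there is none.
sam_header_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
    }
    !/^@/ { exit }                        # header is over; stop reading
    END { if (!found) print "unknown" }
  '
}

# Heuristic: exit 0 if records look coordinate sorted, 1 otherwise.
sam_looks_coordinate_sorted() {
  awk -F'\t' '
    /^@/ { next }                         # skip header lines
    $3 != rname {                         # entering a new reference
      if ($3 in seen) exit 1              # reference reappeared: out of order
      seen[$3] = 1; rname = $3; pos = 0
    }
    $3 != "*" && $4 + 0 < pos { exit 1 }  # POS went backwards
    { pos = $4 + 0 }
  '
}
```

For example, `samtools view -H input.bam | sam_header_sort_order` prints the declared order, and `samtools view input.out.bam | sam_looks_coordinate_sorted || echo "warning: input.out.bam is not coordinate sorted" >&2` would flag output whose positions were shifted by hard-clipping.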
@jblachly thoughts?