Closed jrm5100 closed 4 months ago
I confirm:
positionIsOutsideFrInsert
was actually positionIsInsideFrInsert
positionIsOutsideFrInsert
was still correct in functionrec.isFrPair
was unnecessary~ EDIT: see below, this was not true.Check out #981. I can't assign you as a reviewer, but would like your 👀 on it.
The history of the filter started 5 years ago in the context of somatic variant filtration:
It was then refactored into the PileupBuilder and swapped from a negative filter (exclude
) to a positive filter (include
):
The filter still behaves correctly:
Action | Behavior |
---|---|
True | Do not apply the filter. |
False | Remove FR pairs where the position in that pair is outside the insert coordinates. Otherwise keep the record (single-end, RF pair, tandem pair). |
Because of this semantic change, the includeMapPositionsOutsideFrInsert
filter behaves correctly but its meaning might be unclear. Why would someone want to include map positions outside the FR pair? I now think the filter should behave exactly as it is now but be called:
includeMapPositionsInsideFrInsert
And its default set to true
.
What do you think of renaming includeMapPositionsOutsideFrInsert
to includeMapPositionsInsideFrInsert
and swapping the default boolean from false
to true
? It would be a breaking Scala API change, but gratefully the name change would require a developer who is using it to reconsider the filter when updating their use of the API.
@jrm5100 I don't think we want to exclude records in a pair where a mate maps to a different reference sequence. Although these pairs would not be proper, they should still count towards coverage when requesting a pile up of reads overlapping either of the reads in the pair.
But if you do not want to include pairs with reads mapped to different reference sequences, you can add a custom filter like:
PileupBuilder(source).withReadFilter(rec => rec.refName == rec.mateRefName)
I think the main sticking point here is the intention of the filter.
Anything not isFRPair
includes:
Reads that are isFrPair
may look like this:
─────────────────────────►
◄────────────────────────
│ │
│ Insert Size │
│ │
and being OutsideFrInsert
would mean in the 3' tails that likely aren't real.
So is the filter meant to:
isFrPair
but when they are isFrPair
, make sure they aren't outside the insert region.The distinction is where isFrPair
is used.
It seems like the second is the intended usage. In that case I think the current filter name and functionality is fine, but the internal function just needs to be renamed and used slightly differently:
if (compare) compare = if (!includeMapPositionsOutsideFrInsert && rec.isFrPair) {
PileupBuilder.positionIsInsideFrInsert(rec, refIndex = refIndex, pos = pos)
} else { compare }
In this case the && rec.isFrPair
is actually necessary, since positionIsInsideFrInsert
is False
if the record isn't in a FR pair, and I think that was my main source of confusion for the intention of the filter.
@jrm5100 I don't think we want to exclude records in a pair where a mate maps to a different reference sequence. Although these pairs would not be proper, they should still count towards coverage when requesting a pile up of reads overlapping either of the reads in the pair.
But if you do not want to include pairs with reads mapped to different reference sequences, you can add a custom filter like:
PileupBuilder(source).withReadFilter(rec => rec.refName == rec.mateRefName)
Yeah, after reading through the code more I realized that and settled on
PileupBuilder(source).withReadFilter(rec => rec.isFrPair)
Which is slightly more strict, but works for my purposes.
Yes, this was the original intent of the filter:
Keep those not isFrPair but when they are isFrPair, make sure they aren't outside the insert region
So I need to revert the change I just made and update the documentation to be more clear. The original rec.isFrPair
wasn't redundant! I've just gotten myself confused along the way.
positionIsOutsideFrInsert
is actuallypositionIsInsideFrInsert
https://github.com/fulcrumgenomics/fgbio/blob/3a74fd28ff630b417410953541ff107a1b7b0fb7/src/main/scala/com/fulcrumgenomics/bam/pileup/PileupBuilder.scala#L123-L128
The filter is used in a way that is reversed so that the current tests still work as designed.
https://github.com/fulcrumgenomics/fgbio/blob/3a74fd28ff630b417410953541ff107a1b7b0fb7/src/main/scala/com/fulcrumgenomics/bam/pileup/PileupBuilder.scala#L203-L205
Additionally there is a redundant
rec.isFrPair
call (this is done inside the method).I came across this because I was tracking down some aberrant alleles in a pileup. Should reads mapped to separate chromosomes be rejected by this filter, or should that be a separate filter (likely in
quickAcceptRecord
)? Currently these are accepted even ifincludeMapPositionsOutsideFrInsert
isfalse
, which i think may be misleading.