liz-is closed this issue 8 years ago.
For the midpoint distances, the differences in behaviour should just be due to rounding. Using `distance` in the old calculation rounds the midpoint, which can add an extra base depending on whether the rounding went up or down. Computing the distances with `pairdist` preserves fractional midpoints in the intermediate calculations, giving a more accurate midpoint-to-midpoint distance. (Incidentally, if the distance itself is fractional, `pairdist` rounds it down, so the distance is always an integer.)
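To make the rounding effect concrete, here is a minimal Python sketch of the arithmetic only (it is not the R code in either package). The helper names are hypothetical, and rounding each midpoint *down* in the "old" path is an assumption; the thread only says the midpoint was rounded, not in which direction.

```python
import math

def midpoint(start, end):
    """Fractional midpoint of a 1-based closed interval [start, end]."""
    return (start + end) / 2

def mid_dist_round_first(a, b):
    """Round each midpoint down to an integer, then subtract (illustrative 'old' path)."""
    return abs(math.floor(midpoint(*b)) - math.floor(midpoint(*a)))

def mid_dist_fractional(a, b):
    """Keep fractional midpoints, then round only the final distance down."""
    return math.floor(abs(midpoint(*b) - midpoint(*a)))

anchor1 = (1, 10)   # midpoint 5.5
anchor2 = (21, 31)  # midpoint 26.0
print(mid_dist_round_first(anchor1, anchor2))  # 21
print(mid_dist_fractional(anchor1, anchor2))   # 20
```

The two paths only disagree when one midpoint is fractional and the other is not, and then by a single base, which is the size of discrepancy discussed here.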
Personally, I don't think you need to worry about changes in the distance calculation. In real analyses, distances should be on the order of kilobases at least, and megabases more typically; an extra couple of bases shouldn't really matter to the results. The exception is negative values (from overlapping anchors), which are technically correct but may need careful handling, e.g., if you're log-transforming the distances.
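As an illustration of that caveat, a log transform fails outright on negative input, so overlapping pairs have to be dealt with explicitly. The sketch below is hypothetical: the function name, the choice of log10, and the +1 pseudocount are all assumptions, not anything either package does.

```python
import math

def log10_distances(distances, drop_negative=True):
    """Log10-transform anchor distances with a +1 pseudocount.

    Negative distances (overlapping anchors) would make math.log10 raise,
    so they are either dropped or, if drop_negative is False, clamped to zero.
    """
    out = []
    for d in distances:
        if d < 0:
            if drop_negative:
                continue  # skip overlapping pairs entirely
            d = 0  # mimic setting negative distances to zero
        out.append(math.log10(d + 1))  # pseudocount keeps d == 0 finite
    return out

print(log10_distances([-5, 0, 9]))                       # [0.0, 1.0]
print(log10_distances([-5, 0, 9], drop_negative=False))  # [0.0, 0.0, 1.0]
```

Dropping versus clamping is a genuine analysis choice: clamping keeps every pair but piles overlapping anchors up at zero.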
Okay, I've made the behaviour consistent with `InteractionSet` in e02ba83. I agree that a 1bp difference won't matter at all for real data. I'll put a note about it in NEWS or something.
the InteractionSet method gives slightly different results...
Can @LTLA explain how InteractionSet calculates distances?
Also for other methods, e.g.
Here I think the old GenomicInteractions method is incorrect, whereas InteractionSet's 'span' corresponds to the `punion` of the anchors (which works in this case because they overlap), so I think it's correct.
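A minimal Python sketch of that correspondence, treating anchors as 1-based closed `(start, end)` tuples in GRanges style. Both helpers are hypothetical; `punion_width` models, as I understand it, IRanges' `punion` without `fill.gap`, which only accepts pairs whose union is a single range. That is why the equivalence holds here only because the anchors overlap.

```python
def span(a, b):
    """Width of the smallest range covering both anchors (1-based closed intervals)."""
    return max(a[1], b[1]) - min(a[0], b[0]) + 1

def punion_width(a, b):
    """Width of the union of two ranges; only defined when the union is one range."""
    # Refuse pairs separated by a gap, like punion without fill.gap.
    if max(a[0], b[0]) > min(a[1], b[1]) + 1:
        raise ValueError("anchors are separated by a gap; union is not one range")
    return max(a[1], b[1]) - min(a[0], b[0]) + 1

overlapping = ((1, 100), (50, 200))
print(span(*overlapping), punion_width(*overlapping))  # 200 200

separated = ((1, 10), (21, 30))
print(span(*separated))  # 30; punion_width(*separated) raises ValueError
```

For separated anchors, 'span' still has a well-defined value, but it no longer matches a plain union of the two ranges.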
For 'inner' / 'gap' methods, InteractionSet returns negative values for overlapping anchors while GenomicInteractions used to set negative distances to zero. I'm now in favour of keeping negative distances as I think it's pretty easy for the user to set them to zero if they want.
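A minimal Python sketch of the two behaviours, again on 1-based closed `(start, end)` tuples. The helpers are hypothetical, and the "-1" offset (so that adjacent anchors give 0) is one common convention assumed here, not necessarily the exact formula `pairdist` uses.

```python
def gap_distance(a, b):
    """Bases strictly between two 1-based closed intervals; negative if they overlap."""
    return max(a[0], b[0]) - min(a[1], b[1]) - 1

def gap_distance_clamped(a, b):
    """Old-style behaviour: report overlapping anchors as distance zero."""
    return max(0, gap_distance(a, b))

print(gap_distance((1, 10), (21, 30)))            # 10 (bases 11..20)
print(gap_distance((1, 10), (11, 20)))            # 0 (adjacent)
print(gap_distance((1, 100), (50, 200)))          # -51 (overlap)
print(gap_distance_clamped((1, 100), (50, 200)))  # 0
```

Keeping the signed value loses nothing: the clamped version is a one-line post-processing step, while zeros produced by clamping can't be turned back into overlap sizes.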
In general, how should we deal with changing how distances are calculated between package versions?