Open mufernando opened 3 years ago
Here's what's happening:
start, stop, step
uses range(start, stop, step)
here to produce the starting points.range(1, 4, 1)
produces [1,2,3]
, for the starting points, and then 4
is reserved for the last stop
value, which makes sense, as you did say stop=4
.@mufernando, it's not clear to me what the right thing is here. At least this should be documented more (eg. say that endpoints of windows are both inclusive), but I'd have to work some more examples to figure out the consequences of different ways of doing this.
Looking at the implementation of position_windows again, I'm not sure why I did it like that. Perhaps this would be better:
for window_start in range(start, stop, step):
# determine window stop (inclusive)
window_stop = window_start + size - 1
if window_stop > stop:
# last window
window_stop = stop
last = True
windows.append([window_start, window_stop])
if last:
break
That seems like it'd be better...
That looks good @alimanfoo. The only thing to change there would be to do the range up until stop+1
, because in general 1-based systems are left and right inclusive (see this).
In the example below, we have four sites and I ask scikit-allel to output windowed diversity with windows of length 1.
Note, however, that I only get 3 values in the diversity array, and the last window ends up including sites 3 and 4.
This seems a bit odd to me, so I was trying to understand why scikit-allel handles it like this. Just to spell it out: I find it odd that the windows end up being 1,1], [2,2] and [3,4] instead of [1,1], [2,2], [3,3], [4,4].
cc @petrelharp