Closed ericphanson closed 1 year ago
The docs are pretty clear on this at least for the start index:
integer index of the most recent sample taken at
sample_time
So at time 1.5s, the most recent sample taken at that time is index 2.
At tiem 2.5s the most recent sample is index 3. I guess then the right-closed definition gives you 2:2
, maybe that's where the issue is?
I think the solution might be to convert the final index back to time, and see if it needs to be adjusted down by 1 (e.g., if time_from_index(rate, index_from_time(rate, stop(span))) == stop(span)
then you use j-1
, otherwise use j
Related to https://github.com/beacon-biosignals/TimeSpans.jl/issues/20.
@ericphanson and I discussed this irl, and determined that, similar to #20, this is an example of pre-specified rounding behavior (in https://github.com/beacon-biosignals/TimeSpans.jl/blob/eb8b7949b3bc4b82c7f176f3b5227a48c08f4dce/src/TimeSpans.jl#L215, https://github.com/beacon-biosignals/TimeSpans.jl/blob/eb8b7949b3bc4b82c7f176f3b5227a48c08f4dce/src/TimeSpans.jl#L237) of times that do not line up exactly with index times.
We propose that we disallow calling index_from_time
on any ::Period
that does not fall exactly on a sample. We would then provide an additional set of round_to_sample_time(sample_rate, ::Union{Period, TimeSpan}, mode)
helper functions that, given a time (or span) and rounding mode, would return time(s) that fall on exact samples. Possible desired rounding options:
Period
, time gets rounded up/down/nearestTimeSpan
, start and end times are handled independently (with choice of up/down/nearest)TimeSpan
, start is rounded up/down/nearest and end is selected such that the span's duration is a set number of samples (based on the input span's duration + a rounding mode); this would yield spans of consistent duration (and would therefore address #20)In this new paradigm, current invocations of index_from_time(sample_rate, span)
would need to be updated to some flavor of
index_from_time(sample_rate, round_to_sample_time(sample_rate, span, mode)).
We could then provide a deprecation path to the current index_from_time
functions that falls back to a round_to_sample_time
that persists the current behavior.
From further discussion w @ericphanson, @kleinschmidt , one reason that we have to be careful with deprecation paths is that we use this all over the place when indexing into Onda.Samples
via spans rather than sample indices.
After some discussion with @hannahilea and @ericphanson, we concluded that we can handle this behavior with something like the "rounding mode" behavior of Base.round
, and that any default rounding mode is a major footgun for something as central to the Ondaverse as this is. One option would be to warn or throw an error when there's not an exact alignment with a span endpoint and sample times (the source of this behavior). That has two main disadvantages: it may slow down what should be a fast code path, and it will likely break between test cases and actual data "in the wild" (e.g. distributed jobs with large training batches, data from automated pipelines with varying sampling rates, etc.).
Another issue with a default rounding mode is that there isn't a rounding mode that is going to make sense or obey folks' intuitions in all cases. If you use the "intersect with any sample period" rounding mode (my proposal above), you will get spans of different lengths depending on whether you're exactly aligned with span times. If you round to a consistent number of samples, you will sometimes omit final (or initial) samples, so the interpretation of the timespan as an interval fails. Same (on both counts) for rounding up/down/in/out.
So it seems worthwhile to tradeoff some convenience (being able to do things like Samples[:, Timespan(...)]
) for being explicit about how non-exact span-segment alignment should be handled (e.g., requiring that you write Samples[:, TimeSpan(...); round=Down]
or samples[:, TimeSpan(...), Down]
)
Another issue with a default rounding mode is that there isn't a rounding mode that is going to make sense or obey folks' intuitions in all cases. If you use the "intersect with any sample period" rounding mode (my proposal above), you will get spans of different lengths depending on whether you're exactly aligned with span times. If you round to a consistent number of samples, you will sometimes omit final (or initial) samples, so the interpretation of the timespan as an interval fails. Same (on both counts) for rounding up/down/in/out.
💯
Yup, it's all about just picking whatever strategy lines up with your expectations about which invariants/properties you want to preserve. For simplicity's sake, TimeSpans just demands callers to align their expectations with the strategy it happens to implement 😅 not fun for callers who might desire a different interpretation but it is at least self-consistent
worth noting #2, in which case usages of TimeSpan
would be replaced in user code by Interval{<:Period,Closed,Open}
. I think if we wanted to solve this particular API problem for time <-> index conversion it'd be more worthwhile to do it atop Intervals than atop TimeSpans
I also wonder if more concretely baking the "interpretation choice" into these kinds of structures would also help, e.g. having callers use Period
ranges like Millisecond(start):Millisecond(period):Millisecond(stop)
and then indexing into/from those structures. or maybe it just punts the problem
I think the retitling doesn't quite capture everything here--- I see it as three issues that came up here:
The current implementation of index_from_time(sample_rate, span::TimeSpan)
makes an undocumented choice of semantics, which is to round-down the endpoints, and ensure the right endpoint of the span is excluded.
What do I mean this is undocumented? Well,
integer index of the most recent sample taken at
sample_time
documents the index_from_time(sample_rate, time)
method. That isn't sufficient to learn the semantics of the span
method, since we do something other than simply call index_from_time
on each endpoint, re
and we only document the span
method as
which leaves it unclear exactly what UnitRange
we're talking about.
The implementation of those semantics isn't correct when the right endpoint of the span doesn't fall exactly on a sample; we always throw away the right-endpoint even when it is well within the span
. @kleinschmidt's suggestion
I think the solution might be to convert the final index back to time, and see if it needs to be adjusted down by 1 (e.g., if time_from_index(rate, index_from_time(rate, stop(span))) == stop(span) then you use j-1, otherwise use j
would fix that, I think.
ref https://github.com/beacon-biosignals/TimeSpans.jl/issues/26#issuecomment-1016887476, https://github.com/beacon-biosignals/TimeSpans.jl/issues/26#issuecomment-1016914898
To add to those-- I think there's too much magic right now: index_from_time
is used internally by indexing into Onda.Samples
and other places, and it's not clear on that level what semantics you are getting when you do samples[:, span]
. (The Onda docs point to the tour, which doesn't discuss rounding). This indexing is convenient until something breaks and it turns out you would've been far better off carefully thinking about rounding semantics when you wrote the code originally. So I think we not round automatically, and either force the user to pass span
s whose endpoints are divisible by the sample rate (and provide helpers to construct these spans), or force them to pass a rounding mode.
I see it as three issues that came up here:
sounds like I've retitled this to issue 3, the other two can be filed separately?
The implementation of those semantics isn't correct when the right endpoint of the span doesn't fall exactly on a sample; we always throw away the right-endpoint even when it is well within the span. @kleinschmidt's suggestion
I think I'd need to consider this (or at least review associated PR a bit more deeply), haven't gotten a chance to thoroughly dig into it but i suspect there's a possibility that fixing the perceived bug might change the intended semantics (i.e. break existing tests). maybe not though and it really is just a simple bug that needs fixing
I think there's too much magic right now: index_from_time is used internally by indexing into Onda.Samples and other places, and it's not clear on that level what semantics you are getting when you do samples[:, span]. (The Onda docs point to the tour, which doesn't discuss rounding). This indexing is convenient until something breaks and it turns out you would've been far better off carefully thinking about rounding semantics when you wrote the code originally.
I think this mischaracterizes what the package is intended to provide...it shouldn't result in the class of bugs one would normally refer to as "magical" - it's not trying to guess a rounding mode for you and then guessing wrong. It's intended to only provide a single implementation that has one particular "rounding mode" baked in.
I think this is really the docs issue you brought up, because it seems like people ARE expecting TimeSpans to guess a rounding mode for their use case, which I agree is a bad expectation and is not a feasible thing to try to implement
I also agree that providing user-selectable rounding modes is a fine idea, but I'm not sure I want to put further development into TimeSpans.jl when the intention from the inception of the package was supposed to be to deprecate it in favor of Intervals (it was originally split out of Onda at the time to facilitate decoupling the interface). Though maybe the index <-> time conversion parts should live in their own package to compose with Intervals.jl. either way
it turns out you would've been far better off carefully thinking about rounding semantics when you wrote the code originally.
this is definitely true and should be true for anybody using TimeSpans today as-is, regardless of future changes
sounds like I've retitled this to issue 3, the other two can be filed separately?
Sounds good, will do.
I think this mischaracterizes what the package is intended to provide...it shouldn't result in the class of bugs one would normally refer to as "magical" - it's not trying to guess a rounding mode for you and then guessing wrong. It's intended to only provide a single implementation that has one particular "rounding mode" baked in.
Hm, the thing I'm calling "too magical" is samples[:, TimeSpan(Minute(1), Minute(10))]
(not index_from_time
directly). I think index_from_time
(at least with a period
, not a span
) is pretty clearly documented. But I think having samples[:, span]
has led to overly casual use (by me! but also by everyone AFAIK) where the rounding implications are only thought about later when something goes wrong.
I also agree that providing user-selectable rounding modes is a fine idea, but I'm not sure I want to put further development into TimeSpans.jl when the intention from the inception of the package was supposed to be to deprecate it in favor of Intervals (it was originally split out of Onda at the time to facilitate decoupling the interface). Though maybe the index <-> time conversion parts should live in their own package to compose with Intervals.jl. either way
To me, it doesn't really matter what package we use; we'll have the same questions about "how to ergonomically grab subsets of samples
corresponding to a given timespan, while handling index <-> time issues correctly for each use case". Agreed that index <-> time might end up being a separate package (or maybe part of Onda again?).
this is definitely true and should be true for anybody using TimeSpans today as-is, regardless of future changes
Yes totally, my point is I don't think that's not happening right now to the extent it needs to (and I am definitely as guilty of that as anyone else).
I think the bug that spawned this issue has been fixed as of #51
julia> TimeSpans.index_from_time(1, TimeSpan(Millisecond(1500), Millisecond(2500)))
2:3
Consider a signal with sample rate 1 Hz.
Now, consider the time span 1.5s to 2.5s. Using brackets to highlight this span:
What indicies should be associated to this span?
To me, looks like index 3.
But: