beacon-biosignals / TimeSpans.jl

A Julia package that provides a `TimeSpan` type for representing a continuous span between two points in time.
Other
6 stars 2 forks source link

useful timespan manipulation functions #10

Open ericphanson opened 3 years ago

ericphanson commented 3 years ago

In a Beacon-internal project, we've found this function useful:

function intersection_duration(ts1::TimeSpan, ts2::TimeSpan)
    duration = min(stop(ts1), stop(ts2)) - max(start(ts1), start(ts2))
    return max(duration, zero(duration))
end

Is it worth upstreaming here?

ericphanson commented 3 years ago

Another useful one:

"""
    invert_regions(existing_regions, overall_ts::TimeSpan)

Given a list of TimeSpan's, `existing_regions`, and some overall timespan `overall_ts`,
create a list of `TimeSpan`s of consecutive subregions of `overall_ts` which are not in
any of the `existing_regions`.

## Example
```julia
julia> new_regions = invert_regions([TimeSpan(1,2), TimeSpan(4,5)], TimeSpan(0,5))
2-element Vector{TimeSpan}:
 TimeSpan(00:00:00.000000000, 00:00:00.000000001)
 TimeSpan(00:00:00.000000002, 00:00:00.000000004)

julia> new_regions == [TimeSpan(0,1), TimeSpan(2,4)]
true

""" function invert_regions(existing_regions, overall_ts::TimeSpan) existing_regions = restrict_to_region(sort(existing_regions; by=start), overall_ts) new_regions = TimeSpan[]

# Do we have a gap at the beginning? If so, that's our first region
if start(existing_regions[1]) > start(overall_ts)
    push!(new_regions, TimeSpan(start(overall_ts), start(existing_regions[1])))
end

for i in 1:(length(existing_regions) - 1)
    l = stop(existing_regions[i])
    f = start(existing_regions[i + 1])
    # If there is a gap between region `i` and `i+1`, then that's a new region
    if l < f
        push!(new_regions, TimeSpan(l, f))
    end
end

# Is there a gap at the end? A final new region.
if stop(existing_regions[end]) < stop(overall_ts)
    push!(new_regions, TimeSpan(stop(existing_regions[end]), stop(overall_ts)))
end
return new_regions

end

omus commented 3 years ago

I'll note the intersect functionality already exists in Intervals.jl. We may want to tackle #2 before doing too much work here

ericphanson commented 3 years ago

Yeah, #2 is exactly why I haven't been PRing these. But we may need to address #2 pretty soon if we don't want the switch to be a big pain, bc our use of TimeSpans is growing internally.

haberdashPI commented 3 years ago

I ended up implementing a number of set operations (see #11) that I needed; they are easier to implement for TimeSpans since the operations are closed under [start, end) intervals: there's some extra work I did not bother to work out to make it work with arbitrary closed/open endpoints, which would make it work for all Intervals.jl.

omus commented 3 years ago

I ended up implementing a number of set operations (see #11) that I needed; they are easier to implement for TimeSpans since the operations are closed under [start, end) intervals: there's some extra work I did not bother to work out to make it work with arbitrary closed/open endpoints, which would make it work for all Intervals.jl.

I have to disagree with you here. As Intervals.jl specifies the endpoint attributes as part of the type parameter you could have made a PR to Intervals.jl which added the functionality you wanted by dispatching only on Interval{T,Closed,Open}. Much of what you implemented in #11 including set operations already exists in Intervals.jl so it could have been faster to implement the remaining features there. I would recommend attempting to make an Intervals.jl PR.

One big concern I have about #11 is that the more features we add to TimeSpans.jl the harder it becomes for us to switch.

haberdashPI commented 3 years ago

I do see the argument for moving #11 to Intervals.jl

One design issue I have with the approach in Intervals.jl: then we are working in a space where the closed/open nature of an interval is stored in the type of the interval. For general use closed/open intervals and operations over them, these would lead to type unstable code (and inefficient by-reference array storage). It seems like a better design would be to store the closed/open nature of the end point in a EndPoint struct as a boolean flag (and an interval is just a pair of EndPoint values). This also has an implementation advantage: arithmetic to operate over these end point values would be easy to define, and would allow straightforward "subtraction" of one interval from another.

The approach in Intervals.jl to set operations, last I checked, was less generic as well, and could only handle intersect (maybe that's changed? or maybe I'm missing something)

A possible all-of-the-above approach would be to move what I did in #11 over to a PR in Intervals.jl. Later on, if anyone ever needs generic set operations that are type stable, that could be defined with a new Interval{T,Unknown,Unknown} type that implements the approach described above. It would be nice though if there weren't two different representations of Open/Closed properties in the same package (since that would hopefully be an implementation detail).

omus commented 3 years ago

One design issue I have with the approach in Intervals.jl: then we are working in a space where the closed/open nature of an interval is stored in the type of the interval. For general use closed/open intervals and operations over them, these would lead to type unstable code (and inefficient by-reference array storage).

You are correct in that mixing closed/open intervals can lead to arrays using references rather than inlining. For Beacon's use case with TimeSpan as long as we're consistent and use the element type Interval{Nanosecond,Closed,Open} this won't be a factor (we can always use a type alias to make this less cumbersome).

It seems like a better design would be to store the closed/open nature of the end point in a EndPoint struct as a boolean flag (and an interval is just a pair of EndPoint values). This also has an implementation advantage: arithmetic to operate over these end point values would be easy to define, and would allow straightforward "subtraction" of one interval from another.

This is how Intervals.jl used to store the inclusive/exclusive information. However most use cases for arrays of intervals end up having the same kinds of endpoints and there are some decent memory savings for encoding the inclusivity as part of the type (for reference see: https://github.com/invenia/Intervals.jl/issues/98).

With that said having the inclusivity as part of the type parameter isn't great for mixed inclusivity arrays. So far there doesn't seem to be much of a desire to improve this but if there is we could always introduce an additional interval type that encodes these details this way which allows you to choose based upon your use case.

The approach in Intervals.jl to set operations, last I checked, was less generic as well, and could only handle intersect (maybe that's changed? or maybe I'm missing something)

I'm curious why you think this. There shouldn't be anything in Intervals.jl that prevents any set operations from being defined (I added isdisjoint here). I'll note most of the Base set operations can work on intervals without having to define additional methods. Usually a specialized method is added for performance reasons.

A possible all-of-the-above approach would be to move what I did in #11 over to a PR in Intervals.jl. Later on, if anyone ever needs generic set operations that are type stable, that could be defined with a new Interval{T,Unknown,Unknown} type that implements the approach described above. It would be nice though if there weren't two different representations of Open/Closed properties in the same package (since that would hopefully be an implementation detail).

I think opening PRs to Intervals.jl is definitely the right move. We can use the approach I show in https://github.com/beacon-biosignals/TimeSpans.jl/pull/14 to use Intervals.jl as the backend for TimeSpans.jl. This allows us to piggyback off of the existing code and may use of any calls dispatching using AbstractIntervals.

haberdashPI commented 3 years ago

Ah, thanks for that little history lesson 😄 . That makes sense. And yeah, even here, I don't actually have a situation where I need to implement the general operations over all types of closed/open intervals (though it does feel like a weird "hole" in the implementation).

I'll start moving #11 over to a PR for Intervals.jl, so that it can handle AbstractAray{<:AbstractInterval{T,Closed,Open}} and AbstractAray{<:AbstractInterval{T,Open,Closed}}.

As to:

I'm curious why you think this. There shouldn't be anything in Intervals.jl that prevents any set operations from being defined (I added isdisjoint here). I'll note most of the Base set operations can work on intervals without having to define additional methods. Usually a specialized method is added for performance reasons.

I don't mean to say it prevents it from happening, sorry if that was unclear. Looking back, I actually remembered this wrong, it was union not intersect that I was talking about, and specifically for general operations (where more than one interval is possible as an input/output). All I mean is that the mergesets function in #11 (which was really the bulk of the work for that PR) is more generic in some ways than the union function in Intervals.jl (but wouldn't handle anything that doesn't work like arrays of all Interval{T,Closed,Open} or all Interval{T,Open,Closed} without some more work).

omus commented 3 years ago

I'll start moving #11 over to a PR for Intervals.jl, so that it can handle AbstractAray{<:AbstractInterval{T,Closed,Open}} and AbstractAray{<:AbstractInterval{T,Open,Closed}}.

That would be great! I would definitely by limiting the inclusivity of your functions to open/closed intervals as it's much easier to reason about. With a little bit of refactoring you'll probably be able to support all inclusivities :)

All I mean is that the mergesets function in #11 (which was really the bulk of the work for that PR) is more generic in some ways than the union function in Intervals.jl (but wouldn't handle anything that doesn't work like arrays of all Interval{T,Closed,Open} or all Interval{T,Open,Closed} without some more work).

If you do find yourself wanting mergesets in Intervals.jl I think we can start off by supporting a limited in scope version and expand the functionality over time. I wouldn't start with this though.

haberdashPI commented 3 years ago

Cool, okay, I think we're on the same page here: my understanding is that I'll port over #11, but only support the intervals the current implementation already naturally supports, not try to expand the functionality any further.

haberdashPI commented 2 years ago

FYI: there's now an in progress version of this in intervals at https://github.com/invenia/Intervals.jl/pull/179