JuliaDynamics / TransitionsInTimeseries.jl

Transition Indicators / Early Warning Signals / Regime Shifts / Change Point Detection
MIT License
18 stars 5 forks source link

Handle unevenly sampled time series #32

Open JanJereczek opened 1 year ago

JanJereczek commented 1 year ago

Is your feature request related to a problem? Please describe.

Empirical data are often unevenly sampled. In the state of the art, an interpolation is often performed but this might bias the result.

In TransitionIndicators.jl we want to offer both options: (1) the user can interpolate their data and pass it or (2) directly pass their unevenly sampled time series, in which case no interpolation happens and all the difficulties are handled internally. (Option 2) might have reduced features compared to (Option 1) but introduces no error.

We won't focus too much on (Option 2) for now but defining the basics for the API to handle it should be a step in near future.

Describe the solution you'd like

Here goes an idea of solution. It does not rely on defining an iterator (maybe a downside?) but has a quite sparse syntax:

using TransitionIndicators

abstract type TimeSeries end

struct EvenspacedTS <: TimeSeries
    nt::Int64
    t::AbstractVector
    spacedim::Int64
    x::AbstractArray
    dt::Real
end

struct UnevenspacedTS <: TimeSeries
    nt::Int64
    t::AbstractVector
    spacedim::Int64
    x::AbstractArray
end

function TimeSeries(t, x)
    nt = length(t)
    spacedim = ndims(x) - 1
    if isequispaced(t)
        dt = mean(diff(t))
        return EvenspacedTS(nt, t, spacedim, x, dt)
    else
        println(
            "\n" *
            " Unevenly spaced time series are work in progress. " * 
            "If you spot bugs, please open an issue on GitHub!\n " * 
            "Bare in mind that unevenly spaced time series come with limited functionalities! \n"
        )
        return UnevenspacedTS(nt, t, spacedim, x)
    end
end

struct WindowView
    width::Int
    stride::Int
    windows::Vector{UnitRange{Int}}
end

function window_view(x, wv::WindowView)
    return [view(x, window) for window in wv.windows]
end

function WindowView(ts::EvenspacedTS; width::Int = default_window_width(ts), stride::Int = default_window_stride(ts))
    si = width:stride:ts.nt
    windows = [k-width+1:k for k in si]
    return WindowView(width, stride, windows)
end
default_window_width(ts::EvenspacedTS) = ts.nt ÷ 100
default_window_stride(ts::EvenspacedTS) = 1

function WindowView(ts::UnevenspacedTS; twidth::Real = default_window_width(ts), stride::Int = default_window_stride(ts))
    ibegin = find_nearest(twidth, ts.t)
    if ibegin < 1
        ibegin = 1
    end
    windows = [findspan(t, twidth, ts.t) for t in ts.t[ibegin:stride:end]]
    return WindowView(0, stride, windows)
end
default_window_width(ts::UnevenspacedTS) = last(ts.t) / 100
default_window_stride(ts::UnevenspacedTS) = 1

function findspan(t, twidth, t_uneven)
    i1 = find_nearest(t-twidth, t_uneven)
    i2 = find_nearest(t, t_uneven)
    if i1 == i2
        error("Time series not dense enough before t=$t for the chosen window width")
    else
        return i1:i2
    end
end

function find_nearest(t, tvec::AbstractVector)
    return argmin((tvec .- t).^2)
end

t = 0:0.1:100
x = sin.(t)
ts = TimeSeries(t, x)
wv = WindowView(ts)
y = zeros(eltype(x), length(wv.windows))
@btime map!(var, y, window_view(x, wv))

tu = cumsum(0.2 .* rand(length(t)))
xu = sin.(t)
tsu = TimeSeries(tu, xu)
wvu = WindowView(tsu)
yu = zeros(eltype(x), length(wvu.windows))
@btime map!(var, yu, window_view(xu, wvu))

Describe alternatives you've considered

Additional context

One open question is whether the window width on unevenly time series should be defined by the number of points or by a time span. I think the latter one is more rigorous.

Datseris commented 1 year ago

Oof, i've thought about this extensively but I haven't written anything here. Sorry, it would have saved you a lot of effort...

From my eyes, the plan is to create a package TimeseriesViewers.jl that defines a Timeseries type and then has on top of it the viewers, e.g. windowed viewers or sub-scale viewers.

Timeseries is an AbstractVector, a wrapper, that wraps a vector and a time vector. It has two concrete subtypes, an evenly and unevenly sampeld. The goal is to allow to ways to access Timeseries: by index, or by time. We can rely on Interpolations.jl to access the unevenly sampled version via interpolating.

The WindowViewers then produce views over timeseries with a given window an stride, in real time units, or integer units for convenience if the timeseris is evenly sampled.

The main takeaway here is that TransitionIndicators.jl will not care at all whether the timeseries is evenly sampled or not, because it gets in a Timeseries and a window viewer, with windows always defined in real time units. Thus, TransitionIndicators.jl works as it does now being agnostic to the nature of the timeseries. I think this is the best design.

Datseris commented 1 year ago

it is unfortunate that you went though a lot of time to write your code, but I really think the best way forwards is to just rely on Interpolations.jl for creating windows of an unevenly sampled timeseries...

JanJereczek commented 1 year ago

From a design perspective I agree with you. This would be the easiest. However, one might want to avoid interpolation in some cases, as it biases e.g. the AR1-regression coefficient and the power density spectrum.

Going the way you propose means closing that option, but it might be a choice we want to make. Let's talk about it live at some point.

Datseris commented 1 year ago

However, one might want to avoid interpolation in some cases, as it biases e.g. the AR1-regression coefficient and the power density spectrum

Not sure what you mean here. Avoiding interpolation when? The user can choose to not construct the interpolation object, and rather just given in an abstract vector directly, in which case no interpolation would/coould happen.

JanJereczek commented 1 year ago

We can rely on Interpolations.jl to access the unevenly sampled version via interpolating.

I am referring to this sentence. I understood that you (always) want to access unevenly-spaced time series by interpolations. But maybe that was not your point. Let's talk about it live to make sure we understand each other.