JuliaDynamics / TimeseriesSurrogates.jl

A Julia package for generating timeseries surrogates
https://juliadynamics.github.io/TimeseriesSurrogates.jl/stable/

More padding modes for `RandomCascade` #120

Closed kahaaga closed 1 year ago

kahaaga commented 2 years ago

Is your feature request related to a problem? Please describe.

If the input signal length is not a power of 2, then the signal must be extended/padded in order for the discrete wavelet transform (Wavelets.dwt) to work for RandomCascade surrogates.

Currently, the only option is to pad with zeros at the end of the time series.
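
For illustration, the zero padding described above could be sketched roughly as follows. This is a minimal sketch in plain Base Julia, not the package's actual internals; the helper name `zeropad_pow2` is hypothetical, and it assumes a non-empty numeric input.

```julia
# Hypothetical helper (not part of TimeseriesSurrogates.jl): pad `x` with
# zeros at the end until its length is a power of 2.
function zeropad_pow2(x::AbstractVector{T}) where T
    n = length(x)
    N = nextpow(2, n)      # smallest power of 2 that is >= n
    padded = zeros(T, N)   # start from all zeros
    padded[1:n] .= x       # copy the original signal into the front
    return padded
end

x = [1.0, 2.0, 3.0, 4.0, 5.0]
xp = zeropad_pow2(x)       # length 8: the 5 original values, then 3 zeros
```

If the length is already a power of 2, `nextpow(2, n)` returns `n` and nothing is appended.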

Describe the solution you'd like

We should add more padding options, like:

sathvikbhagavan commented 1 year ago

Hello, I was looking into this issue and implementing it. I had a few questions:

  1. For Linearly extrapolate end points, we would need timepoints as well for computing the slope, and those are not passed into the surrogenerator function https://github.com/JuliaDynamics/TimeseriesSurrogates.jl/blob/main/src/methods/wavelet_based.jl#L291. How do you propose to get around this?
  2. For constant propagation of end points - I wanted a quick clarification: is the end value copied across the extra padding?
kahaaga commented 1 year ago

> Hello, I was looking into this issue and implementing it. I had a few questions:

Excellent! We appreciate the help.

> 1. For Linearly extrapolate end points, we would need timepoints as well for computing the slope, and those are not passed into the surrogenerator function https://github.com/JuliaDynamics/TimeseriesSurrogates.jl/blob/main/src/methods/wavelet_based.jl#L291. How do you propose to get around this?

An implicit assumption for the RandomCascade surrogate is that the points are sampled on an evenly spaced time grid. Therefore, you can use any regular grid of points as the timepoints, for example 1:length(x), where x is the input timeseries. If you need to explicitly store these indices, you can add them to the init named tuple at the end of the surrogenerator function, like this:

function surrogenerator(x::AbstractVector{T}, method::RandomCascade, rng = Random.default_rng()) where T
    # ... a bunch of initialization code
    init = (
        wl = wl,
        c = c,
        # ... add the timepoints to this tuple, e.g. a regular grid:
        timepoints = 1:length(x),
    )

    return SurrogateGenerator(method, x, similar(x), init, rng)
end

Then, to access them when generating a surrogate, you just do the following in the function below:

function (sg::SurrogateGenerator{<:RandomCascade})()
    # more initialization code
    # ...
    timepoints = sg.init.timepoints

    # generate surrogate
    # ....
end
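
As a rough illustration of how a regular grid makes the slope computable without real timepoints, here is a minimal sketch of linear extrapolation padding. The helper name `linpad_pow2` is hypothetical, and using only the last two points to define the slope is an assumption made for this example; other choices (for instance, a fit over several trailing points) are equally possible.

```julia
# Hypothetical helper (not part of TimeseriesSurrogates.jl): extend `x` to the
# next power-of-2 length by continuing the straight line through its last two
# points. Assumes length(x) >= 2 and an evenly spaced grid t = 1:length(x),
# so the time step is 1 and the slope is just a difference.
function linpad_pow2(x::AbstractVector{<:Real})
    n = length(x)
    N = nextpow(2, n)                 # smallest power of 2 that is >= n
    slope = x[end] - x[end-1]         # (x[n] - x[n-1]) / (t[n] - t[n-1]) with unit step
    padded = Vector{float(eltype(x))}(undef, N)
    padded[1:n] .= x
    for i in (n + 1):N
        padded[i] = x[end] + slope * (i - n)   # continue the line past the end
    end
    return padded
end

x = [1.0, 2.0, 4.0, 5.0, 7.0]
xp = linpad_pow2(x)   # last step is 7 - 5 = 2, so the padding is 9, 11, 13
```

Because the grid spacing cancels out of the extrapolated values, the result is the same for any regular grid, which is why storing explicit timepoints is optional here.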

> 2. For constant propagation of end points - wanted a quick clarification that end value is copied across the extra padding?

Yes, that was the initial idea I had. But if you're aware of other best practices when it comes to padding for wavelet analysis, feel free to do that instead.

As far as I can remember (but I haven't read the paper in a long time), they don't discuss the padding issue in the original paper, so we've got room to be creative here. Perhaps there are some padding modes that, when used with RandomCascade, are superior at preserving multifractal properties? The only way to know is to test how the resulting surrogates behave in some real-world problem!

My rambling aside: if you're not feeling that adventurous, you can just go for constant propagation of the last value, as you proposed :D
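
For reference, constant propagation of the last value could look roughly like this. It is a minimal sketch in plain Base Julia; the helper name `constpad_pow2` is hypothetical and the function is not taken from the package.

```julia
# Hypothetical helper (not part of TimeseriesSurrogates.jl): pad `x` to the
# next power-of-2 length by repeating its last value. Assumes `x` is non-empty.
function constpad_pow2(x::AbstractVector)
    n = length(x)
    N = nextpow(2, n)                     # smallest power of 2 that is >= n
    return vcat(x, fill(x[end], N - n))   # append N - n copies of the last value
end

x = [3.0, 1.0, 4.0, 1.0, 5.0]
xp = constpad_pow2(x)   # length 8; the three padded entries all equal x[end]
```

Unlike zero padding, this avoids an artificial jump at the boundary whenever the series does not end near zero.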