beacon-biosignals / Onda.jl

A Julia package for high-throughput manipulation of structured signal data across arbitrary domain-specific encodings, file formats and storage layers
Other
67 stars 5 forks source link

confusion w.r.t. index-time conversion and `Signal` time range vs. `Samples` time range #34

Closed SimonDanisch closed 1 year ago

SimonDanisch commented 4 years ago

I'm not sure what the expected behaviour for indexing into a Samples object with an start time offset is, but after skimming the tests it doesn't seem to be tested...

In this line, it seems like index_from_time doesn't account for start offsets https://github.com/beacon-biosignals/Onda.jl/blob/master/src/samples.jl#L75 So indexing a sample a la:


sample_rate_hz = 10.0
duration_ns = Nanosecond(24*60*60*1e9)
offset = Nanosecond(5.0*1e9)
duration_s = (duration_ns - offset).value / 1e9

nsamples = round(Int, duration_s * sample_rate_hz)
channels = [:spikes, :threshold_spikes]
signal = Signal(channels, offset, duration_ns, :probability,
                0.01, 0.0, Int16, sample_rate_hz, Symbol("lpcm.zst"), nothing)

data = zeros(length(channels), nsamples)
probability_samples = Samples(signal, false, data)
#---------------------------------------------------
# No error:
probability_samples[1, Nanosecond(0)]
# Error:
probability_samples[1, duration_ns - Nanosecond(1)]

I'm guessing this isn't the intended behavior?

jrevels commented 4 years ago

I'm guessing [the MWE] isn't the intended behavior?

Seems like the intended behavior, but I might be missing something?

signal = Signal(channels, offset, duration_ns, :probability, 0.01, 0.0, Int16, sample_rate_hz, Symbol("lpcm.zst"), nothing)

I'm not sure if this is an actual source of confusion or not, but keep in mind that the relevant fields here are the start/stop nanoseconds (as they're named and defined in the Onda format specification, so the stop nanosecond is exclusive), not start and duration. Additionally see Onda's index-from-time conversion mechanism documented/tested/exported as https://beacon-biosignals.github.io/Onda.jl/stable/#Onda.index_from_time.

A few other "boundary case" examples that might be helpful (these cases are already tested IIRC, but could also add them to examples/tour.jl):

# following the Onda documentation, we should expect this to error - and it does!
julia> probability_samples[1, duration(probability_samples)];

# a nice way to get the duration of a single sample
julia> single_sample_duration = time_from_index(sample_rate_hz, 2);

# returns the last sample
julia> probability_samples[1, duration(probability_samples) - single_sample_duration];

# TimeSpans are conveniently end-exclusive by construction
julia> probability_samples[1, TimeSpan(Nanosecond(0), duration(probability_samples))];

I'm not sure what the expected behaviour for indexing into a Samples object with an start time offset is

Samples indexing behavior used to be documented in the Samples docstring, but is now documented in the much more comprehensive (and tested) examples/tour.jl:

Note that `getindex` and `view` are defined on `Samples` to accept normal integer
indices, but also accept channel names for row indices and [`TimeSpan`](@ref)
values for column indices; see `Onda/examples/tour.jl` for a comprehensive
set of indexing examples.

This behavior hasn't changed since Onda's initial public release (i.e. before we added signal-specific time ranges) - indices/offsets are relative to the start of the Samples data itself, there's no other source involved IIRC. We actually tried out many variations of this before Onda's public release; we started with the "Samples track their own origin" approach but moved away from it after a while, which simplified a lot of code.

We also have warnings like the following in a few places, both in docstrings and in the examples:

!!! warning
    `duration(samples)` is not generally equivalent to `duration(samples.signal)`;
    the former is the duration of the entire original signal in the context of its
    parent recording, whereas the latter is the actual duration of `samples.data`
    given `samples.signal.sample_rate`.

We could improve documentation in this vein would by adding a few lines/tests here indexing into spo2 to demonstrate that there really is no hidden offset behavior w.r.t. the underlying signal.

I'll consider those examples as the "action item" for this issue, if that's sufficient to resolve the concern here (let me know if not or if I missed something).

SimonDanisch commented 4 years ago

I'm not sure if this is an actual source of confusion or not, but keep in mind that the relevant fields here are the start/stop nanoseconds

Yeah, no ;) I guess the only confusion was, that I expected it to behave like an OffsetArray ;) Let me read the docs a bit more careful, before further bothering you :D

jrevels commented 1 year ago

closing as stale