beacon-biosignals / Onda.jl

A Julia package for high-throughput manipulation of structured signal data across arbitrary domain-specific encodings, file formats and storage layers

deprecate `Onda.Samples` in favor of interoperability with generic array packages? #77

Open jrevels opened 3 years ago

jrevels commented 3 years ago

a secret window into the mysterious caverns of the Beacon Slack:

lol

It always feels like there's lots of overlap between Samples and more generic named-array packages like AxisKeys/AxisArrays/etc. Would it be a good idea to refactor Samples to be a wrapper around AxisKeys, or define convenience constructors for translating between them, or replace Samples entirely, or something else altogether that allows Onda (and downstream callers) to more easily reuse/interface with the more generic functionality provided by AxisKeys?

It seems like answering this question is a matter of clearly stating the "responsibility" of the Samples API layer and possibly formulating our goals around that.

To that end, it might be useful to start by roughly describing the manner in which Onda's functionality is currently layered:

  1. Tabular recording metadata utilities (Arrow.jl/Tables.jl stuff)
  2. (built on layer 1) Overloadable sample data (de)serialization mechanisms (the AbstractLPCMFormat API)
  3. (built on composition of layers 1 + 2) The Samples API.

The current core responsibilities of the Samples API layer are...

  1. ...to provide a minimal "load/store unit" for Onda-formatted sample data (in practice - associates SampleInfo with a corresponding sample data matrix)
  2. ...to provide LPCM encode/decode functionality. "Why here instead of the AbstractLPCMFormat layer?", you might ask. The answer is that the Samples layer is the "lowermost" layer at which encode/decode can be implemented generically w/o knowing anything about the target/source serialization format.
  3. ...to provide an overload point for any other clear/obvious specialized functionality that arises from associating SampleInfo with a data matrix, like specialized time span/channel indexing.
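
To make responsibilities 1 and 2 concrete, here is a minimal, language-agnostic sketch written in Python rather than Julia. It is not Onda's actual API: the class names, field names, and the linear `resolution * x + offset` decode rule are illustrative assumptions. The point it tries to show is that encode/decode only needs the info attached to the data, and never needs to know the serialization format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SampleInfo:
    # Hypothetical stand-in for the per-signal metadata discussed above.
    channels: List[str]
    sample_rate: float          # Hz
    resolution_in_unit: float   # unit value of one encoded step (assumed linear)
    offset_in_unit: float       # unit value of an encoded zero (assumed linear)


@dataclass(frozen=True)
class Samples:
    # The "load/store unit": a data matrix plus the info needed to interpret it.
    data: List[List[float]]     # one row per channel, one column per sample
    info: SampleInfo
    encoded: bool               # tracked here so callers don't have to


def decode(samples: Samples) -> Samples:
    """Map encoded integers to unit values; a no-op if already decoded."""
    if not samples.encoded:
        return samples
    info = samples.info
    data = [[info.resolution_in_unit * x + info.offset_in_unit for x in row]
            for row in samples.data]
    return Samples(data, info, encoded=False)


def encode(samples: Samples) -> Samples:
    """Inverse of decode, rounding to the nearest encoded step."""
    if samples.encoded:
        return samples
    info = samples.info
    data = [[round((x - info.offset_in_unit) / info.resolution_in_unit) for x in row]
            for row in samples.data]
    return Samples(data, info, encoded=True)


# Usage: round-trip a tiny two-channel recording.
info = SampleInfo(["c3", "c4"], 128.0, 0.5, 1.0)
raw = Samples([[0, 2], [4, -2]], info, encoded=True)
decoded = decode(raw)
reencoded = encode(decoded)
```

Note that neither function mentions a file format, which is the argument for placing encode/decode at this layer rather than in the format layer.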

Given all of these points in combination, it seems like we can at least say we won't be able to get rid of Samples fully, unless a) we feel like everything we'd ever want out of point 3 could be achieved already with a more generic array type and b) we'd feel fine forcing callers to pass around data and info into most API functions as separate arguments (and having callers keep track on their own of whether or not data is encoded/decoded).

So, assuming we don't get rid of Samples entirely, how could we use these other packages to make Samples better? It seems like the primary annoyance that they might help with is that Samples is not an AbstractMatrix despite point 3. It's like telling callers "yeah, Samples isn't an AbstractMatrix, but here, if you want special matrix-y indexing features, wrap your AbstractMatrix in it!" Sounds weirdly inconvenient, but the reasoning kind of makes compositional sense - if you DID implement it as a full AbstractMatrix, then callers would still probably have to assume (outside a few special cases that are already covered) that most AbstractArray operations would cause Samples inputs/outputs to be unwrapped anyway (i.e. the transform on the Samples data would not have a corresponding sensible transform on its SamplesInfo). Thus, we just keep the distinction very clear/explicit, forcing callers to unwrap/rewrap themselves, so that they don't land in weird situations where they accidentally lose SamplesInfo along the way in an unexpected manner.

Does all of this imply that Onda should just provide explicit constructors to go between KeyedArray and Samples and be done with it? That seems like an easy enough thing to do. This brainstorming does make me want to play around with a Samples-less API (which would be beholden to a) and b) mentioned above), just to see how far it can go...
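
As a rough illustration of what "explicit constructors to go between KeyedArray and Samples" could look like, here is a small sketch in Python (not Julia, and not AxisKeys' or Onda's real API). `KeyedMatrix`, `to_keyed`, `to_samples`, and `select_channels` are all hypothetical names, and the info is reduced to channels plus a sample rate for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SampleInfo:
    # Hypothetical, simplified metadata: just channel names and a sample rate.
    channels: List[str]
    sample_rate: float  # Hz


@dataclass(frozen=True)
class Samples:
    data: List[List[float]]  # one row per channel, one column per sample
    info: SampleInfo


@dataclass(frozen=True)
class KeyedMatrix:
    # Hypothetical stand-in for a generic labeled array: it carries axis keys,
    # but knows nothing about Onda-specific metadata.
    data: List[List[float]]
    channel: List[str]
    time: List[float]  # seconds from the start of the recording


def to_keyed(samples: Samples) -> KeyedMatrix:
    """Unwrap: derive axis keys from the info, then drop the info."""
    n = len(samples.data[0]) if samples.data else 0
    time = [i / samples.info.sample_rate for i in range(n)]
    return KeyedMatrix(samples.data, list(samples.info.channels), time)


def to_samples(keyed: KeyedMatrix, sample_rate: float) -> Samples:
    """Rewrap: the caller supplies the metadata the generic array cannot hold."""
    if len(keyed.channel) != len(keyed.data):
        raise ValueError("channel keys must match the number of rows")
    return Samples(keyed.data, SampleInfo(list(keyed.channel), sample_rate))


def select_channels(keyed: KeyedMatrix, names: List[str]) -> KeyedMatrix:
    """A generic labeled-array operation: pick rows by channel key."""
    rows = [keyed.data[keyed.channel.index(name)] for name in names]
    return KeyedMatrix(rows, list(names), keyed.time)


# Usage: unwrap, apply a generic operation, then rewrap explicitly.
samples = Samples([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
                  SampleInfo(["c3", "c4"], 2.0))
keyed = to_keyed(samples)
subset = select_channels(keyed, ["c4"])
rewrapped = to_samples(subset, sample_rate=2.0)
```

The rewrap step takes the sample rate as an explicit argument, which mirrors the trade-off in a) and b): once the data lives in a generic array, the caller is responsible for carrying the remaining metadata alongside it.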

ericphanson commented 3 years ago

https://github.com/JuliaAudio/SampledSignals.jl also deals with multichannel signals and implements array type methods, so could be worth a look to see if we want to borrow any of that design.

jrevels commented 1 year ago

xref

jrevels commented 3 months ago

relevant to this topic - it's worth noting that by this point, the OSS ecosystem that emerged from the geoscience space (xarray + zarr + kerchunk) basically solves most of the same problems as Onda's sample data manipulation functionality, but in a drastically more generalized domain-agnostic fashion (n-dimensional labeled-dimension array storage, pluggable file formats/codecs, pluggable storage systems)

If these tools had already existed way back in early 2019, Onda.jl wouldn't have consisted of anything more than the signal/annotation schemas themselves

This is probably an indication that the direction outlined by this issue is the right eventual path in theory