ezmsg-org / ezmsg

Pure-Python DAG-based high-performance SHM-backed pub-sub and multi-processing pattern
https://ezmsg.readthedocs.io/en/latest/
MIT License
14 stars 6 forks source link

[REQ] Merge(ez.Unit) to concatenate two streams #147

Open cboulay opened 1 month ago

cboulay commented 1 month ago

It should be possible to concatenate 2+ AxisArray messages that share all dimensions except the concatenated dim. The concatenated dim can be a new axis. If the non-concatenated dims have an associated AxisInfo then these will need to be aligned within some tolerance; this means the Unit needs to maintain some internal buffer for each input and potentially drop samples that can no longer be aligned.

MergeSettings(ez.Settings):
    axis: str
    tol: float = 0.01

A little more practically, this will almost always be used to concat along "ch" or "features" axis from two different branches of a pipeline that underwent different signal processing steps but ultimately were resampled to the same sampling rate. The Unit will keep a buffer of stream_A and stream_B. Likely the only other axis with info in .axes is "time", so sample times from stream_A will be realigned with sample times from stream_B, within tol. Any samples that can't be aligned because they are too old will be dropped. Any samples that can't be aligned because they are too new will stay in the buffer. The samples that are aligned are concatenated together along "axis" and returned.

cboulay commented 1 month ago

Which of these designs is preferred?

  1. We could have two InputStreams -- INPUT_SIGNAL_1 and INPUT_SIGNAL_2 -- and be limited to 2-way merging. Users can daisy chain merges if they need more.

  2. Now that AxisArray has a .key field, we could maintain a buffer for each seen .key and only merge when there is alignment contribution from every buffer (min 2) -- in this way we could use a single InputStream INPUT_SIGNAL and have N-way merging.

I think ultimately the core logic (generator or otherwise) must use the design in 2, but the Unit can still use the design in 1 and simply do some key-manipulation before sending and after receiving.

cboulay commented 1 month ago

Maybe this should go in ezmsg-sigproc. I can't imagine a scenario when someone would want to use this yet somehow aren't also using ezmsg-sigproc. Thoughts?

griffinmilsap commented 14 hours ago

Def belongs in sigproc, but +10 for this.

I'd love to continue our XArray pattern where an AxisArray ~= DataArray and we could have some analog of a DataSet with multiple variables in it that have shared coords.

I think supporting this would involve some basic coord support for AxisArray, and THAT PR would live in this repo.