ezmsg-org / ezmsg

Pure-Python DAG-based high-performance SHM-backed pub-sub and multi-processing pattern
https://ezmsg.readthedocs.io/en/latest/
MIT License
9 stars 4 forks source link

`windowing` generator now yields AxisArray. #103

Closed cboulay closed 1 month ago

cboulay commented 4 months ago

Fixes #100

The windowing generator now always yields an AxisArray.

Previously, the return type was List[AxisArray]. Depending on the configuration, the list might have always been length 1 (in 1:1 mode) or usually 1, unless a big chunk came in and multiple windows were available. Now, there is always a newaxis added ("win" by default) and that will usually have length 1, but sometimes more.

The "win" axis might also have length 0 when the most recent send was inadequate to fill the next window. Previously, an unfilled .send would yield None (previous yield type incorrectly failed to include Optional[...]). This is probably the biggest change here and might merit some discussion: What should generators yield when the .send does not provide enough data to yield a result? None, an empty AxisArray, or an AxisArray with most details correct except one dimension is 0 and msg.data.size==0?

The Unit still has the option for the setting newaxis=None, in which case the value yielded by the generator is iterated over the "win" axis. In cases where the "win" axis is length 0, nothing is iterated (i.e., nothing published).

I had to make a change to the generator-level tests to drop yielded msgs with size 0, whereas previously I was dropping yielded msgs that were None.

The system-level tests are unchanged and continue to pass. If you have a use case that doesn't work with this PR then please create a test for that and I will try to figure out a solution.

cboulay commented 4 months ago

Oops. I did't meant to base on main. I'll let the tests complete then I'll rebase on dev. Edit: Done.

cboulay commented 3 months ago

@griffinmilsap , last time I changed windowing it broke one of your pipelines. Can you please check this one?

griffinmilsap commented 2 months ago

Window is such a beast. Easily one of our most complicated and heavily used additions to the signal processing library. I can say that depending on where this win dimension is placed, this will probably break some of my existing work, especially where (by happenstance) some of my pipelines actually already rely on a win dimension. How would you feel about changing win to something like _new_axis?

The idea of 0-length dimensions in AxisArray is also something I never really considered. It looks like the Unitization of this generator drops 0-length messages so this change won't result in higher messaging overhead for no reason, so that's good. In general, I'm in favor of retiring Optional data types because it makes linting easier and we don't have to check for is None later.

@pperanich do we rely on ezmsg.sigproc.window, or its generator implementation in any of the HLM or OI work? If not, given that this change only effects the generator, I'm for merging this if we can address the issue above.

As always, @cboulay thank you for the contributions; these are incredibly valuable and I apologize it took so long to get to this one.

cboulay commented 2 months ago

(rebased)

@griffinmilsap , I chose "win" because I thought that's what you were using but I thought it was scoped to exactly using a windowing node. I didn't think about what would happen if you applied the Window unit to an AxisArray that already had a "win" axis.

That is, if you tell the Window Unit that you don't want a newaxis and the generator inserts the "win" axis anyway for its own internal bookkeeping, it will be confused by the existing "win" axis.

Sorry about that. I'm happy to rename the default bookkeeping axis to something else. How is "_windowing"? Ultimately it only matters for direct users of the generator, as in some offline pipelines.

Reminder to myself: Modify my offline pipelines to be explicit about the newaxis="win" argument.

cboulay commented 2 months ago

I accidentally had an unrelated commit sneak into this PR. It's similar in that it uses size-0 arrays in offline pipelines, but it's for a different processing step. I'll try to separate them. Edit: It's been removed and now exists in another commit (soon PR).

griffinmilsap commented 1 month ago

I think its time to merge this and if I run into issues on dev, I've got time to react to them :)