ezmsg-org / ezmsg

Pure-Python DAG-based high-performance SHM-backed pub-sub and multi-processing pattern
https://ezmsg.readthedocs.io/en/latest/
MIT License
9 stars 4 forks source link

Add Slicer Unit + tests using generator pattern #83

Closed cboulay closed 4 months ago

cboulay commented 5 months ago

Once again, this first commit is the same as #79

cboulay commented 5 months ago

The selection argument wants a string which could be any of the following formats:

I suppose we could add support for:

But, before we get too crazy, I'd like to get some feedback.

cboulay commented 5 months ago

Probably shouldn't be merged as-is because it uses gen_to_unit. See #85 .

griffinmilsap commented 5 months ago

I think I'd be happier with this contribution if the slice syntax was retired in lieu of accepting a slice or an int. If an int is given, we should have an option to collapse the dimension from the AxisArray (and associated axes). How tied are you to the string syntax?

cboulay commented 5 months ago

I think we could drop string support if we could also allow an iterable of units and/or slices. e.g. [0, 3, slice(10, 64)]. It would be up to the Unit to convert that into a tuple of ints.

Agreed that a single int should collapse that dimension.

griffinmilsap commented 5 months ago

What about:

@dataclass
class SliceInfo:
    value: Union[int, float, slice]
    axis: Optional[str] = None
    by_index: bool = True

@consumer
def slicer(
    selection: List[SliceInfo]
) -> Generator[AxisArray, AxisArray, None]:
...

Then you could do:

slicer([
    SliceInfo(0, axis = 'ch'),
    SliceInfo(3, axis = 'ch'),
    SliceInfo(slice(10,64), axis = 'ch'),
    SliceInfo(slice(5.5, 15.5), axis = 'freq', by_index = False)
])

What'd be REALLY nice would be something like

slicer([
    SliceInfo(ch = 0),
    SliceInfo(ch = 3),
    SliceInfo(ch = slice(10,64)),
    SliceInfo(freq = slice(5.5, 15.5), by_index = False)
])
cboulay commented 5 months ago

Given that these are arguments that will be set in entry-point scripts by users of varying degrees of experience, it would be nice to keep the arguments as simple as possible.

I think if someone wanted to slice two different axes then that could be two different nodes in the graph, so we don't need to provide axis='' to each item in the list.

That leaves by_index=False; is that worth the extra overhead of having to import SliceInfo and making the argument syntax 2-10x longer? Honestly, it might be. It's pretty useful to be able to slice a frequency range knowing only the cutoffs.

Quick side note, the SliceInfo(ch=3) syntax is almost too good; it might be misleading if channels are labeled as '1', '2', '3' etc.

griffinmilsap commented 5 months ago

Fair enough. I'm reminded of the little string syntax we ask users to input to the print dialog when they want to print pages 4, 7, and 34-45, maybe the string syntax is the best way forward.

cboulay commented 4 months ago

Rebased and made it so a single int drops the indexed dim.

I decided to hold off on supporting "start-end" syntax for now because then I would need to drop support for negative indices, either that or level-up my regex-fu.

griffinmilsap commented 4 months ago

Going to accept this as an experimental new feature with the understanding that it might change around a bit as we go.