ezmsg-org / ezmsg

Pure-Python DAG-based high-performance SHM-backed pub-sub and multi-processing pattern
https://ezmsg.readthedocs.io/en/latest/
MIT License

sigproc optimization #127

Closed cboulay closed 3 weeks ago

cboulay commented 4 weeks ago

Note that the first 2 commits in this PR are already in two other PRs: ezmsg-org/ezmsg#123 and ezmsg-org/ezmsg#127. I included them here because there would be conflicts otherwise. If those 2 PRs get merged, then this PR will be a little easier to parse.

What the 3rd commit does:

Following my comments in ezmsg-org/ezmsg-sigproc#7, I went through all of the sigproc modules and refactored how they create new msg objects, because it turns out that shallow copy + modification is faster than `replace`. This is shown explicitly in the new notebook added to extensions/tests/resources/.
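A minimal sketch of the comparison the notebook makes, with a hypothetical `Msg` dataclass standing in for the real message types (the actual classes and fields differ):

```python
import copy
import timeit
from dataclasses import dataclass, field, replace

# Hypothetical stand-in for a sigproc message; NOT the real ezmsg message class.
@dataclass
class Msg:
    data: list
    fs: float = 1000.0
    axes: dict = field(default_factory=dict)

old_msg = Msg(data=list(range(1024)))

def via_replace():
    # dataclasses.replace re-runs __init__ with the changed fields.
    return replace(old_msg, data=old_msg.data[::-1])

def via_shallow_copy():
    # Shallow-copy the message, then overwrite only the changed attribute.
    new_msg = copy.copy(old_msg)
    new_msg.data = old_msg.data[::-1]
    return new_msg

t_replace = timeit.timeit(via_replace, number=10_000)
t_copy = timeit.timeit(via_shallow_copy, number=10_000)
print(f"replace: {t_replace:.3f}s  shallow copy: {t_copy:.3f}s")
```

Either way, the input message itself is never mutated; only the new object carries the modified field.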

Then, confident that we're now making proper copies and not mutating shared objects, I turned on zero_copy=True for as many of these Units as I could.

cboulay commented 4 weeks ago

Maybe worth mentioning: I decided to use `new_msg = copy.copy(old_msg)` instead of `new_msg = type(old_msg)(**old_msg.__dict__)`, despite the former being a bit slower, because it's more readable, and I suspect relying on `__dict__` is more fragile, though I don't have anything concrete to back that up.
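The two spellings side by side, using a hypothetical `Msg` dataclass for illustration. One concrete way the `__dict__` route is fragile: classes defined with `__slots__` have no instance `__dict__` at all, and the reconstruction also assumes `__init__` accepts exactly the stored attributes as keyword arguments.

```python
import copy
from dataclasses import dataclass

# Hypothetical message type for illustration.
@dataclass
class Msg:
    data: int
    fs: float

old_msg = Msg(data=1, fs=500.0)

# Option 1 (chosen): readable, goes through the stdlib copy protocol.
new_msg = copy.copy(old_msg)
new_msg.data = 2

# Option 2: reconstruct from __dict__. Note the ** unpacking is required;
# this breaks for __slots__ classes (no instance __dict__) or classes whose
# __init__ signature doesn't match the stored attributes one-to-one.
alt_msg = type(old_msg)(**old_msg.__dict__)
alt_msg.data = 2

assert new_msg == alt_msg
```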

cboulay commented 3 weeks ago

I just saw some weird behaviour when trying to use this approach to optimize the BandPower node (which isn't included in this PR, but its dependencies are). I'm converting this to a draft while I add some more unit tests and track down the source of the problem.

cboulay commented 3 weeks ago

The weird behaviour was that, while overriding a parent method to use @ez.subscriber(OUTPUT_SIGNAL, zero_copy=True), I misspelled the function name and ended up double-processing the data. I've since solved this in my cboulay/working branch. I'll rebase and redo the commit here in the near future.

griffinmilsap commented 3 weeks ago

Another thought: shallow copy with modification of the dataclass may be faster than replace, but replace calls the constructor and, importantly, the `__post_init__` method of a dataclass, which matters for some of my messages. If the speedup is that substantial, we should consider whether it's worth losing that expected (?) behavior.

cboulay commented 3 weeks ago

Yeah, the gains aren't that big in practice, at least not at the scale I'm working at so far. I am aiming for 100x more throughput than I'm handling currently, though, and maybe I'll need custom CUDA nodes for that.

I'll close this PR and open new ones just for the zero_copy=True changes and some unit tests to check that the input msg has not been modified relative to a deepcopy backup.
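A sketch of what such a unit test could look like, with a hypothetical `Msg` dataclass and `process` function standing in for a real sigproc Unit's message handler:

```python
import copy
from dataclasses import dataclass, replace

# Hypothetical message and processing step for illustration; the real tests
# would exercise the actual sigproc Units.
@dataclass
class Msg:
    data: list
    fs: float

def process(msg: Msg) -> Msg:
    # A well-behaved node: returns a new message, never mutates its input.
    return replace(msg, data=[x * 2 for x in msg.data])

def test_input_not_mutated():
    in_msg = Msg(data=[1, 2, 3], fs=100.0)
    backup = copy.deepcopy(in_msg)
    out_msg = process(in_msg)
    assert out_msg.data == [2, 4, 6]
    # The zero_copy=True contract: the input must equal its deepcopy backup.
    assert in_msg == backup

test_input_not_mutated()
```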

BTW, do you need to use replace on the axis objects too, or just the message itself? i.e., is the following OK?

```python
new_axes = {**old_axes, "time": TimeAxis(fs=fs, offset=new_offset)}
replace(in_msg, data=new_data, axes=new_axes)
```

Or do you need

```python
new_axes = {**old_axes, "time": replace(old_axes["time"], offset=new_offset)}
replace(in_msg, data=new_data, axes=new_axes)
```

?

griffinmilsap commented 3 weeks ago

I'm not doing anything weird with the axes at this point, but I could see implementing custom axes in the future. It might be worthwhile to prefer the latter over the former.