ezmsg-org / ezmsg

Pure-Python DAG-based high-performance SHM-backed pub-sub and multi-processing pattern
https://ezmsg.readthedocs.io/en/latest/
MIT License
11 stars 5 forks source link

Bug: sigproc.Window incorrect offset assignment #60

Closed cboulay closed 8 months ago

cboulay commented 10 months ago

Once again, I'm not certain but I think TimeAxis offset refers to the OLDEST sample in a chunk. If so, then I think Window is using the incoming data's offset to update the buffer offset incorrectly.

https://github.com/iscoe/ezmsg/blob/43fef1835e6a7fadba65a65e9b6dba83a30d4245/extensions/ezmsg-sigproc/ezmsg/sigproc/window.py#L93-L95

I'm not sure how to put the above 3 lines into words because I find it a bit confusing. The first step creates a timestamp vector that's the combined length of the buffer and the incoming data, even though the buffer already has the incoming data concatenated onto it, so the result is too long. From there I'm lost. Though I'm lost, I've been using the .offset to track samples through my pipeline to measure performance and there is definitely something wrong with the offset calculations in Window.

I find the following alternative to be much simpler, because it keeps buffer_offset the same length as self.STATE.buffer:

        buffer_offset = np.arange(self.STATE.buffer.shape[0])
        buffer_offset -= buffer_offset[-time_view.shape[0]]  # Adjust so first new sample = 0
        buffer_offset = (buffer_offset * axis.gain) + axis.offset

This change certainly helps my offset tracking, though it's still not perfect.

cboulay commented 10 months ago

One issue that I ran into when I was reworking some of the Window code is that replace doesn't make a copy of the field-value if it's a dictionary, at least it seems that way.

So, in the following lines:

https://github.com/iscoe/ezmsg/blob/43fef1835e6a7fadba65a65e9b6dba83a30d4245/extensions/ezmsg-sigproc/ezmsg/sigproc/window.py#L140-L142

If out_axes, or indeed msg.axes is modified even after the output has been yielded, that modification will affect the long-gone message as it's being processed by subsequent Units. I'm not sure if it's an issue in the current implementation, but it certainly bit me when I was trying to refactor this part to correct some bugs.

I've tried to demonstrate the problem below

from dataclasses import replace
import numpy as np
from ezmsg.util.messages.axisarray import AxisArray

msg = AxisArray(
    data=np.arange(6).reshape((3, 2)),
    dims=["ch", "time"],
    axes={"ch": AxisArray.Axis(), "time": AxisArray.Axis(unit="s", gain=1/10, offset=-0.1)}
)
axis = msg.get_axis("time")
out_axes = msg.axes

out_axes["time"] = replace(axis, offset=0.3)
out_axes["time"] is msg.axes["time"]  # True

msg_2 = replace(msg, axes={"ch": msg.axes["ch"], "time": out_axes["time"]})

msg_2.axes is msg.axes  # False
msg_2.axes["ch"] is msg.axes["ch"]  # True
msg_2.axes["time"] is msg.axes["time"]  # True

print(msg.axes)
# {'ch': AxisArray.Axis(unit='', gain=1.0, offset=0.0), 'time': AxisArray.Axis(unit='s', gain=0.1, offset=0.3)}

(Note: PR with unit test incoming shortly)