SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
524 stars 187 forks source link

Inconsistent handling of time offsets in plot_traces #3324

Open jonahpearl opened 2 months ago

jonahpearl commented 2 months ago

Hi all — I was doing some manual digging around to deal with artifacts caused by some custom lighting in our rigs, and I wanted to find the moment in the recording where the artifacts began. I noticed that depending on whether I used recording.time_slice() or recording.plot_traces, there are two different behaviors for dealing with the case where t0 != 0 (this is coming from open ephys, so it's often the case that the first timestamp isn't 0 in the recording).

Case 1: the "give me X many seconds into the recording" strategy. This appears to be used by recording.plot_traces, and I think is a bug.

Case 2: the "give me the data where the timestamps equal this value" strategy. This appears to be used by recording.time_slice, and I think is the correct way (?).

For example, here is the plot from plot_traces for the time range (25.5, 26.5): you can see the artifacts start ~halfway through the window.

image

Now if I run this code to try to show the same thing but with recording.time_slice(), we instead get a different moment:

tmp_rec = recording_car.time_slice(25.5, 26.5)
t = tmp_rec.get_traces(channel_ids=['CH40'])
plt.figure(figsize=(10,3))
plt.plot(t, lw=0.5)

image

If we adjust the time slicing to account for the inconsistent behavior, we recover the same moment in the data:

t0 = recording_car.get_times()[0]
tmp_rec = recording_car.time_slice(25.5 + t0, 26.5 + t0)  # this will get data from where the timestamp equals 25.5 + t0
t = tmp_rec.get_traces(channel_ids=['CH40'])
plt.figure(figsize=(10,3))
plt.plot(t, lw=0.5)

image

You can see the behavior in the code. time_slice ultimately relies on this logic (at least in my case, it seems that self.t_start is assigned automatically when reading from OE folders): sample_index = (time_s - self.t_start) * self.sampling_frequency, which means that, say, if t0 were 10, and the user requested t=20, the code would correctly give the user the data from 10 seconds into the recording.

However the plot/get_traces functions don't appear to do this correction, as seen here and down on L145 below that, and then the helper function just inherits that time range directly.

Hopefully fixing the behavior in plot_traces is easy enough and won't hurt anyone's workflow — I imagine that may be why the bug still exists :)

JoeZiminski commented 2 months ago

Hey @jonahpearl thanks for this, so to be sure I understand correctly:

1) plot_traces() is getting the times 25.5 - 25.6 into the recording, irregardless of what the actual start time is. 2) the expected behaviour is that plot_traces() displays the actual 25.5 - 25.6 second range. So for example, if the recording was from ground-truth times 20 to 50 seconds, you would expect the time slice you used (25.5, 25.6) to display 5.5 to 5.6 seconds into the recording (25.5-25.6 true time). But, it is giving 25.5 to 25.6 seconds into the recording (which is 45.5 to 45.6 seconds true time).

I think this is because the plot_traces() code you linked is looking for a time_vector but ignoring any t_start set on the recording. Presumably if you do recording_car.has_time_vector() you will get False? Could you do recording_cmr.set_times(recording_cmr.get_times()) and check if that fixes the problem? (otherwise I've misunderstood something).

Now, this conditional is outdated as get_times() will generate times on the fly even if no time vector is found. This is better as it will use any t_start on the recording. I think conditional block in plot_traces linked above can now be replaced with just the second part of the conditional:

times = rec0.get_times(segment_index=segment_index)
t_start = times[0]
t_end = times[-1]

@alejoe91 @samuelgarcia do you agree? In practice, can we deprecate has_time_vector() as one is always generated now, and it will use t_start if available whereas legacy code that uses this conditional check may not?

jonahpearl commented 2 months ago

Hi @JoeZiminski — yes, running recording_cmr.set_times(recording_cmr.get_times()) aligns the two behaviors, as you expected :) thanks!

One other note, if I run that fix, and then "naively" ask for a time segment outside the start of the recording with the plot_traces function, I get a fairly cryptic ValueError because it's trying to run operations on an empty array, it might be worth checking for this and raising a user-readable error, since I imagine some people will just instinctively try to plot the first few seconds of their recording. recording.time_slice() gives an AssertionError which is clear from context if you look at the traceback, but it wouldn't be hard to add a simple message giving the user feedback.

ValueError Traceback ```python ValueError Traceback (most recent call last) Cell In[19], line 4 1 fig, ax = plt.subplots(figsize=(15,20)) 3 iids = slice(39,40) ----> 4 si.plot_traces( 5 recording_car, 6 backend='matplotlib', 7 order_channel_by_depth=True, 8 # time_range=(25.5, 26.5), # started by 34 9 time_range=(1,2), 10 channel_ids=recording_car.channel_ids[iids], 11 mode="line", 12 ax=ax, 13 show_channel_ids=True, 14 ) 15 _ = plt.yticks(np.arange(iids.stop - iids.start), recording_car.channel_ids[iids]) File ~/datta-lab/spikeinterface/src/spikeinterface/widgets/traces.py:170, in TracesWidget.__init__(self, recording, segment_index, channel_ids, order_channel_by_depth, time_range, mode, return_scaled, cmap, show_channel_ids, events, events_color, events_alpha, color_groups, color, clim, tile_size, seconds_per_row, scale, with_colorbar, add_legend, backend, **backend_kwargs) 168 traces0 = list_traces[0] 169 mean_channel_std = np.mean(np.std(traces0, axis=0)) --> 170 max_channel_amp = np.max(np.max(np.abs(traces0), axis=0)) 171 vspacing = max_channel_amp * 1.5 173 if rec0.get_channel_groups() is None: File ~/miniconda3/envs/spikeinterface/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2810, in max(a, axis, out, keepdims, initial, where) 2692 @array_function_dispatch(_max_dispatcher) 2693 @set_module('numpy') 2694 def max(a, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue, 2695 where=np._NoValue): 2696 """ 2697 Return the maximum of an array or maximum along an axis. 2698 (...) 2808 5 2809 """ -> 2810 return _wrapreduction(a, np.maximum, 'max', axis, None, out, 2811 keepdims=keepdims, initial=initial, where=where) File ~/miniconda3/envs/spikeinterface/lib/python3.9/site-packages/numpy/core/fromnumeric.py:88, in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs) 85 else: 86 return reduction(axis=axis, out=out, **passkwargs) ---> 88 return ufunc.reduce(obj, axis, dtype, out, **passkwargs) ValueError: zero-size array to reduction operation maximum which has no identity ```
AssertionError Traceback ```python AssertionError Traceback (most recent call last) Cell In[20], line 1 ----> 1 tmp_rec = recording_car.time_slice(1, 2) File ~/datta-lab/spikeinterface/src/spikeinterface/core/baserecording.py:727, in BaseRecording.time_slice(self, start_time, end_time) 724 start_frame = self.time_to_sample_index(start_time) if start_time else None 725 end_frame = self.time_to_sample_index(end_time) if end_time else None --> 727 return self.frame_slice(start_frame=start_frame, end_frame=end_frame) File ~/datta-lab/spikeinterface/src/spikeinterface/core/baserecording.py:702, in BaseRecording.frame_slice(self, start_frame, end_frame) 684 """ 685 Returns a new recording with sliced frames. Note that this operation is not in place. 686 (...) 697 A new recording object with only samples between start_frame and end_frame 698 """ 700 from .frameslicerecording import FrameSliceRecording --> 702 sub_recording = FrameSliceRecording(self, start_frame=start_frame, end_frame=end_frame) 703 return sub_recording File ~/datta-lab/spikeinterface/src/spikeinterface/core/frameslicerecording.py:37, in FrameSliceRecording.__init__(self, parent_recording, start_frame, end_frame) 35 start_frame = 0 36 else: ---> 37 assert 0 <= start_frame < parent_size 39 if end_frame is None: 40 end_frame = parent_size AssertionError: ```
alejoe91 commented 1 month ago

I think this should be fixed if we use time_slice in plot_traces, since it seems to handle correctly all cases. Will make a PR

alejoe91 commented 1 month ago

@jonahpearl could you test this? https://github.com/SpikeInterface/spikeinterface/pull/3393

alejoe91 commented 1 month ago

Also improved error messages in that PR

jonahpearl commented 1 month ago

@jonahpearl could you test this? #3393

Actually doesn't seem to work for me — when I subselect my one channel, I just get an empty plot, even without passing any time_range. If I don't subselect the channels it works. And you can see that channel has data b/c if i pull it out directly it's fine.

image image image