cwindolf opened 1 month ago
A PR here could try to fix https://github.com/SpikeInterface/spikeinterface/issues/2515 as well
Question:
In what case do we need to call `time_to_sample_index` repeatedly?
My prior is that `np.searchsorted` is not going to be the bottleneck for anything reasonable if we accept vectors in the corresponding functions.
So I think the main concern is going to be memory, as you were mentioning.
Can you remind me on how the time vector is used in motion interpolation?
Good point! Seems like the search is not going to be a bottleneck. Even a pretty aggressive version of your benchmark is fast:
```python
In [11]: time_vector = np.arange(600_000_000.0) / 30_000

In [12]: search_times = np.linspace(500_000_000.0 / 30_000, 500_090_000.0 / 30_000, num=300)

In [13]: %%timeit
    ...: np.searchsorted(time_vector, search_times)
    ...:
    ...:
90.1 µs ± 1.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
In motion interpolation, `sample_index_to_time()` is called to determine each sample's time in seconds, and these times are used to look up the drift for each sample. `time_to_sample_index()`, which would be the slow one, is not used -- and as we can see, even that operation is not actually slow. So @samuelgarcia maybe there is no performance issue with having a time_vector in interpolation?
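To make the pattern above concrete, here is a minimal sketch of the drift-lookup step described in this comment. The variable names (`drift_times`, `drift_um`) and the use of `np.interp` are illustrative assumptions, not the actual spikeinterface motion-interpolation code:

```python
import numpy as np

sampling_frequency = 30_000.0
t_start = 0.0

# sample_index_to_time(): each sample's time in seconds (linear case)
sample_indices = np.arange(10)
sample_times = t_start + sample_indices / sampling_frequency

# Motion is estimated on a coarse time grid; each sample's drift is
# looked up from its time -- no time_to_sample_index call is needed.
drift_times = np.array([0.0, 1.0, 2.0])  # seconds (hypothetical grid)
drift_um = np.array([0.0, 5.0, 3.0])     # micrometers (hypothetical values)
per_sample_drift = np.interp(sample_times, drift_times, drift_um)
```

The point is that the forward direction (sample index to time) is a cheap formula or array lookup; only the inverse direction would require a search.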
Following some discussions with @samuelgarcia and @JoeZiminski , I wanted to open an issue to work out the details of handling time_vectors in a way that won't crash everyone's computers. I'm opening a separate issue from https://github.com/SpikeInterface/spikeinterface/issues/2627, since this will focus on implementation details and will not change the logic.
Summary of what exists:
- Two conversion functions: `extractor.sample_index_to_time` and `extractor.time_to_sample_index`
- Without a `time_vector`: `time_seconds = t_start + sample_index / sampling_frequency`, where `t_start = 0` by default
- With a `time_vector`: `time_seconds = time_vector[sample_index]`
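The two conventions above can be sketched in a few lines (variable names are illustrative, not the extractor's internals):

```python
import numpy as np

sampling_frequency = 30_000.0
t_start = 0.0  # default when no t_start is set

sample_index = np.arange(5)

# Without a time_vector: times follow the linear formula
time_seconds = t_start + sample_index / sampling_frequency

# With a time_vector: times are a direct array lookup
time_vector = t_start + np.arange(100) / sampling_frequency
looked_up = time_vector[sample_index]

# For a uniformly sampled time_vector the two conventions agree
assert np.allclose(time_seconds, looked_up)
```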
Summary of problems:
- Going from times in seconds to sample indices with a `time_vector` currently means an `np.searchsorted` for each sample, which could get expensive for two reasons: time_vectors are long, so individual searches can be expensive, and doing lots of them is even worse. (edit: this is not really an issue, see discussion below.)

I propose that we can still efficiently implement time vectors with two changes:

- Don't do a searchsorted for every sample input into `extractor.sample_index_to_time` (edit: seems fine.)
I think this could be addressed by a PR that doesn't affect any other code, and would allow time_vectors in the motion correction interpolation.
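As a rough illustration of the vectorized approach discussed above, here is a sketch of a batched time-to-index conversion that issues a single `np.searchsorted` call for all query times. The function name, signature, and nearest-neighbor rounding are assumptions for this sketch, not spikeinterface's actual implementation:

```python
import numpy as np

def time_to_sample_index_vectorized(times, time_vector):
    """Hypothetical helper: map query times (scalar or array) to the
    nearest sample indices with one searchsorted call."""
    times = np.atleast_1d(np.asarray(times, dtype=float))
    # Index of the first time_vector entry >= each query time
    indices = np.searchsorted(time_vector, times, side="left")
    indices = np.clip(indices, 1, len(time_vector) - 1)
    # Snap to whichever neighbor is closer in time
    left_closer = (times - time_vector[indices - 1]) < (time_vector[indices] - times)
    indices[left_closer] -= 1
    return indices

time_vector = np.arange(1_000) / 30_000.0
queries = np.array([0.0, 10.4 / 30_000.0, 999 / 30_000.0])
time_to_sample_index_vectorized(queries, time_vector)  # → array([0, 10, 999])
```

Since the benchmark above shows a vectorized search over a 600M-sample time vector takes ~90 µs, this pattern should keep the conversion well away from being a bottleneck.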