Hi. Thank you for catching this.
To be sure: is the slowdown happening with lazy=True or lazy=False (reader.read_block(lazy=False))?
We pass lazy=True
but then
if lazy and not hasattr(neo.io, 'proxyobjects'):
warnings.warn('Your version of neo does not support proxyobjects. Setting lazy=False.')
lazy = False
and so it becomes False with neo v0.7.2.
OK. I think the performance difference could simply be explained by the fact that when lazy=False the whole dataset is loaded and then sliced directly in memory. This can be faster in some cases, for instance when the dataset fits in memory.
Slicing with lazy loading is more elegant and uses less memory, but it does not guarantee faster computation. Slicing on disk can have a cost, depending on the IO. If the IO uses a memmap it should be fast, but in some cases slicing on disk requires tricky block access and concatenation that can limit the speed.
Could you try the slicing with lazy=False in 0.8.0 and tell me the performance?
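For illustration, here is a minimal sketch of the two access patterns being compared (MyIO and the file path are placeholders for any RawIO-based reader that supports proxy objects):

import time
import quantities as pq

reader = MyIO(filename='recording_file')  # placeholder reader and path

# lazy=False: one big read from disk, slicing then happens on in-memory arrays
t0 = time.perf_counter()
seg = reader.read_segment(lazy=False)
cut = [sig.time_slice(10 * pq.s, 11 * pq.s) for sig in seg.analogsignals]
print('lazy=False:', time.perf_counter() - t0, 's')

# lazy=True: only proxy objects are created; every load() goes back to the file
t0 = time.perf_counter()
seg = reader.read_segment(lazy=True)
cut = [proxy.load(time_slice=(10 * pq.s, 11 * pq.s)) for proxy in seg.analogsignals]
print('lazy=True:', time.perf_counter() - t0, 's')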
With lazy=False it takes 36 sec.
So the matter is not 0.7.2 vs 0.8.0 but lazy=False vs lazy=True. No?
My guess is that slicing on disk can be more performant when the summed duration of the slices is small compared with the total duration of the segment. In short, slicing with lazy=True is useful for small slices of very long recordings; for example, loading a hundred one-second windows out of a one-hour recording only touches about 100 s of data lazily, whereas lazy=False reads the full hour up front.
I don't know if I am being clear enough.
Why IO?
Why IO?
What?
You can say that the lazy flag is the culprit as long as you agree that 36 sec ~ 25 sec. If you think it's fine, then it's fine. Maybe it makes sense, though, to add some benchmarks to your tests so that you can see the speedups and slowdowns immediately (see the sketch below).
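A rough sketch of what such a benchmark could look like, assuming the pytest-benchmark plugin and a placeholder test recording (this is not an existing neo test, and any IO could stand in for BlackrockIO here):

import neo
import pytest

@pytest.mark.parametrize('lazy', [False, True])
def test_read_block_speed(benchmark, lazy):
    # 'benchmark' is the fixture provided by pytest-benchmark;
    # the file path is a placeholder for a real test recording
    reader = neo.io.BlackrockIO(filename='testdata.ns6')
    benchmark(reader.read_block, lazy=lazy)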
Please update https://neo.readthedocs.io/en/latest/releases/0.8.0.html#lazy-loading accordingly (and here https://neo.readthedocs.io/en/latest/io.html#lazy-option-and-proxy-objects is not clear about the time-memory tradeoff). Right now it states:
This can lead to major savings in time and memory consumption
At least, make users aware that lazy=True might speed things up (if you ever encounter this) as well as slow them down.
I meant with which IO? Sorry. I should read what I am writing.
My guess is that the performance with lazy=True/False is IO dependent.
I don't understand all your timing figures. To clarify:
What is the total underlying segment duration? How many slices, and of what durations, are you looping over?
The lazy flag relates to my second benchmark only.
which IO
We use
block = neo.io.BlackrockIO(filename=v_data_full_path, nsx_to_load=nsx_number,
                           nev_override=v_sorted_full_path)
which is later filled up with .hdf5 files for LFPs (plus a bunch of .nev and .ns2 files).
OK. Thank you. Internally, BlackrockIO uses a pure memmap for signals, so it should be super fast and we should see a speedup.
For spiketrains, the lazy load in BlackrockIO is not efficient, because all spikes are in the same file and for each slice we need to run through the file to keep the spikes that belong to the wanted slice.
Maybe it could be this.
I don't have any more ideas.
Maybe it could be the spiketrain waveforms, which were not loaded before and are loaded now.
Is the hdf5 file bigger now?
Could you try to run the same benchmark for Blackrock but only with analogsignals, by copying only the .ns6 file to another directory?
I'm just curious, is the first performance degradation you described (90 sec to 1279 sec for segment slicing) still bad in 0.8.0 with lazy=False?
I wonder if the huge slowdown is the cumulative overhead of calling BlackrockIO's inefficient lazy loading of spiketrains 100k+ times. I'm assuming that with lazy=False the file is read just once and slicing is performed on the block in memory, which should be fast, I think.
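For reference, the access pattern in question looks roughly like this (a simplified sketch; epoch and seg are assumed to come from the lazily or eagerly read block, and the loop shape mirrors a cut-by-epoch helper rather than neo's actual implementation):

# lazy=True: every load() call re-scans the spike file for its window
cut_spiketrains = []
for t_start, dur in zip(epoch.times, epoch.durations):
    for st_proxy in seg.spiketrains:
        cut_spiketrains.append(st_proxy.load(time_slice=(t_start, t_start + dur)))

# lazy=False: the file was read once up front and time_slice() only
# masks the in-memory arrays
cut_spiketrains = []
for t_start, dur in zip(epoch.times, epoch.durations):
    for st in seg.spiketrains:
        cut_spiketrains.append(st.time_slice(t_start, t_start + dur))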
Given the growing number of comments, I decided to move the discussion and fixing of this issue to a session with @JuliaSprenger that will take place in two days, on Wednesday.
I wonder if the huge slowdown is the cumulative overhead of calling BlackrockIO's inefficient lazy loading of spiketrains 100k+ times. I'm assuming that with lazy=False the file is read just once and slicing is performed on the block in memory, which should be fast, I think.
Exactly, that is the main issue.
The most general way of tackling this that I could come up with would be the possibility to load multiple time slices of spiketrains at once. This would have to be done at the RawIO level and would result in only one load from hard disk per spiketrain for all epoch times at once in this use case. All of the time slicing could then be performed at once on the same temporarily loaded array. This could be achieved by e.g. allowing t_start and t_stop to be 1D arrays in all the functions required for this operation.
Not sure if this would overcomplicate the loading and/or cause other issues elsewhere, though. Since the spiketrain loading is IO specific, this would have to be changed in every RawIO individually.
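As a rough illustration of that idea (not an actual RawIO API, just the core trick of loading the spike times once and slicing many windows in memory; all names are hypothetical):

import numpy as np

def slice_many_windows(spike_times, t_starts, t_stops):
    # spike_times must be sorted; one load plus cheap index lookups
    # replaces one disk scan per window
    spike_times = np.asarray(spike_times)
    left = np.searchsorted(spike_times, t_starts, side='left')
    right = np.searchsorted(spike_times, t_stops, side='right')
    return [spike_times[lo:hi] for lo, hi in zip(left, right)]

# hypothetical usage with epoch-like arrays of window edges (in seconds)
spike_times = np.sort(np.random.uniform(0, 3600, size=500_000))
windows = slice_many_windows(spike_times,
                             t_starts=np.arange(0, 3600, 10.0),
                             t_stops=np.arange(1, 3601, 10.0))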
In short: we should update the documentation soon, because lazy loading was written with signals in mind, and I think it is efficient for most IOs.
For spiketrains and events, of course, slicing like this is not efficient at all for most IOs.
So for spikes I would not use lazy loading at all, or I would load everything immediately with something like:
reader = MyIo()
seg = reader.read_segment(lazy=True)
# load all spiketrains into memory
spiketrains = [st.load() for st in seg.spiketrains]
# and keep the lazy load for AnalogSignals...
Maybe for a future version we could have a selective lazy option:
seg = reader.read_segment(lazy={'AnalogSignal': True, 'SpikeTrain': False})
I'm not sure whether we should complicate the 'laziness' logic by introducing lazy={'AnalogSignal': True, 'SpikeTrain': False}. In the future, one might want to extend it even further, leading to an overcomplicated structure. I'd suggest sticking to either False or True and putting the example you showed in the docs. Or, even better, always ignore laziness when loading spiketrains (if that's possible).
There is still a small slowdown in loading the block with BlackRockIO:
v0.7.2
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
   146         1          2.0        2.0      0.0  motor_file = neo.io.BlackrockIO(filename=m_data_full_path, nsx_to_load=nsx_number,
   147         1    3333432.0  3333432.0     15.3                                  nev_override=m_sorted_full_path)
   148         1          6.0        6.0      0.0  visual_file = neo.io.BlackrockIO(filename=v_data_full_path, nsx_to_load=nsx_number,
   149         1    3815768.0  3815768.0     17.6                                   nev_override=v_sorted_full_path)
   150
   151                                             # read both blocks
   152         1    6556341.0  6556341.0     30.2  motor_block = motor_file.read_block(load_waveforms=load_waveforms, lazy=lazy)
   153         1    8030084.0  8030084.0     36.9  visual_block = visual_file.read_block(load_waveforms=load_waveforms, lazy=lazy)
v0.8.0
Line #      Hits         Time    Per Hit   % Time  Line Contents
==============================================================
   146         1          2.0        2.0      0.0  motor_file = neo.io.BlackrockIO(filename=m_data_full_path, nsx_to_load=nsx_number,
   147         1    1768212.0  1768212.0      5.4                                  nev_override=m_sorted_full_path)
   148         1          4.0        4.0      0.0  visual_file = neo.io.BlackrockIO(filename=v_data_full_path, nsx_to_load=nsx_number,
   149         1    1925418.0  1925418.0      5.8                                   nev_override=v_sorted_full_path)
   150
   151                                             # read both blocks
   152         1   12967438.0 12967438.0     39.3  motor_block = motor_file.read_block(load_waveforms=load_waveforms, lazy=lazy)
   153         1   16324623.0 16324623.0     49.5  visual_block = visual_file.read_block(load_waveforms=load_waveforms, lazy=lazy)
Both with lazy=False.
The new neo version creates the BlackrockIO faster but loads a block slower, leading to the summed benchmark: v0.7.2: 9.9 sec, v0.8.0: 14.7 sec.
A single slowdown like this is not a big deal. But if the implementation increased the loading time by a factor of 1.5 from version to version, it would eventually lead to exponential growth of the runtime. But I agree, this scenario is hardly plausible.
This is all for these two issues. You can close the issue if you think this is not a problem.
Thanks for the benchmark report. This slowdown with lazy=False is really strange. I need to think about it. Let's keep this open. It is important.
Is there any interest in repeating this now that we are on 0.13.1? If there is no response within a week I will close this, since we haven't explored it for nearly 4 years.
Long story short, some analysis gets slower by 14x (from 90 sec to 1279 sec) when upgrading neo from 0.7.2 to 0.8.0.
The function in question is cut_segment_by_epoch, which until v0.8.0 had been carried from repo to repo; now that it is part of a neo release, I decided to switch to your utils.py implementation. Previously we used def seg_time_slice(seg, t_start=None, t_stop=None, reset_time=False, **kwargs), which is now replaced by segment.time_slice() (I can share the script if needed). Here is its timing profiler bottleneck, made with https://github.com/rkern/line_profiler: the bottleneck is in the else statement.
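For completeness, a minimal sketch of how such a per-line profile can be produced with line_profiler (the function body and file path are placeholders, not the actual benchmark script):

from line_profiler import LineProfiler
import neo

def load_blocks():
    # placeholder path; in the real script two BlackrockIO blocks are read
    reader = neo.io.BlackrockIO(filename='data.ns6')
    return reader.read_block(lazy=False)

lp = LineProfiler()
profiled = lp(load_blocks)  # wrap the function so each line gets timed
profiled()
lp.print_stats()            # prints tables like the ones above

Alternatively, decorating the function with @profile and running kernprof -l -v script.py gives the same kind of output.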