Closed by rcpeene 3 months ago
The real bottleneck appears to be in DynamicTable.add_row().
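To confirm where the time goes, one option is to profile a small batch of inserts with the standard library's cProfile. A minimal sketch, assuming presentation_interval and a hypothetical list of (start, stop) pairs called intervals already exist:

    import cProfile
    import pstats

    profiler = cProfile.Profile()
    profiler.enable()
    for start_time, stop_time in intervals[:500]:  # profile a few hundred calls
        presentation_interval.add_interval(start_time=start_time, stop_time=stop_time)
    profiler.disable()
    # The cumulative view shows which internal call (e.g. DynamicTable.add_row)
    # dominates the runtime.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)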
Hi @rcpeene,

One way to speed up the add_interval operation would be to add the argument check_ragged=False. We recently added this check to provide a better warning for ragged arrays, but it can cause performance issues for larger tables since it checks the data on every call to add_row / add_interval.
    presentation_interval.add_interval(
        **row,
        start_time=start_time, stop_time=end_time,
        tags="stimulus_time_interval", timeseries=ts,
        check_ragged=False,  # skip the per-row raggedness check
    )
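In a full row-by-row loop this might look like the following sketch, where stim_df is a hypothetical pandas DataFrame holding the stim table and ts is the associated TimeSeries; only check_ragged=False comes from the change described above:

    for row in stim_df.to_dict("records"):
        # pop the required columns so **row doesn't pass them twice
        start_time = row.pop("start_time")
        end_time = row.pop("stop_time")
        presentation_interval.add_interval(
            **row,
            start_time=start_time, stop_time=end_time,
            tags="stimulus_time_interval", timeseries=ts,
            check_ragged=False,
        )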
Could you try setting check_ragged to False and see if that improves your performance?
This was remarkably faster and completed in a few minutes. Thanks!
What happened?
I am trying to generate an NWB file with a rather large stim table, row by row, using TimeIntervals.add_interval(). The stim table for our experiment happens to be very large (>40,000 rows). On two different machines, this takes more than 10 hours, and the add_interval operation seems to be the bottleneck: it takes longer as the table grows.

After digging through the code, it looks like the culprit might be __calculate_idx_count, perhaps the bisect call. Is there a more direct way to generate a TimeIntervals table from an existing table (while ensuring that the types of each column are properly cast)? Or is there a fix for the slowness of the add_interval operation?
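One possible bulk route, as a sketch only: TimeIntervals inherits from_dataframe from hdmf's DynamicTable, which builds the columns from a pandas DataFrame in one pass rather than row by row. Whether it preserves the timeseries links used above is not verified here:

    import pandas as pd
    from pynwb.epoch import TimeIntervals

    # Hypothetical stim table; start_time and stop_time are required columns.
    stim_df = pd.DataFrame({
        "start_time": [0.0, 2.0],
        "stop_time": [1.0, 3.0],
        "stimulus_name": ["grating", "noise"],
    })

    presentations = TimeIntervals.from_dataframe(stim_df, name="stimulus_presentations")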
Operating System: Windows
Python Executable: Conda
Python Version: 3.10
Package Versions: pynwb==2.8.1