Closed samuelbray32 closed 8 months ago
@rly any thoughts?
The NWB ElectricalSeries
, SpatialSeries
, and TimeSeries
types allow any numeric dtype for data
, so no conversion should be done. The timestamps
field must be float64 but as far as I can tell, they are already float64 for the ephys, analog, DIO, and sample count data. Are the position timestamps in float64 before conversion? It's hard for me to trace the code.
Also, are they all numpy arrays? If the data/timestamps is a list or tuple, then HDMF calls convert_dtype
on each element which will be slow... It looks like the data/timestamps for ephys, analog, DIO, sample count, and position are all numpy arrays. It's possible the slowdown could be from the electrode table data, but those are relatively small and should not result in a slowdown that scales with data size.
One thing you could do to figure out where the bottleneck is to do the conversion for just the analog data, just the ephys data, just the position data, etc. and see which one calls convert_dtype
too many times.
Thanks for the insight, I'll work on tracing that this afternoon
Think I found the source. When combining multiple rec files, the RecFileDataChunkIterator
was extending timestamps as a list rather than concatenating into an array. Applying https://github.com/LorenFrankLab/spikegadgets_to_nwb/commit/92ae309d46c58fa9daeaaa103255fa53f4d03ac7 cuts conversion times in ~half
Nice!
Great! Could you also update this line https://github.com/LorenFrankLab/spikegadgets_to_nwb/blob/main/src/spikegadgets_to_nwb/convert_ephys.py#L149 where multiple chunks are extended into a list and then converted into a numpy array? In my toy example tests, the list approach is slower, but how much depends on the chunk sizes.
Fixed
objectmapper.py:183(convert_dtype)
from thehdmf
packageconvert_dtype
function above happen during this stallI Just wanted to check if anyone knew of a redundant conversion that might be happening here we could avoid