SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Can't write frame sliced nwb recording to binary #457

Closed · khl02007 closed this issue 2 years ago

khl02007 commented 2 years ago

I get an h5py-related error. Here is what I tried:

import spikeinterface.extractors as se

# load the NWB recording, slice to the first 100 frames, save to binary
recording = se.read_nwb_recording(file_path='recording.nwb')
recording = recording.frame_slice(start_frame=0, end_frame=100)
recording = recording.save(folder='./tmp')

Error:

write_binary_recording with n_jobs 1  chunk_size None
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [1], in <module>
      3 recording = se.read_nwb_recording(file_path='/Users/kyu/Downloads/beans20190718.nwb')
      4 recording = recording.frame_slice(start_frame=0, end_frame=100)
----> 5 recording = recording.save(folder='./tmp_r')

File ~/repos/spikeinterface/spikeinterface/core/base.py:519, in BaseExtractor.save(self, **kwargs)
    517     return self.save_to_memory(**kwargs)
    518 else:
--> 519     return self.save_to_folder(**kwargs)

File ~/repos/spikeinterface/spikeinterface/core/base.py:592, in BaseExtractor.save_to_folder(self, name, folder, dump_ext, verbose, **save_kwargs)
    585     provenance_file.write_text(
    586         json.dumps({'warning': 'the provenace is not dumpable!!!'}),
    587         encoding='utf8'
    588     )
    591 # save data (done the subclass)
--> 592 cached = self._save(folder=folder, verbose=verbose, **save_kwargs)
    594 self.save_metadata_to_folder(folder)
    596 # copy properties/

File ~/repos/spikeinterface/spikeinterface/core/baserecording.py:197, in BaseRecording._save(self, format, **save_kwargs)
    194     dtype = self.get_dtype()
    196 job_kwargs = {k: save_kwargs[k] for k in self._job_keys if k in save_kwargs}
--> 197 write_binary_recording(self, file_paths=file_paths, dtype=dtype, **job_kwargs)
    199 from .binaryrecordingextractor import BinaryRecordingExtractor
    200 cached = BinaryRecordingExtractor(file_paths=file_paths, sampling_frequency=self.get_sampling_frequency(),
    201                                   num_chan=self.get_num_channels(), dtype=dtype,
    202                                   t_starts=t_starts, channel_ids=self.get_channel_ids(), time_axis=0,
    203                                   file_offset=0, gain_to_uV=self.get_channel_gains(),
    204                                   offset_to_uV=self.get_channel_offsets())

File ~/repos/spikeinterface/spikeinterface/core/core_tools.py:236, in write_binary_recording(recording, file_paths, dtype, add_file_extension, verbose, byte_offset, **job_kwargs)
    233 init_args = (recording.to_dict(), rec_memmaps_dict, dtype)
    234 executor = ChunkRecordingExecutor(recording, func, init_func, init_args, verbose=verbose,
    235                                   job_name='write_binary_recording', **job_kwargs)
--> 236 executor.run()

File ~/repos/spikeinterface/spikeinterface/core/job_tools.py:229, in ChunkRecordingExecutor.run(self)
    227 worker_ctx = self.init_func(*self.init_args)
    228 for segment_index, frame_start, frame_stop in all_chunks:
--> 229     res = self.func(segment_index, frame_start, frame_stop, worker_ctx)
    230     if self.handle_returns:
    231         returns.append(res)

File ~/repos/spikeinterface/spikeinterface/core/core_tools.py:177, in _write_binary_chunk(segment_index, start_frame, end_frame, worker_ctx)
    174 rec_memmap = worker_ctx['rec_memmaps'][segment_index]
    176 # apply function
--> 177 traces = recording.get_traces(start_frame=start_frame, end_frame=end_frame, segment_index=segment_index)
    178 traces = traces.astype(dtype)
    179 rec_memmap[start_frame:end_frame, :] = traces

File ~/repos/spikeinterface/spikeinterface/core/baserecording.py:101, in BaseRecording.get_traces(self, segment_index, start_frame, end_frame, channel_ids, order, return_scaled)
     99 channel_indices = self.ids_to_indices(channel_ids, prefer_slice=True)
    100 rs = self._recording_segments[segment_index]
--> 101 traces = rs.get_traces(start_frame=start_frame, end_frame=end_frame, channel_indices=channel_indices)
    102 if order is not None:
    103     assert order in ["C", "F"]

File ~/repos/spikeinterface/spikeinterface/core/frameslicerecording.py:74, in FrameSliceRecordingSegment.get_traces(self, start_frame, end_frame, channel_indices)
     72 parent_start = self.start_frame + start_frame
     73 parent_end = self.start_frame + end_frame
---> 74 traces = self._parent_recording_segment.get_traces(start_frame=parent_start,
     75                                                    end_frame=parent_end,
     76                                                    channel_indices=channel_indices)
     77 return traces

File ~/repos/spikeinterface/spikeinterface/extractors/nwbextractors.py:223, in NwbRecordingSegment.get_traces(self, start_frame, end_frame, channel_indices)
    220 es = get_electrical_series(self._nwbfile, self._electrical_series_name)
    222 if isinstance(channel_indices, slice):
--> 223     traces = es.data[start_frame:end_frame, channel_indices]
    224 else:
    225     # channel_indices is np.ndarray
    226     if np.array(channel_indices).size > 1 and np.any(np.diff(channel_indices) < 0):
    227         # get around h5py constraint that it does not allow datasets
    228         # to be indexed out of order

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/miniconda3/envs/nwb_datajoint/lib/python3.8/site-packages/h5py/_hl/dataset.py:489, in Dataset.__getitem__(self, args)
    478 """ Read a slice from the HDF5 dataset.
    479 
    480 Takes slices and recarray-style field names (more than one is
   (...)
    486 * Boolean "mask" array indexing
    487 """
    488 args = args if isinstance(args, tuple) else (args,)
--> 489 if is_empty_dataspace(self.id):
    490     if not (args == tuple() or args == (Ellipsis,)):
    491         raise ValueError("Empty datasets cannot be sliced")

File ~/miniconda3/envs/nwb_datajoint/lib/python3.8/site-packages/h5py/_hl/base.py:87, in is_empty_dataspace(obj)
     85 def is_empty_dataspace(obj):
     86     """ Check if an object's dataspace is empty """
---> 87     if obj.get_space().get_simple_extent_type() == h5s.NULL:
     88         return True
     89     return False

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5d.pyx:289, in h5py.h5d.DatasetID.get_space()

ValueError: Not a dataset (not a dataset)

alejoe91 commented 2 years ago

Do you get the same error if you do recording.get_traces()?

khl02007 commented 2 years ago

Yes, it seems so. I get the same error.

alejoe91 commented 2 years ago

@khl02007 can you send me the NWB file? Can you access the electrical series data through the pynwb API outside of SI?

khl02007 commented 2 years ago

@alejoe91 The NWB file is >1TB. What would be a good way to send it to you?

I can access the electrical series from this file using pynwb. I can also do so with get_traces from the NwbRecordingExtractor. But it gives an error when I do frame_slice and then get_traces. Are you unable to reproduce this issue with a test nwb file you have?
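
For reference, direct access along these lines works (a minimal sketch; I am assuming the series lives in acquisition, with 'e-series' as its name in this file):

from pynwb import NWBHDF5IO

with NWBHDF5IO('recording.nwb', mode='r') as io:
    nwbfile = io.read()
    es = nwbfile.acquisition['e-series']  # assumed location and name
    traces = es.data[0:100, :]            # plain h5py slicing, no error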

khl02007 commented 2 years ago

I am using pynwb 1.4.0 and hdmf 2.5.1.

khl02007 commented 2 years ago

I found that during save, by the time we get to NwbRecordingSegment, the electrical series data can no longer be accessed. Here is the electrical series printed at instantiation:

e-series pynwb.ecephys.ElectricalSeries at 0x140374304276000
Fields:
  comments: sample comment
  conversion: 1.9499999837080395e-07
  data: <HDF5 dataset "data": shape (138007758, 256), type "<i2">
  description: Electrical series registered on electrode
  electrodes: electrodes <class 'hdmf.common.table.DynamicTableRegion'>
  interval: 1
  resolution: -1.0
  timestamps: <HDF5 dataset "timestamps": shape (138007758,), type "<f8">
  timestamps_unit: seconds
  unit: volts

and just before get_traces:

e-series pynwb.ecephys.ElectricalSeries at 0x140374304276000
Fields:
  comments: sample comment
  conversion: 1.9499999837080395e-07
  description: Electrical series registered on electrode
  interval: 1
  resolution: -1.0
  timestamps_unit: seconds
  unit: volts 

They are at the same memory location, but the data field is missing from the second printout.

I also noticed that the NwbRecordingExtractor is re-instantiated upon save. Is there a reason for this?

By the way, get_traces on a FrameSliceRecording works if I assign the slice to a new variable, i.e. sliced_recording = recording.frame_slice(...) instead of recording = recording.frame_slice(...); see the sketch below.
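
A sketch of the two variants:

import spikeinterface.extractors as se

# rebinding `recording`: get_traces fails with the ValueError above
recording = se.read_nwb_recording(file_path='recording.nwb')
recording = recording.frame_slice(start_frame=0, end_frame=100)
traces = recording.get_traces()

# assigning the slice to a new variable: get_traces works
recording = se.read_nwb_recording(file_path='recording.nwb')
sliced_recording = recording.frame_slice(start_frame=0, end_frame=100)
traces = sliced_recording.get_traces()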

alejoe91 commented 2 years ago

Thanks for digging deeper. I think the problem is that we keep the io open until __del__ is called. Reusing the same variable name deletes the original object, so nwbfile.acquisition is no longer accessible.
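
The mechanism in plain Python (a toy sketch, nothing SI-specific):

class IOOwner:
    """Toy stand-in for an object that closes its file in __del__."""

    def __del__(self):
        print('io closed')  # the extractor's io.close() happens here

recording = IOOwner()
recording = 'something else'  # rebinding drops the last reference;
                              # __del__ runs immediately and prints 'io closed'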

@samuelgarcia maybe we should switch back to opening the file with a context manager in the segment too, to avoid this weird behavior. Something like the sketch below.
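
Roughly, for NwbRecordingSegment.get_traces (a sketch of the idea only; _file_path is an illustrative attribute, and the unsorted-index handling from the current code is omitted):

from pynwb import NWBHDF5IO

def get_traces(self, start_frame, end_frame, channel_indices):
    # re-open the file on every read, so no long-lived handle is kept
    with NWBHDF5IO(self._file_path, mode='r') as io:
        nwbfile = io.read()
        es = nwbfile.acquisition[self._electrical_series_name]
        return es.data[start_frame:end_frame, channel_indices]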

samuelgarcia commented 2 years ago

I don't think so. Opening and closing the file for every chunk is a total performance killer. Let's discuss this in detail.

samuelgarcia commented 2 years ago

I think we need to move ownership of the io to the recording segment. That should fix it.
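
i.e. roughly (a sketch of the idea, names simplified):

from pynwb import NWBHDF5IO

class NwbRecordingSegment:
    """Sketch: the segment itself opens and owns the io."""

    def __init__(self, file_path, electrical_series_name):
        # opened once and kept for the segment's lifetime, so per-chunk
        # reads stay fast; the handle then lives as long as anything
        # (e.g. a FrameSliceRecordingSegment) that references this segment
        self._io = NWBHDF5IO(file_path, mode='r')
        self._nwbfile = self._io.read()
        self._electrical_series_name = electrical_series_name

    def get_traces(self, start_frame, end_frame, channel_indices):
        es = self._nwbfile.acquisition[self._electrical_series_name]
        return es.data[start_frame:end_frame, channel_indices]

    def __del__(self):
        self._io.close()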