LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research, built by the Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Making spike sorting recording files fails when there are very short data segments #276

Closed · jguides closed this issue 2 years ago

jguides commented 2 years ago

**Describe the bug**
Making spike sorting recording files fails when there are very short data segments. This occurs when a raw recording has jumps in timestamps that are very close together.

**To Reproduce**
Steps to reproduce the behavior: populate `SpikeSortingRecording` with a recording file that contains very short data segments.

**Expected behavior**
There should be a way to spike sort files whose gaps in the data fall close together. A solution requiring minimal changes to the pipeline would be to make users exclude very short data segments themselves, e.g. via sort_interval. In that case, it would help the user if the recording were checked for too-short segments and a more informative error were raised at that point, and/or if documentation were added so users know about the current requirement that the data contain no very short segments.
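
As a stopgap, a user could scan a recording for too-short segments before populating. A hypothetical sketch; only spikeinterface's standard `get_num_segments` / `get_num_samples` accessors are assumed, and the function name and threshold are illustrative, not spyglass API:

```python
# Hypothetical pre-check before SpikeSortingRecording.populate(); the name
# find_short_segments and the threshold are illustrative, not spyglass API.
def find_short_segments(recording, min_samples=34):
    """Return indices of segments too short to survive filtering.

    The 34-sample floor is one more than the padlen of 33 reported in the
    traceback below; the safe minimum depends on the filter actually used.
    """
    return [
        seg
        for seg in range(recording.get_num_segments())
        if recording.get_num_samples(segment_index=seg) < min_samples
    ]
```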

**Screenshots**

```
write_binary_recording with n_jobs 1 chunk_size 156250000

ValueError                                Traceback (most recent call last)
Input In [10], in <cell line: 1>()
      7 recording_key = {'nwb_file_name': nwb_file_name,
      8                  'sort_group_id': sort_group_id,
      9                  'sort_interval_name': sort_interval_name,
     10                  'preproc_params_name': parameter_set_dict["preproc_params_name"][sort_group_brain_region],
     11                  'interval_list_name': interval_list_name,
     12                  'team_name': team_name}
     13 SpikeSortingRecordingSelection.insert1(recording_key, skip_duplicates=True)
---> 14 SpikeSortingRecording.populate([(SpikeSortingRecordingSelection & recording_key).proj()])
     15 if only_make_ss_recording:
     16     continue

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/datajoint/autopopulate.py:229, in AutoPopulate.populate(self, suppress_errors, return_exception_objects, reserve_jobs, order, limit, max_calls, display_progress, processes, make_kwargs, *restrictions)
    225 if processes == 1:
    226     for key in (
    227         tqdm(keys, desc=self.__class__.__name__) if display_progress else keys
    228     ):
--> 229         error = self._populate1(key, jobs, **populate_kwargs)
    230         if error is not None:
    231             error_list.append(error)

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/datajoint/autopopulate.py:281, in AutoPopulate._populate1(self, key, jobs, suppress_errors, return_exception_objects, make_kwargs)
    279 self.__class__._allow_insert = True
    280 try:
--> 281     make(dict(key), **(make_kwargs or {}))
    282 except (KeyboardInterrupt, SystemExit, Exception) as error:
    283     try:

File ~/Src/spyglass/src/spyglass/spikesorting/spikesorting_recording.py:319, in SpikeSortingRecording.make(self, key)
    317 if os.path.exists(key['recording_path']):
    318     shutil.rmtree(key['recording_path'])
--> 319 recording = recording.save(folder=key['recording_path'], n_jobs=1,
    320                            total_memory='10G')
    321 self.insert1(key)

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/base.py:582, in BaseExtractor.save(self, **kwargs)
    580     loaded_extractor = self.save_to_zarr(**kwargs)
    581 else:
--> 582     loaded_extractor = self.save_to_folder(**kwargs)
    583 return loaded_extractor

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/base.py:658, in BaseExtractor.save_to_folder(self, name, folder, dump_ext, verbose, **save_kwargs)
    652 provenance_file.write_text(
    653     json.dumps({'warning': 'the provenace is not dumpable!!!'}),
    654     encoding='utf8'
    655 )
    657 # save data (done the subclass)
--> 658 cached = self._save(folder=folder, verbose=verbose, **save_kwargs)
    660 self.save_metadata_to_folder(folder)
    662 # copy properties/

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/baserecording.py:196, in BaseRecording._save(self, format, **save_kwargs)
    193 dtype = self.get_dtype()
    195 job_kwargs = {k: save_kwargs[k] for k in job_keys if k in save_kwargs}
--> 196 write_binary_recording(self, file_paths=file_paths, dtype=dtype, **job_kwargs)
    198 from .binaryrecordingextractor import BinaryRecordingExtractor
    199 cached = BinaryRecordingExtractor(file_paths=file_paths, sampling_frequency=self.get_sampling_frequency(),
    200                                   num_chan=self.get_num_channels(), dtype=dtype,
    201                                   t_starts=t_starts, channel_ids=self.get_channel_ids(), time_axis=0,
    202                                   file_offset=0, gain_to_uV=self.get_channel_gains(),
    203                                   offset_to_uV=self.get_channel_offsets())

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/core_tools.py:241, in write_binary_recording(recording, file_paths, dtype, add_file_extension, verbose, byte_offset, **job_kwargs)
    238 init_args = (recording.to_dict(), rec_memmaps_dict, dtype)
    239 executor = ChunkRecordingExecutor(recording, func, init_func, init_args, verbose=verbose,
    240                                   job_name='write_binary_recording', **job_kwargs)
--> 241 executor.run()

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/job_tools.py:253, in ChunkRecordingExecutor.run(self)
    251 worker_ctx = self.init_func(*self.init_args)
    252 for segment_index, frame_start, frame_stop in all_chunks:
--> 253     res = self.func(segment_index, frame_start, frame_stop, worker_ctx)
    254     if self.handle_returns:
    255         returns.append(res)

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/core_tools.py:178, in _write_binary_chunk(segment_index, start_frame, end_frame, worker_ctx)
    175 rec_memmap = worker_ctx['rec_memmaps'][segment_index]
    177 # apply function
--> 178 traces = recording.get_traces(start_frame=start_frame, end_frame=end_frame, segment_index=segment_index)
    179 traces = traces.astype(dtype)
    180 rec_memmap[start_frame:end_frame, :] = traces

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/core/baserecording.py:102, in BaseRecording.get_traces(self, segment_index, start_frame, end_frame, channel_ids, order, return_scaled)
    100 channel_indices = self.ids_to_indices(channel_ids, prefer_slice=True)
    101 rs = self._recording_segments[segment_index]
--> 102 traces = rs.get_traces(start_frame=start_frame, end_frame=end_frame, channel_indices=channel_indices)
    103 if order is not None:
    104     assert order in ["C", "F"]

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/spikeinterface/toolkit/preprocessing/filter.py:117, in FilterRecordingSegment.get_traces(self, start_frame, end_frame, channel_indices)
    114 traces_chunk = traces_chunk.astype("float32")
    116 if self.filter_mode == 'sos':
--> 117     filtered_traces = scipy.signal.sosfiltfilt(self.coeff, traces_chunk, axis=0)
    118 elif self.filter_mode == 'ba':
    119     b, a = self.coeff

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/scipy/signal/_signaltools.py:4335, in sosfiltfilt(sos, x, axis, padtype, padlen)
   4333 ntaps = 2 * n_sections + 1
   4334 ntaps -= min((sos[:, 2] == 0).sum(), (sos[:, 5] == 0).sum())
-> 4335 edge, ext = _validate_pad(padtype, padlen, x, axis,
   4336                           ntaps=ntaps)
   4338 # These steps follow the same form as filtfilt with modifications
   4339 zi = sosfilt_zi(sos)  # shape (n_sections, 2) --> (n_sections, ..., 2, ...)

File ~/anaconda3/envs/spyglass/lib/python3.8/site-packages/scipy/signal/_signaltools.py:4105, in _validate_pad(padtype, padlen, x, axis, ntaps)
   4103 # x's 'axis' dimension must be bigger than edge.
   4104 if x.shape[axis] <= edge:
-> 4105     raise ValueError("The length of the input vector x must be greater "
   4106                      "than padlen, which is %d." % edge)
   4108 if padtype is not None and edge > 0:
   4109     # Make an extension of length edge at each
   4110     # end of the input array.
   4111     if padtype == 'even':

ValueError: The length of the input vector x must be greater than padlen, which is 33.
```
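
For context, `scipy.signal.sosfiltfilt` filters forward and backward with edge padding and requires the input to be longer than `padlen`, so any data segment of 33 samples or fewer fails here. A minimal standalone reproduction, assuming a spike-band filter comparable to the one in the trace above (the order, cutoff frequencies, and sampling rate below are illustrative, not taken from the pipeline):

```python
import numpy as np
import scipy.signal

# Illustrative spike-band filter: 5th-order Butterworth bandpass in
# second-order sections, 300-6000 Hz at a 30 kHz sampling rate.
sos = scipy.signal.butter(
    5, [300, 6000], btype="bandpass", fs=30000, output="sos"
)

# A "segment" of only 20 samples (rows) across 4 channels (columns).
short_chunk = np.random.randn(20, 4)

# Fails just as the pipeline does, because the chunk is shorter than padlen:
# ValueError: The length of the input vector x must be greater than padlen, ...
scipy.signal.sosfiltfilt(sos, short_chunk, axis=0)
```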

jguides commented 2 years ago

@khl02007 @edeno @lfrank any preferences on how to approach this? Should the user be expected to mask out the short time segments before generating recordings for spike sorting? Or would you prefer a solution within the pipeline for generating spike sorting recordings that allows a user to pass in recordings with short time segments?

lfrank commented 2 years ago

@jguides Sorry for the delayed response. My suggestion would be to add a minimum segment length parameter to the spike sorting preprocessing parameters and then have the `_get_sort_interval_valid_times` function of `SpikeSortingRecording` take that parameter and return only intervals longer than the minimum. Would that work?

There is an `intervals_by_length` function in `common_interval` that should do the interval filtering for you.
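
For reference, a minimal sketch of that filtering step, assuming intervals are stored as an (n, 2) array of [start, stop] times as in `common_interval`; the example data and the minimum length value are made up:

```python
import numpy as np

def intervals_by_length(interval_list, min_length=0.0, max_length=1e10):
    # Keep only intervals whose duration falls between min_length and
    # max_length; a sketch of the common_interval helper's behavior.
    lengths = np.ravel(np.diff(interval_list, axis=1))
    return interval_list[(lengths > min_length) & (lengths < max_length)]

valid_times = np.array([[0.0, 10.0],        # 10 s segment: kept
                        [10.0005, 10.001],  # 0.5 ms sliver: dropped
                        [11.0, 30.0]])      # 19 s segment: kept
min_segment_length = 0.001  # seconds; illustrative value
print(intervals_by_length(valid_times, min_length=min_segment_length))
```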

jguides commented 2 years ago

@lfrank this sounds great to me. I just submitted a pull request.

lfrank commented 2 years ago

I believe this has been fixed in @jguides' now-merged pull request.