Closed: JoeZiminski closed this issue 6 months ago
Hi @JoeZiminski
Yes, both are possible! You should be able to use the aggregate_channels and aggregate_units functions to merge both: https://spikeinterface.readthedocs.io/en/latest/modules/core.html#manipulating-objects-slicing-aggregating
Basically, you can simply do:
recording_aggr = si.aggregate_channels(list_of_split_recordings)
sorting_aggr = si.aggregate_units(list_of_split_sortings)
# optional but convenient:
sorting_aggr.register_recording(recording_aggr)
we = si.extract_waveforms(recording_aggr, sorting_aggr, ...)
Let me know if that's what you needed!
I also think that it would be nice to have a GroupByRecording preprocessor, so all the preprocessing by group (or property) and aggregation is done automatically. What do you think?
An example would look like this:
recording_np2_4shanks = se.read_spikeglx(...)
recording_group_by = spre.group_by(recording_np2_4shanks, property="group")
recording_cmr = spre.common_reference(recording_group_by)
We could make a mechanism (even at the base preprocessor) to split->apply->merge traces!
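The split->apply->merge idea can be sketched in plain numpy, independent of SpikeInterface. All names here (apply_by_group, the median-referencing lambda) are illustrative stand-ins, not the real API:

```python
import numpy as np

def apply_by_group(traces, group_labels, func):
    """Apply `func` independently to each channel group, then reassemble.

    traces: (n_samples, n_channels) array
    group_labels: per-channel group id, length n_channels
    func: callable taking and returning a (n_samples, k) array
    """
    out = np.empty_like(traces, dtype=float)
    for group in np.unique(group_labels):
        mask = group_labels == group
        out[:, mask] = func(traces[:, mask])
    return out

traces = np.array([[1.0, 2.0, 10.0, 20.0],
                   [3.0, 4.0, 30.0, 40.0]])
groups = np.array([0, 0, 1, 1])

# Median-reference each "shank" (group) separately
referenced = apply_by_group(
    traces, groups, lambda x: x - np.median(x, axis=1, keepdims=True)
)
```

Running the referencing per group rather than globally is the whole point: channels 0-1 are referenced against their own median, channels 2-3 against theirs.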
@samuelgarcia @h-mayorquin what's your opinion?
It could even be in core, and just have a fancy get_traces implementation :)
Thanks @alejoe91! Completely missed that. I think your suggestion would be really nice; it would be very convenient and reduce the likelihood of people trying this themselves and getting into trouble with the lazy aspect as mentioned above. e.g. (I think this would not work?):
split_recording_dict = recording.split_by("group")
new_dict = {}
for key, rec in split_recording_dict.items():
    new_dict[key] = bandpass_filter(rec)
recording = aggregate_channels(split_recording_dict)
data_not_preprocessed_by_group = recording.get_traces(...)
@JoeZiminski what you suggest would work :)
Thanks Alessio! Sorry, I also noted an error in my above example: the new_dict should have been aggregated, not the split_recording_dict! To be 100% sure, in the below (clearer) example, at the get_traces call, would bandpass_filter be processed as a whole and common_reference by group?
bandpass_recording = bandpass_filter(recording)
split_bandpass_recording_dict = bandpass_recording.split_by("group")
new_dict = {}
for key, rec in split_bandpass_recording_dict.items():
    new_dict[key] = common_reference(rec)
preprocessed_recording = aggregate_channels(new_dict)
data = preprocessed_recording.get_traces()
BTW will start working on a GroupByRecording PR
Hi @JoeZiminski
The only thing that you need to change in your snippet is that aggregate_channels needs a list as input:
recording_list = []
for key, rec in split_bandpass_recording_dict.items():
    recording_list.append(common_reference(rec))
preprocessed_recording = aggregate_channels(recording_list)
BTW will start working on a GroupByRecording PR
Sounds great! :)
Thanks @alejoe91, that's amazing that works! Very flexible.
Hey @alejoe91 looking into this a bit further I guess there are a few ways it could operate. I wonder now if it might be easier to add this as a high-level argument to get_traces() (e.g. split_by_property="xxx"). But maybe I am missing another obvious route! The ways I thought of were:
1) A GroupByRecording object that in its get_traces() method splits the recording, calls get_traces() on the parent recording object, then merges. This might be a bit confusing, as it would only operate on preprocessing steps before itself.
e.g.
bp_recording = bandpass_filter(recording)
split_preprocessed_recording = group_by_recording(bp_recording)
cr_recording = common_reference(split_preprocessed_recording)
would only split bandpass but not common reference.
2) Have BasePreprocessor check the recording metadata for a GroupByRecording step and, if it exists, perform splitting and merging at a high level. The downside with this is that no matter where group_by_recording is applied, all preprocessing steps will be performed after splitting, which could be confusing.
3) Implement this as an argument directly on BasePreprocessor rather than through a GroupByRecording object. This would perform all preprocessing split by property.
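Option 3 can be sketched generically with a decorator that adds per-group splitting to any preprocessing function. Everything here is a toy stand-in in plain numpy (including the common_reference function, which is not the real SpikeInterface one):

```python
import numpy as np

def with_group_splitting(preprocess):
    """Wrap a preprocessing function so it can optionally run per channel group.

    Sketch of option 3: split/apply/merge happens inside every preprocessing
    call whenever split_by_group is requested (all names are illustrative).
    """
    def wrapped(traces, groups=None, split_by_group=False, **kwargs):
        if not split_by_group:
            return preprocess(traces, **kwargs)
        out = np.empty_like(traces, dtype=float)
        for g in np.unique(groups):
            mask = groups == g
            out[:, mask] = preprocess(traces[:, mask], **kwargs)
        return out
    return wrapped

@with_group_splitting
def common_reference(traces):
    # Toy median reference across channels
    return traces - np.median(traces, axis=1, keepdims=True)

traces = np.array([[1.0, 2.0, 10.0, 20.0]])
groups = np.array([0, 0, 1, 1])
per_group = common_reference(traces, groups=groups, split_by_group=True)
```

The drawback noted above still applies: every decorated step would need the groups passed in, so in practice the labels would have to come from the recording's channel properties.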
@alejoe91 I missed that you tagged me above a while ago.
We could make a mechanism (even at the base preprocessor) to split->apply->merge traces!
This is an interesting design problem. My two cents:
Given that everything at the recording level is lazy, would not something like this suffice? It could even be a function:
from spikeinterface.preprocessing.basepreprocessor import BasePreprocessor
from spikeinterface.core.baserecording import BaseRecording
from spikeinterface.core.channelsaggregationrecording import ChannelsAggregationRecording
class GroupByPreprocessing(ChannelsAggregationRecording):
    def __init__(self, recording: BaseRecording, property_to_split_by: str, preprocessing_list: list[BasePreprocessor]):
        split_recording_dict = recording.split_by(property_to_split_by)
        recordings_to_aggregate = []
        for split_recording in split_recording_dict.values():
            # Pre-processing as a pipeline
            preprocessed = split_recording
            for preprocessor in preprocessing_list:
                preprocessed = preprocessor(preprocessed)
            recordings_to_aggregate.append(preprocessed)
        # Aggregating the recordings
        super().__init__(recording_list=recordings_to_aggregate)
The route of having a fancier get_traces would imply that downstream analysis has to be aware that this object has a different signature for get_traces, which I think will bring difficulties. Better to coerce the pipeline so it behaves as yet another BaseRecording object; then it can be used everywhere else.
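This "behave like any other recording" point can be illustrated without SpikeInterface at all, using toy classes (FakeRecording and GroupByPipeline are illustrative names, not part of the real library):

```python
import numpy as np

class FakeRecording:
    """Toy stand-in for a BaseRecording: traces plus per-channel group labels."""
    def __init__(self, traces, groups):
        self.traces = np.asarray(traces, dtype=float)
        self.groups = np.asarray(groups)

    def split_by_group(self):
        # Assumes channels are ordered by group, so concatenation restores order
        return {g: FakeRecording(self.traces[:, self.groups == g],
                                 self.groups[self.groups == g])
                for g in np.unique(self.groups)}

    def get_traces(self):
        return self.traces

class GroupByPipeline(FakeRecording):
    """Split by group, run each piece through a chain of callables, re-aggregate.

    Because the result is itself a FakeRecording, downstream code calls
    get_traces() exactly as before -- no special signature needed.
    """
    def __init__(self, recording, preprocessing_list):
        traces_pieces, group_pieces = [], []
        splits = recording.split_by_group()
        for group in sorted(splits):
            sub = splits[group]
            processed = sub.get_traces()
            for step in preprocessing_list:
                processed = step(processed)
            traces_pieces.append(processed)
            group_pieces.append(sub.groups)
        super().__init__(np.concatenate(traces_pieces, axis=1),
                         np.concatenate(group_pieces))

rec = FakeRecording([[1.0, 2.0, 10.0, 20.0]], [0, 0, 1, 1])
grouped = GroupByPipeline(rec, [lambda t: t - t.mean(axis=1, keepdims=True)])
```

Since GroupByPipeline subclasses FakeRecording, anything that accepts a recording (here, anything calling get_traces()) accepts the grouped result unchanged, which is the design goal in the real proposal.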
Thanks @h-mayorquin
I agree that designing something at base would be hard. @JoeZiminski what do you think of Ramon's proposal? We could also propagate preprocessing_params as an input argument
Thanks @h-mayorquin @alejoe91! I think this looks like a nice pattern and removes any complications from playing with base. I had a proper look through aggregate_recordings and understand it better now. To check my understanding, the above class could be used like this (assuming preprocessing_list and preprocessing_params are read off the recording in this case):
shift_recording = phase_shift(recording)
bp_recording = bandpass_filter(shift_recording)
grouped_recording = group_by_preprocessing(bp_recording, property_to_split_by="group")  # GroupByPreprocessing
# now calling get_traces() on grouped_recording, data would be preprocessed per-group
# adding another step
reference_recording = common_reference(grouped_recording)
# in this case, calling get_traces on reference_recording, phase_shift and bandpass
# would be preprocessed per group, then referencing would be preprocessed with all groups?
If what I have said above is correct, the naming / documentation just needs to make clear that every step before the group_by_recording step will be grouped. My understanding is the operation will have to group the preprocessing steps either before or after it in the chain (unless it automatically groups everything wherever it is placed, which would be confusing). Grouping the steps before it feels intuitive (and makes it a nice step to apply at the end of pre-processing to group everything).
Yep! I agree that once you group, the splitting/re-aggregating pattern will only affect the steps prior to grouping! Everything that comes after will be applied regardless of grouping.
I like the idea of automatically grabbing preprocessing steps and parameters from an existing object, but we have to be careful here, due to the flexibility of the API.
A corner and complex case could be the following:
shift_recording = phase_shift(recording)
bp_recording = bandpass_filter(shift_recording)
grouped_recording = group_by_preprocessing(bp_recording, property_to_split_by="group")
reference_recording = common_reference(grouped_recording)
# what to do here???
grouped_recording_second = group_by_preprocessing(reference_recording, property_to_split_by="group")
Good point! Maybe in this case, if a group_by_preprocessing step is already in the preprocessing list, an error can be raised? My feeling is this provides a nice convenience function to preprocess an entire preprocessing chain by group, but if people want to do very complex things, they could use the lower-level API?
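The guard suggested here, raising when a grouping step already appears in the chain, might look like this in sketch form (pure Python; the function name, marker string, and error message are all illustrative, not SpikeInterface API):

```python
def check_not_already_grouped(step_names, marker="group_by_preprocessing"):
    """Raise if a grouping step is already present in a preprocessing chain.

    step_names: list of preprocessing step names, e.g. as recorded in a
    recording's provenance. All names here are assumptions for illustration.
    """
    if marker in step_names:
        raise ValueError(
            f"A '{marker}' step is already in the chain; nested grouping is "
            "not supported. Use the lower-level split_by / aggregate_channels "
            "API for more complex workflows."
        )

check_not_already_grouped(["phase_shift", "bandpass_filter"])  # passes silently
```

A grouping constructor could call such a check on the parent recording's step names before building the corner case from the snippet above.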
Following #2299 it was determined this was not simple to implement in a robust way and so further documentation on the existing API will be added in a new PR.
Some extra docs were added in #2316. I'll close this now.
I would like to provide an option to split recordings before preprocessing so recordings can be pre-processed per shank. There are alternatives (e.g. the groups argument in the common_reference module) but this is not always provided (e.g. spatial filtering). The problem I am having is that after sorting by group, I had planned to aggregate the sorting results. But the preprocessed data is still split by group, so I cannot figure out a way to handle the inputs for waveform extraction: the sorting results can be aggregated but the preprocessed recordings are still split.

One solution would be to re-merge the split, preprocessed (i.e. written to binary) recordings. Is this possible in SI? I would be happy to write a method for this, although I can see why you might not want to include this functionality as it is a little confusing. For example, a user might split the recording, perform lazy pre-processing and then merge, but when preprocessing is actually performed it is no longer split by group. Nonetheless this could be handled with documentation / warnings or other checks.