A proposal to add a `PreprocessingPipeline` class, which contains ordered preprocessing steps and their kwargs in a dictionary.
You can apply the class to a recording, or use the helper function `create_preprocessed` to make a preprocessed recording:
```python
preprocessor_dict = {'bandpass_filter': {'freq_max': 3000}, 'common_reference': {}}

# apply using
from spikeinterface.preprocessing import PreprocessingPipeline
pipeline = PreprocessingPipeline(preprocessor_dict)
preprocessed_recording = pipeline.apply_to(recording)

# or
from spikeinterface.preprocessing import create_preprocessed
preprocessed_recording = create_preprocessed(recording, preprocessor_dict)
```
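To make the proposal concrete, here is a minimal sketch of how the class could work internally. This is illustrative only, not the proposed implementation: the step functions and the `_PP_FUNCS` registry mapping step names to functions are stand-ins, using plain dicts in place of real recording objects.

```python
# Illustrative stand-ins for preprocessing functions of the form
# func(recording, **kwargs) -> new recording (not in-place).
def bandpass_filter(recording, freq_min=300, freq_max=6000):
    return {'steps': recording['steps'] + [('bandpass_filter', freq_min, freq_max)]}

def common_reference(recording):
    return {'steps': recording['steps'] + [('common_reference',)]}

# Hypothetical registry from step name -> preprocessing function
_PP_FUNCS = {'bandpass_filter': bandpass_filter, 'common_reference': common_reference}

class PreprocessingPipeline:
    """Holds ordered preprocessing steps and their kwargs in a dictionary."""
    def __init__(self, preprocessor_dict):
        self.preprocessor_dict = preprocessor_dict

    def apply_to(self, recording):
        # chain the steps in insertion order, each returning a new recording
        for step_name, kwargs in self.preprocessor_dict.items():
            recording = _PP_FUNCS[step_name](recording, **kwargs)
        return recording

def create_preprocessed(recording, preprocessor_dict):
    # returns a new preprocessed recording; the input is left untouched
    return PreprocessingPipeline(preprocessor_dict).apply_to(recording)
```

Note the sketch deliberately returns new objects at every step, matching the naming discussion below: `create_preprocessed` should never modify its input.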
Also adds a function which takes in a `recording.json` provenance file and makes a `preprocessor_dict`:
```python
from spikeinterface.preprocessing import get_preprocessing_dict_from_json
my_dict = get_preprocessing_dict_from_json('/path/to/recording.json')
```
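A rough sketch of how such a function could walk the provenance file. The file structure assumed here is hypothetical: each preprocessing layer serialized as a dict with `'class'` and `'kwargs'` keys, with the parent recording nested under `kwargs['recording']`, and the class name kept as the step key rather than mapped back to a function name.

```python
import json

def get_preprocessing_dict_from_json(json_path):
    """Rebuild an ordered preprocessor_dict from a provenance file.

    Assumes (illustratively) each layer is {'class': ..., 'kwargs': {...}}
    with its parent recording nested at kwargs['recording'].
    """
    with open(json_path) as f:
        layer = json.load(f)
    steps = []
    while isinstance(layer, dict) and 'class' in layer:
        kwargs = dict(layer.get('kwargs', {}))
        parent = kwargs.pop('recording', None)
        # a real implementation would map the class name back to the
        # preprocessing function name; here we keep the class name as-is
        steps.append((layer['class'], kwargs))
        layer = parent
    # provenance nests outermost-last-applied first, so reverse
    # to recover application order
    return dict(reversed(steps))
```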
This allows for some cool things:

1. Users can pass a single dictionary to construct a preprocessed recording (as above).
2. It completes the "dictionary workflow", since you can already use dicts in sorting, in `run_sorter_jobs`, and in postprocessing with `compute`.
3. It increases portability between labs, since you can reconstruct the preprocessing steps from the `recording.json` file without the original recording (and without worrying about paths).
Note that 3. only works for preprocessing steps that are in some sense "global", i.e. can be applied to any recording. This isn't true of all preprocessing steps: e.g. `interpolate_bad_channels` needs the `bad_channel_ids`, which are recording dependent. However, many of these functions could be modified to apply more globally: e.g. if `bad_channel_ids` is `None`, `interpolate_bad_channels` could detect bad channels, then interpolate them. This would be applicable to any recording, so is "global".
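The "make it global" idea could look something like the following sketch. Both functions here are hypothetical stand-ins (operating on a toy dict of per-channel noise levels), not calls into spikeinterface itself; the point is only the `None`-triggers-detection pattern.

```python
# Hypothetical stand-ins: a toy "recording" is {'noise': {channel_id: noise_level}}
def detect_bad_channels(recording, threshold=10):
    # flag channels whose noise level exceeds a threshold
    return [ch for ch, noise in recording['noise'].items() if noise > threshold]

def interpolate_bad_channels(recording, bad_channel_ids=None):
    if bad_channel_ids is None:
        # "global" behaviour: detect the bad channels, then interpolate them
        bad_channel_ids = detect_bad_channels(recording)
    good = {ch: n for ch, n in recording['noise'].items() if ch not in bad_channel_ids}
    mean_noise = sum(good.values()) / len(good)
    fixed = dict(recording['noise'])
    for ch in bad_channel_ids:
        fixed[ch] = mean_noise  # toy interpolation: replace with mean of good channels
    return {'noise': fixed}  # returns a new recording, input untouched
```

With `bad_channel_ids=None` as the default, the step needs no recording-specific arguments, so it could safely appear in a shared `preprocessor_dict`.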
No rush on this, and I'm not 100% set on it being implemented. It's important to get the names right. I read this: https://melevir.medium.com/python-functions-naming-tips-376f12549f9. I think it's important that `create_preprocessed` doesn't sound in-place, after the number of problems with `set_probe`. Hence I'm against something like `apply_preprocessing(recording)`, and would rather have `make`, `create`, `construct`, `produce` or similar in the function name. I also like the idea (from the article) that you don't need to include e.g. `recording` in the name if `recording` is a required argument. Hence I prefer something like `my_pipeline.apply_to(recording)` over something like `my_pipeline.apply_pipeline_to_recording(recording)`.
To do:

- Tests
- Add "allowed preprocessing steps" for `get_preprocessing_dict_from_json`