LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License
93 stars 42 forks source link

preprocessing, filtering+whitening confusing #143

Closed magland closed 2 years ago

magland commented 2 years ago

It appears that filtering of a SpikeSortingRecording happens at the time where the recording is extracted for the session, whereas whitening happens at the time of spike sorting. This is confusing because these are both preprocessing parameters (SpikeSortingPreprocessingParameters). If whitening is done at the time of sorting, shouldn't it be an attribute of SpikeSortingSelection? Someone may want to spike sort the SpikeSorting with and without whitening.

khl02007 commented 2 years ago

@magland one of the keys of the preprocessing params dict is called whiten and it is possible to turn off whitening by setting it to False. do you have an alternative in mind that would be less confusing? we had originally split preprocessing params to filter params and whitening params but we decided to pool them because they are both preprocessing params for spike sorting. also the only option for whitening is the random seed and we wanted to avoid having a table with just a single key.

magland commented 2 years ago

@khl02007 , I think whiten bool should be part of sorter_params, an attribute of SpikeSorterParameters

In common_spikesorting.py we see:

        preproc_params = (SpikeSortingPreprocessingParameters & key).fetch1('preproc_params')
        if preproc_params['whiten']:
            recording = st.preprocessing.whiten(recording=recording, seed=preproc_params['seed'])

        print(f'Running spike sorting on {key}...')
        sorter, sorter_params = (SpikeSorterParameters & key).fetch1('sorter','sorter_params')
        sorting = ss.run_sorter(sorter, recording,
                                output_folder=os.getenv('KACHERY_TEMP_DIR'),
                                delete_output_folder=True,
                                **sorter_params)

I think the whiten should instead be obtained from sorter_params['whiten']. But we don't want it to whiten twice! I believe the mountainsort4 sorter in spikeinterface does the whitening internally if whiten is set to True in sorter_params. So you can remove the st.preprocessing.whiten(...) step here.

khl02007 commented 2 years ago

I think the whiten should instead be obtained from sorter_params['whiten']

I see. The thing is we populate the (default) entries of SpikeSorterParameters (one for each sorter) by loading default parameters for spike sorters from spikeinterface, and I don't think that most of them have an argument for whitening. I suppose we could add a whiten param to the sorter params, but to me it is more natural to group the whiten param with filter params and separate them from other clustering / template matching related params. @lfrank what do you think?

I believe the mountainsort4 sorter in spikeinterface does the whitening internally if whiten is set to True in sorter_params. So you can remove the st.preprocessing.whiten(...) step here.

Correct me if I'm wrong but don't think the mountainsort4 repo contains any code for whitening any more (you may have cleaned it up) - ml_ms4alg may have had it. Anyway I'd prefer to do whitening ourselves and reserve run_sorter for spike sorting. That way we have complete control over the whitening, including the random seed.

magland commented 2 years ago

You are right that mountainsort4 repo does not have an option for whitening. However, the spikeinterface wrapper for mountainsort4 does have that option. See: https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/sorters/mountainsort4/mountainsort4.py

Note that the exact same spikeinterface.spiketoolkit function is used for both SI wrapper and your wrapper in common_spikesorting.py (noting that spikeinterface.toolkit.whiten = spikeinterface.toolkit.preprocessing.whiten)

Also, whitening is not relevant for the other sorters - since they do whitening (or the equivalent) internally.

So really all you need to is include whiten=True in the sorter parameters, and it will equivalent to what you have now.

lfrank commented 2 years ago

So … we need the whitening for the metrics calculations as well, and we don’t want to save the whitened data. That’s a big part of the reason we use the spikeinterface whitening as opposed to the mountainsort whitening. I see what you mean about this being confusing, but given that we need to whiten for metrics, it’s not obvious to me how to do this better.

On Mar 4, 2022, at 5:08 AM, Jeremy Magland @.**@.>> wrote:

This Message Is From an External Sender This message came from outside your organization.

You are right that mountainsort4 repo does not have an option for whitening. However, the spikeinterface wrapper for mountainsort4 does have that option. See: https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/sorters/mountainsort4/mountainsort4.pyhttps://urldefense.com/v3/__https://github.com/SpikeInterface/spikeinterface/blob/master/spikeinterface/sorters/mountainsort4/mountainsort4.py__;!!LQC6Cpwp!7OyvLJmhwjj33KaC6HkbpbiJ3J6jasB3Itfemm1WjtmzFXXDG3WCqQBLboVxNIc1WA$

Note that the exact same spikeinterface.spiketoolkit function is used for both SI wrapper and your wrapper in common_spikesorting.py (noting that spikeinterface.toolkit.whiten = spikeinterface.toolkit.preprocessing.whiten)

Also, whitening is not relevant for the other sorters - since they do whitening (or the equivalent) internally.

So really all you need to is include whiten=True in the sorter parameters, and it will equivalent to what you have now.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/LorenFrankLab/nwb_datajoint/issues/143*issuecomment-1059147597__;Iw!!LQC6Cpwp!7OyvLJmhwjj33KaC6HkbpbiJ3J6jasB3Itfemm1WjtmzFXXDG3WCqQBLboX8jrcqeQ$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABV4PSPEV2MTF4QWYJAHG7LU6IDN3ANCNFSM5PYCK7OQ__;!!LQC6Cpwp!7OyvLJmhwjj33KaC6HkbpbiJ3J6jasB3Itfemm1WjtmzFXXDG3WCqQBLboUW2kI2QQ$. Triage notifications on the go with GitHub Mobile for iOShttps://urldefense.com/v3/__https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675__;!!LQC6Cpwp!7OyvLJmhwjj33KaC6HkbpbiJ3J6jasB3Itfemm1WjtmzFXXDG3WCqQBLboWLRdB_xQ$ or Androidhttps://urldefense.com/v3/__https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign*3Dnotification-email*26utm_medium*3Demail*26utm_source*3Dgithub__;JSUlJSU!!LQC6Cpwp!7OyvLJmhwjj33KaC6HkbpbiJ3J6jasB3Itfemm1WjtmzFXXDG3WCqQBLboUAOGXU9A$. You are receiving this because you were mentioned.

acomrie commented 2 years ago

A side comment - after playing with this a bit today, I agree that it is a bit tricky to have the whitening parameter in SpikeSortingPreprocessingParameters. The thing that I am running into occurs, as Jeremy pointed out, when I want to do multiple sorts of the same Recording with and without whitening. Because the whitening parameter is set at the level of SpikeSortingPreprocessingParameters, I think I basically have to generate two of each recording in SpikeSortingRecording, one for each parameter set. I think this then means the recording is essentially going to get duplicated, because the two recordings themselves should, as far as I understand, be exactly the same other than the parameter for whitening =True vs =False (despite the unique recording_id / SpikeSortingRecording entries). In addition to storing double the data that also means double the compute time to generate the recordings twice each. For context, in my current case, this is to get marks for clusterless decoding as well as to use ms4 to get single units. Because of this, I would favor setting the whitening parameter after generating a recording. Whether that should be in SpikeSorterParameters or some other approach before SpikeSortingSelection, I am not sure.

lfrank commented 2 years ago

Given this, I think we might need to have whitening be a sorting only parameter, and then add a whitening option for the metrics calculation, which, as far as I can tell, means that we need a whiten option in the Waveforms() class.

But I’m also noticing that the waveforms do not seem to be whitened for the current metrics calculation, and I think we need to fix that.

On Mar 4, 2022, at 4:39 PM, acomrie @.**@.>> wrote:

This Message Is From an External Sender This message came from outside your organization.

A side comment - after playing with this a bit today, I agree that it is a bit tricky to have the whitening parameter in SpikeSortingPreprocessingParameters. The thing that I am running into occurs, as Jeremy pointed out, when I want to do multiple sorts of the same Recording with and without whitening. Because the whitening parameter is set at the level of SpikeSortingPreprocessingParameters, I think I basically have to generate two of each recording in SpikeSortingRecording, one for each parameter set. I think this then means the recording is essentially going to get duplicated, because the two recordings themselves should, as far as I understand, be exactly the same other than the parameter for whitening =True vs =False (despite the unique recording_id / SpikeSortingRecording entries). In addition to storing double the data that also means double the compute time to generate the recordings twice each. For context, in my current case, this is to get marks for clusterless decoding as well as to use ms4 to get single units. Because of this, I would favor setting the whitening parameter after generating a recording. Whether that should be in SpikeSorterParameters or some other approach before SpikeSortingSelection, I am not sure.

— Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/LorenFrankLab/nwb_datajoint/issues/143*issuecomment-1059627264__;Iw!!LQC6Cpwp!5sbrO2yCqGyBS1sL3O2wPJMCtANIcy_2HmFEeJiSpz77vMJVW_xeNMexAHQI4dqvtA$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABV4PSL3FHJGVK7IDHD4O3TU6KUNJANCNFSM5PYCK7OQ__;!!LQC6Cpwp!5sbrO2yCqGyBS1sL3O2wPJMCtANIcy_2HmFEeJiSpz77vMJVW_xeNMexAHT1bA7nOQ$. Triage notifications on the go with GitHub Mobile for iOShttps://urldefense.com/v3/__https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675__;!!LQC6Cpwp!5sbrO2yCqGyBS1sL3O2wPJMCtANIcy_2HmFEeJiSpz77vMJVW_xeNMexAHTtkqJn4A$ or Androidhttps://urldefense.com/v3/__https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign*3Dnotification-email*26utm_medium*3Demail*26utm_source*3Dgithub__;JSUlJSU!!LQC6Cpwp!5sbrO2yCqGyBS1sL3O2wPJMCtANIcy_2HmFEeJiSpz77vMJVW_xeNMexAHT_NRgwXQ$. You are receiving this because you were mentioned.

shijiegu commented 3 months ago

Hi all @lfrank @khl02007 , I see that from the last comment above, whitening is not done in SpikeSortingRecording. But there is still an entry in spikeSorting.v0 in SpikeSortingPreprocessingParameters

seed = 0  # random seed for whitening

Does that mean in v0 SpikeSortingRecording's data is spike band filtered and whitened? The v0 tutorial notebook suggests that the recording is whitened and turned whitening off for mountainsort. If recording is whitened, how can I turn it off? Since the 4 parameters are just

{'frequency_min': 600, 'frequency_max': 6000, 'margin_ms': 5, 'seed': 0}

At the same time, I could not easily find in the code that there is a whitening step done to the recording. I only found band pass. If there is whitening, could you also please point me to which line of code this is done?