AllenNeuralDynamics / aind-ephys-spikesort-kilosort25-full

CodeOcean capsule for full electrophysiology analysis pipeline using Kilosort2.5 via SpikeInterface.
MIT License

Channels near tip of probe often classified as "out of brain" #17

Open egmcbride opened 7 months ago

egmcbride commented 7 months ago

We've found that for most of our data sorted with this capsule, some channels at the tip of almost every probe are classified as "out of brain" and excluded from further processing. Surprisingly, the channels that we know are outside of the brain (at the top of the probe) are rarely classified as "out of brain".

This might be caused by a couple of different issues I found in the capsule and in the spikeinterface code itself.

1) In SpikeInterface, preprocessing/detect_bad_channels.py, this is the description of how "out of brain" channels are determined: "Out of brain channels are contiguous regions of channels dissimilar to the median of all channels at the top end of the probe (i.e. large channel number)"

However, in detect_bad_channels_ibl(), which does the bulk of the bad channel detection: # the channels outside of the brains are the contiguous channels below the threshold on the trend coherency

# the channels outside need to be at either extremes of the probe

It would make sense to only allow channels with higher numbers to be classified as "out of brain"; otherwise this is confusing and potentially prevents us from detecting some units.
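For reference, here is a minimal sketch (not the capsule's exact code) of running the IBL-style detector in SpikeInterface and inspecting its per-channel labels, to see whether the "out" block lands at the tip or the top of the probe. It assumes a variable `recording` holding an already-loaded, high-pass-filtered single-probe recording:

```python
# Minimal sketch, assuming `recording` is an already-loaded, high-pass-filtered
# SpikeInterface recording of a single probe.
import spikeinterface.preprocessing as spre

bad_channel_ids, channel_labels = spre.detect_bad_channels(
    recording, method="coherence+psd"  # the IBL coherence-based detector
)

# channel_labels holds one of "good" / "dead" / "noise" / "out" per channel;
# listing the "out" channels against their position makes it easy to see
# whether the flagged block sits at the tip or the top of the probe.
for ch_id, label in zip(recording.channel_ids, channel_labels):
    if label == "out":
        print(ch_id, label)
```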

2) In the capsule, bad channel detection occurs before common average referencing.

Since we are tip referencing our probes, if there is a strong signal being detected by the tip, the channels near the tip are relatively quiet, while the channels farther away, including those we know are outside of the brain, have a strong signal.

Before detecting channels that are outside of the brain, we need to correct for tip referencing, whether this is through common average reference or some other method. Otherwise, the wrong channels may be classified as out of brain.

I understand you probably want to avoid using bad channels in the CAR. Maybe CAR needs to happen twice - once over all channels to find the correct "out of brain" and noise channels, and a second time after excluding those channels.
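As an illustration of that order of operations with SpikeInterface preprocessing (a sketch of the suggestion above, not the capsule's current behavior):

```python
# Sketch of the "double CMR" idea, not the capsule's current code.
import spikeinterface.preprocessing as spre

# 1) provisional global-median reference over all channels, to undo the
#    tip-reference offset before looking for out-of-brain channels
rec_cmr_all = spre.common_reference(recording, reference="global", operator="median")

# 2) run bad-channel detection on the re-referenced signal
bad_channel_ids, channel_labels = spre.detect_bad_channels(
    rec_cmr_all, method="coherence+psd"
)

# 3) drop the flagged channels from the original recording and re-reference
#    using only the channels that were kept
rec_clean = recording.remove_channels(bad_channel_ids)
rec_final = spre.common_reference(rec_clean, reference="global", operator="median")
```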

alejoe91 commented 7 months ago

Hi Ethan! Thanks for reporting this.

  1. I agree that in case of the tip reference the bottom electrodes could look quite silent, so it makes total sense to add a "top" | "bottom" | "both" option to the detect_bad_channels. Opened an issue on SI here: https://github.com/SpikeInterface/spikeinterface/issues/2190

  2. We could add this alternative processing as an option. It might make sense to use this double CMR as default. Let's discuss this over a call!

alejoe91 commented 6 months ago

@egmcbride

The SpikeInterface side is merged (https://github.com/SpikeInterface/spikeinterface/issues/2190), but for this repo we prefer to pin to released versions.

With #19 you can now use the --no-remove-out-channels flag to skip out-channel removal as a workaround.

@jtyoung84 how can one pass arguments to a capsule via the service?

jtyoung84 commented 6 months ago

@alejoe91 Probably the easiest way is to put the processing configs in a file (yaml, json, ini, etc) in a consistent place and have the capsule search for that file to override defaults. That way, it should also be relatively straightforward to know which parameters were used during the processing. For clarity, somebody would put the spike sorting processing parameters in a file together with the raw data before uploading it to s3.
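As an illustration of that idea, a capsule could load overrides from a file shipped alongside the raw data roughly like this (the file name, location, and keys below are hypothetical, not an agreed convention):

```python
# Hypothetical sketch of the "config file next to the raw data" idea; the file
# name, location, and keys are illustrative only.
import json
from pathlib import Path

DEFAULT_PARAMS = {"remove_out_channels": True, "remove_bad_channels": True}

def load_processing_params(data_dir: Path) -> dict:
    """Return defaults, overridden by a processing_params.json shipped with the data."""
    params = dict(DEFAULT_PARAMS)
    override_file = data_dir / "processing_params.json"
    if override_file.exists():
        params.update(json.loads(override_file.read_text()))
    return params

params = load_processing_params(Path("/data"))
```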

bjhardcastle commented 6 months ago

That would be neat, but if we want to change parameters to re-run processing, we'll need to be able to modify a file that lives in the raw data bucket (not that I have a problem with that).

Doesn't this seem like something the CodeOcean API should support anyway? https://docs.codeocean.com/user-guide/code-ocean-api/capsule-api#run-capsule-pipeline-with-data-assets

jtyoung84 commented 6 months ago

I can add a feature request to allow a user to define a capsule/pipeline id and an associated set of parameters in the transfer service.

jtyoung84 commented 6 months ago

@dyf Any thoughts?

dyf commented 6 months ago

I think we should have a spot in the raw data (say, in the ecephys folder) that describes the intended processing parameters. This would be immutable - it only describes the initial intention.

If you want to re-run the pipeline with different settings, just hop into the UI, tweak the parameters in the app panel, and re-run.

bjhardcastle commented 6 months ago

If you want to re-run the pipeline with different settings, just hop into the UI, tweak the parameters in the app panel, and re-run.

I'm not too keen on this for when we need to re-run on 100+ sessions!

We've been wanting to send parameters via the API to other capsules too. Wouldn't it be a better solution in the long run if CodeOcean could add it?

jtyoung84 commented 5 months ago

Code Ocean has a REST API that allows someone to run a capsule with different input parameters. We also have a repository, aind-codeocean-api which is a thin python wrapper around their REST API.

bjhardcastle commented 5 months ago

Code Ocean has a REST API that allows someone to run a capsule with different input parameters.

Is that an undocumented feature that you know works? There are parameters listed as data_assets here, but the documented type is string, and the examples show key-values for a single asset, not a list of assets: https://docs.codeocean.com/user-guide/code-ocean-api/capsule-api


We also have a repository, aind-codeocean-api which is a thin python wrapper around their REST API.

We're using it all the time - thanks!

jtyoung84 commented 5 months ago

Code Ocean has some Swagger documentation that I found a little more useful, although it still lacks examples: https://docs.codeocean.com/user-guide/v/v2.15.0/code-ocean-api/swagger-documentation

I haven't tested the API fully. I've been mostly serializing capsule parameters into a JSON string using parameters_to_send = [json.dumps(params)] and then de-serializing them in the capsule code as something like parameters_to_use_in_capsule = json.loads(params[0]). But I haven't tested passing in lists of data assets and lists of parameters and whether they correspond to each other. I usually just send one request per (data_asset, parameter_set) pair.
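For anyone following along, here is a rough sketch of that round-trip against the documented run-capsule endpoint; the domain, token handling, capsule id, data asset id, and parameter keys are placeholders:

```python
# Rough sketch of the JSON round-trip described above. The endpoint and request
# fields follow the Code Ocean run-capsule docs linked earlier; the domain,
# capsule id, data asset id, and parameter keys are placeholders.
import json
import requests

params = {"remove_out_channels": False}        # whatever the capsule expects
parameters_to_send = [json.dumps(params)]      # one JSON string per positional argument

response = requests.post(
    "https://<your-codeocean-domain>/api/v1/computations",
    auth=("CODE_OCEAN_API_TOKEN", ""),         # API token as the basic-auth username
    json={
        "capsule_id": "<capsule-id>",
        "data_assets": [{"id": "<data-asset-id>", "mount": "ecephys_session"}],
        "parameters": parameters_to_send,
    },
)
print(response.json())

# Inside the capsule, the string arrives as a positional argument, e.g.:
#   import sys, json
#   params = json.loads(sys.argv[1])
```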

alejoe91 commented 5 months ago

I have tried with lists of parameters or strings (with multiple ARGS) and both options work :)

Will post a working script tomorrow

bjhardcastle commented 5 months ago

That's very cool - thanks both!