SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License

Weird signal from demuxed data #2822

Closed JuanPimientoCaicedo closed 6 months ago

JuanPimientoCaicedo commented 6 months ago

Hi, I hope you are doing well.

I have been working with SpikeInterface for some time, and I am noticing some weird signals coming from my Neuropixels data preprocessed with your tools (see attached image). It looks like there are sudden voltage changes in the signal, which is very unlikely to be a biological phenomenon and more likely an artifact of the recording or of the subsequent manipulation.

This is, more or less, the pipeline that I use.

I basically:

  1. Demux the data with the phase_shift function.
  2. Concatenate the recordings.
  3. Create a bandpass filtered recording and apply CMR to create a clean signal stream.
  4. Interpolate motion based on the previous signal.
  5. Finally, I correct the motion in the broadband.

I then use Kilosort4, do some precuration with your tools and manually curate in phy.

I noticed that the number of times I see this weird "waveform" has increased after doing two things:

  1. Concatenate recordings.
  2. Use Zarr compression.

I am starting to suspect that maybe I am losing data with the Zarr compression implementation. But I am not sure: is the compression that you implement lossy?

Another thing is that the first time I actually saw this phenomenon was after applying the phase_shift function. Could the phase shift, applied to recordings this long (about 4 hours each), accumulate into what I am seeing?

I have also noticed a decrease in the yield I get from the recordings, which could be caused by their duration. However, I am wondering whether something in my pipeline could be affecting that yield.

Here is the code I am using:


```python
from pathlib import Path

import spikeinterface.full as si

base_folder = Path('E:/file_location')
Region = 'imec0.ap'

Path_1 = base_folder / '1st_recording'
Path_2 = base_folder / '2nd_recording'
Path_3 = base_folder / '3rd_recording'

first_rec = si.read_spikeglx(folder_path=Path_1, stream_id=Region)
first_rec = si.phase_shift(recording=first_rec)
second_rec = si.read_spikeglx(folder_path=Path_2, stream_id=Region)
second_rec = si.phase_shift(recording=second_rec)
third_rec = si.read_spikeglx(folder_path=Path_3, stream_id=Region)
third_rec = si.phase_shift(recording=third_rec)

recording = si.concatenate_recordings([first_rec, second_rec, third_rec])
recording.annotate(is_filtered=False)  # really important step because this data is broadband

rec_samples = {'1st': first_rec.get_total_samples(),
               '2nd': second_rec.get_total_samples(),
               '3rd': third_rec.get_total_samples()}

bad_channel_ids, channel_labels = si.detect_bad_channels(recording=recording)
rec_clean = recording.remove_channels(remove_channel_ids=bad_channel_ids)

# preprocessing: bandpass + cmr
drift_recording_raw = rec_clean
drift_recording_filt = si.filter(recording=drift_recording_raw, band=[300, 6000], btype='bandpass',
                                 filter_order=2, ftype='butter')  # forward-backward (zero-lag) filtering
drift_recording_cmr = si.common_reference(recording=drift_recording_filt,
                                          reference='global', operator='median')

motion_folder = Path(base_folder) / 'Analysis' / 'motion_correction'
job_kwargs = dict(n_jobs=-1, chunk_duration='1s', progress_bar=True)
rec_corrected = si.correct_motion(recording=drift_recording_cmr, preset="nonrigid_accurate",
                                  folder=motion_folder, **job_kwargs)
motion_info = si.load_motion_info(motion_folder)  # load the motion info saved by correct_motion

preprocessed_trace = si.interpolate_motion(
    recording=rec_clean,
    motion=motion_info['motion'],
    temporal_bins=motion_info['temporal_bins'],
    spatial_bins=motion_info['spatial_bins'],
    **motion_info['parameters']['interpolate_motion_kwargs'])

new_folder = Path('Path_of_file')

if (new_folder / "preprocessed").is_dir():
    recording_saved = si.load_extractor(new_folder / "preprocessed")
else:
    recording_saved = preprocessed_trace.save(folder=new_folder / "preprocessed", format="zarr", **job_kwargs)
```

image

As a summary, I am seeing two things:

  1. More of these weird "waveforms", which I suspect are introduced in the processing steps.
  2. Less yield, which could be caused by the long recordings (in which case it is a consequence of the experimental preparation).

My principal suspects related to the pipeline are:

  1. The demuxing process (the phase_shift function).
  2. The concatenation of the recordings, which could introduce these voltage steps at the joining points (unlikely, because I am just concatenating 3 subsequent recordings).
  3. The Zarr compression, which could be causing data loss.

Thank you very much for your time!

alejoe91 commented 6 months ago

Hi @JuanPimientoCaicedo

Do these weird artifacts happen at the concatenation times? Can you try to first concatenate and then do the phase shift? Phase shift needs a "margin" to work, so if your data is continuous in time it is better to concatenate first.
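Why the margin matters can be illustrated outside spikeinterface: a fractional-sample delay (what phase_shift applies per channel to undo the ADC inter-sample skew) is essentially exact in the middle of a chunk but degrades at its borders, because the FFT treats the chunk as circular. A minimal numpy sketch (the sampling rate, signal frequency, and shift value are arbitrary, just for illustration):

```python
import numpy as np

fs = 30_000.0                       # sampling rate (Hz), arbitrary for the demo
t = np.arange(3000) / fs
x = np.sin(2 * np.pi * 123 * t)     # test signal, non-periodic over the chunk

shift = 0.4                         # fractional-sample delay, like an ADC channel skew
freqs = np.fft.rfftfreq(x.size, d=1.0)   # frequencies in cycles/sample
shifted = np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * shift), n=x.size)

# Ideal delayed signal, computed analytically
expected = np.sin(2 * np.pi * 123 * (t - shift / fs))

interior = slice(100, -100)
err_interior = np.max(np.abs(shifted[interior] - expected[interior]))
err_edges = max(abs(shifted[0] - expected[0]), abs(shifted[-1] - expected[-1]))
print(err_interior, err_edges)   # edge error dominates the interior error
```

With gaps between recordings the chunk borders are real discontinuities anyway, which is why the order of phase_shift vs. concatenation is worth checking in this pipeline.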

The Zarr compression is lossless :) it uses the Blosc implementation of the Zstandard codec by default.
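Losslessness is easy to check directly: a roundtrip through a lossless codec returns bit-identical samples. A minimal sketch with the stdlib's zlib (spikeinterface's Zarr backend uses Blosc/Zstandard instead, not zlib, but the lossless guarantee is the same kind):

```python
import zlib

import numpy as np

# A fake int16 "traces" chunk, shaped like a short Neuropixels data block
rng = np.random.default_rng(0)
traces = rng.integers(-2000, 2000, size=(30_000, 16), dtype=np.int16)

raw = traces.tobytes()
compressed = zlib.compress(raw, level=5)
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.int16).reshape(traces.shape)

print(np.array_equal(traces, restored))  # True: every sample is bit-identical
```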

alejoe91 commented 6 months ago

A couple of additional comments:

  1. Why are you using the si.filter instead of the si.bandpass_filter?
  2. The correct_motion function already returns the interpolated traces, so the interpolate_motion step can be removed.
JuanPimientoCaicedo commented 6 months ago

Thank you, @alejoe91 for your rapid response.

  1. The artifacts don't happen exclusively at the concatenation times. Here is an example of a KS4 cluster with 145 artifacts that happen roughly uniformly during the recording (I concatenated three recordings).

image

  2. The recordings are not exactly continuous; there is a gap of at least 1 or 2 minutes between them (the time it takes to transition from one task to the other in the setup), so I believed it was better to demux them before concatenating them. Does that make sense? Do you still believe it is better to concatenate and then demux them?

  3. Yeah, I can use si.bandpass_filter in this case. I thought they were equivalent, aren't they?

  4. The interpolate_motion step is necessary because I am applying the drift estimation from bandpass-filtered data to the broadband recording, https://github.com/SpikeInterface/spikeinterface/issues/2749. Do you think this may be the cause of the artifact?

  5. Furthermore, this broadband recording is what I use to spike sort the data: I turn off the Kilosort drift correction and let it filter the data before the sorting starts.

alejoe91 commented 6 months ago

Since the artifacts look quite distributed in time (looking at the amplitude plot), I don't think this is related to a particular processing step... does that make sense?

JuanPimientoCaicedo commented 6 months ago

Exploring the artifacts, I found that they do not happen randomly.

image

They happen in steps of 2 seconds:

image

I am wondering if this has something to do with the SpikeGLX inter_sample_shifts.
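One way to quantify this regularity, given the artifact event times (e.g. extracted from the KS4 cluster's spike train): the inter-event intervals should cluster at the candidate period, and the event phases modulo that period should pile up at one value instead of spreading uniformly. A sketch with synthetic event times (the 2 s period matches the plots above, but the jitter and time range are made up):

```python
import numpy as np

# Hypothetical artifact event times (seconds): roughly every 2 s with small jitter
rng = np.random.default_rng(1)
event_times = np.arange(2.0, 100.0, 2.0) + rng.normal(0, 0.005, size=49)

# Inter-event intervals: for a periodic process they cluster at the period
intervals = np.diff(event_times)
print(np.median(intervals))          # close to 2.0 s -> periodic, not random

# Fold event times modulo the candidate period: periodic events pile up at
# one phase, random events spread uniformly over [0, 2)
phase = event_times % 2.0
phase_spread = np.std(np.minimum(phase, 2.0 - phase))
print(phase_spread)                  # small -> consistent with a 2 s period
```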

alejoe91 commented 6 months ago

Interesting. I think this can be due to the motion interpolation, since the temporal bins are 2 s. @samuelgarcia what do you think?

JuanPimientoCaicedo commented 6 months ago

Ok, I think I figured out what is happening:

It is in fact related to the motion interpolation step and the fact that somehow the channel offsets are not fully corrected:

Here you can see the spike triggered average of the noise cluster before the motion interpolation:

image

and this is after the motion interpolation:

image

After eliminating bad channels, I still have channels with weird offsets; these offsets are eliminated by filtering, as you can see here:

Before:

image

After:

image

Given that I am interested in the broadband data, I believe an easy solution here would be to correct the offsets of those channels and see if that works. What do you think, @alejoe91? By the way, do you have a function to do this? I am taking a look at spikeinterface.preprocessing.scale(recording, gain=1.0, offset=new_offsets) and it seems it should do the job.

alejoe91 commented 6 months ago

Yes, you can use the center preprocessing function (or a high-pass filter with a very low cutoff!)
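What centering does can be sketched with plain numpy: spikeinterface's center preprocessing subtracts a per-channel statistic (median or mean) computed from the traces. A minimal sketch with synthetic per-channel offsets (all values made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_channels = 30_000, 8
offsets = rng.uniform(-500, 500, size=n_channels)          # per-channel DC offsets (made up)
traces = rng.normal(0, 20, size=(n_samples, n_channels)) + offsets

# Centering: subtract a per-channel statistic (median here, robust to spikes)
centered = traces - np.median(traces, axis=0, keepdims=True)

residual = np.abs(np.median(centered, axis=0)).max()
print(residual)   # ~0: the per-channel offsets are gone
```

In the pipeline above this would presumably be applied to the broadband recording before motion interpolation, e.g. something like `si.center(rec_clean)`.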

JuanPimientoCaicedo commented 6 months ago

Excellent, thank you very much. You were very helpful!

samuelgarcia commented 6 months ago

Hi. The motion interpolation should be applied to high-pass filtered traces.