flatironinstitute / neuropixels-data-sep-2020

Example neuropixels datasets for purposes of developing spike sorting algorithms
Apache License 2.0

neuropixels-data-sep-2020

Example electrophysiology recordings for the purpose of developing and optimizing spike sorting algorithms for neuropixels probes. Methods for dealing with drift are of particular interest.

This repo is in preparation. We will be adding more recordings and curated sortings over time. We are also improving the reliability of the peer-to-peer file transfer as well as adding functionality to the web GUI.

Update 9 Sep 2020: Note in the table below that one of the curated sorting results has recently been corrected and a few more datasets have been added.

Overview

This repository contains links to some ephys recordings using neuropixels probes together with curated spike sorting results. It also contains two recordings with known imposed drift. In the future, it will contain hybrid pseudo-ground truth neuropixels recordings. These may be used to evaluate the performance of spike sorting methods. We will be adding to this collection over time.

You can interact with the data in various ways:

Datasets

We are very grateful to the following individuals who have contributed data to this project:

The following recordings were generated using prepare_datasets.py (a fully reproducible hither script).

The recording/sorting exploration tool in the web links below is under active development. Over time the responsiveness will improve. Thank you for your patience.

Recording ID Web link Description
cortexlab-single-phase-3 view A "Phase3" Neuropixels electrode array was inserted into the brain of an awake, head-fixed mouse for about an hour.
cortexlab-single-phase-3.10sec view Extracted 10 seconds of data from the beginning of the recording
cortexlab-single-phase-3-ch0-7.10sec view Extracted a subset of channels and 10 seconds of data from the beginning of the recording
cortexlab-drift-dataset1 view Neuropixels 2 recording with imposed drift (dataset1).
cortexlab-drift-dataset2 view Neuropixels 2 recording with imposed drift (dataset2).
allen_mouse419112_probeE view A one-hour Neuropixels recording from the Allen Institute
allen_mouse415148_probeE view A one-hour Neuropixels recording from the Allen Institute
allen_mouse419112_probeE-ch0-7.10sec view Subset of channels and first 10 seconds of allen_mouse419112_probeE
allen_mouse419112_probeE-10sec view First 10 seconds of allen_mouse419112_probeE
svoboda-SC026_080619_g0_tcat_imec0 view A Phase 3B Neuropixels probe was inserted 2.9 mm into secondary motor cortex of an awake, head-fixed mouse performing a trial-based behavioural task.
svoboda-SC022_030319_g0_tcat_imec2 view A Phase 3B Neuropixels probe was inserted 4.5 mm into left hemisphere striatum of an awake, head-fixed mouse performing a trial-based behavioural task.
svoboda-SC026_080619_g0_tcat_imec2 view A Phase 3B Neuropixels probe was inserted 4.7 mm into the left hemisphere hippocampus and thalamus of an awake, head-fixed mouse performing a trial-based behavioural task.
svoboda-SC035_011020_g0_tcat_imec0 view A 2.0 4-shank Neuropixels probe was inserted 1 mm into the right hemisphere secondary motor cortex of an awake, head-fixed mouse performing a trial-based behavioural task.
svoboda-SC035_010920_g0_tcat_imec1 view A 2.0 4-shank Neuropixels probe was inserted 4.75 mm into the left hemisphere medulla of an awake, head-fixed mouse performing a trial-based behavioural task.

Sorting ID Web link Description
cortexlab-single-phase-3:curated view Curated spike sorting for cortexlab-single-phase-3
cortexlab-single-phase-3:curated_good view Curated spike sorting for cortexlab-single-phase-3 (good units only)
allen_mouse419112_probeE:curated view Curated spike sorting for allen_mouse419112_probeE
allen_mouse415148_probeE:curated view Curated spike sorting for allen_mouse415148_probeE (updated 9 Sep 2020)
svoboda-SC026_080619_g0_tcat_imec0:curated view Curated spike sorting for svoboda-SC026_080619_g0_tcat_imec0
svoboda-SC022_030319_g0_tcat_imec2:curated view Curated spike sorting for svoboda-SC022_030319_g0_tcat_imec2
svoboda-SC026_080619_g0_tcat_imec2:curated view Curated spike sorting for svoboda-SC026_080619_g0_tcat_imec2
svoboda-SC035_011020_g0_tcat_imec0:curated view Curated spike sorting for svoboda-SC035_011020_g0_tcat_imec0
svoboda-SC035_010920_g0_tcat_imec1:curated view Curated spike sorting for svoboda-SC035_010920_g0_tcat_imec1
cortexlab-single-phase-3:spyking_circus view SpykingCircus applied to cortexlab-single-phase-3 (contributed by P. Yger)
allen_mouse419112_probeE:spyking_circus view SpykingCircus applied to allen_mouse419112_probeE (contributed by P. Yger)
allen_mouse415148_probeE:spyking_circus view SpykingCircus applied to allen_mouse415148_probeE (contributed by P. Yger)
cortexlab-drift-dataset1:spyking_circus view SpykingCircus applied to cortexlab-drift-dataset1 (contributed by P. Yger)
cortexlab-drift-dataset2:spyking_circus view SpykingCircus applied to cortexlab-drift-dataset2 (contributed by P. Yger)
svoboda-SC022_030319_g0_tcat_imec2:spyking_circus view SpykingCircus applied to svoboda-SC022_030319_g0_tcat_imec2 (contributed by P. Yger)
allen_mouse419112_probeE:mh-hdsort view hdsort applied to allen_mouse419112_probeE (contributed by M. Hennig)
allen_mouse419112_probeE:mh-herdingspikes view herdingspikes applied to allen_mouse419112_probeE (contributed by M. Hennig)
allen_mouse419112_probeE:mh-ironclust view ironclust applied to allen_mouse419112_probeE (contributed by M. Hennig)
allen_mouse419112_probeE:mh-kilosort2 view kilosort2 applied to allen_mouse419112_probeE (contributed by M. Hennig)
allen_mouse419112_probeE:mh-spykingcircus view spykingcircus applied to allen_mouse419112_probeE (contributed by M. Hennig)
svoboda-SC026_080619_g0_tcat_imec0:mh-hdsort view hdsort applied to svoboda-SC026_080619_g0_tcat_imec0 (contributed by M. Hennig)
svoboda-SC026_080619_g0_tcat_imec0:mh-herdingspikes view herdingspikes applied to svoboda-SC026_080619_g0_tcat_imec0 (contributed by M. Hennig)
svoboda-SC026_080619_g0_tcat_imec0:mh-ironclust view ironclust applied to svoboda-SC026_080619_g0_tcat_imec0 (contributed by M. Hennig)
svoboda-SC026_080619_g0_tcat_imec0:mh-kilosort2 view kilosort2 applied to svoboda-SC026_080619_g0_tcat_imec0 (contributed by M. Hennig)
svoboda-SC026_080619_g0_tcat_imec0:mh-tridesclous view tridesclous applied to svoboda-SC026_080619_g0_tcat_imec0 (contributed by M. Hennig)
cortexlab-single-phase-3:mh-hdsort view hdsort applied to cortexlab-single-phase-3 (contributed by M. Hennig)
cortexlab-single-phase-3:mh-herdingspikes view herdingspikes applied to cortexlab-single-phase-3 (contributed by M. Hennig)
cortexlab-single-phase-3:mh-ironclust view ironclust applied to cortexlab-single-phase-3 (contributed by M. Hennig)
cortexlab-single-phase-3:mh-kilosort2 view kilosort2 applied to cortexlab-single-phase-3 (contributed by M. Hennig)
cortexlab-single-phase-3:mh-spykingcircus view spykingcircus applied to cortexlab-single-phase-3 (contributed by M. Hennig)
cortexlab-single-phase-3:mh-tridesclous view tridesclous applied to cortexlab-single-phase-3 (contributed by M. Hennig)

Browse all recordings

Loading into Python and exporting to various formats

Because electrophysiology recordings can be large, we have created a peer-to-peer sharing system (kachery-p2p) that runs on Linux or macOS and interfaces directly with Python. By running a kachery-p2p daemon on your computer, you are participating in the network for sharing these datasets with other users of the system.

We have integrated this system with SpikeInterface, which allows lazy loading of recordings into RecordingExtractor objects.

Prerequisites: Linux or MacOS

Step 1. Clone and install this repo in development mode

git clone https://github.com/flatironinstitute/neuropixels-data-sep-2020
cd neuropixels-data-sep-2020

We recommend you create a conda environment based on the environment.yml file distributed in this repo:

conda env create -f environment.yml
conda activate neuropixels-2020

Install this repo in editable (development) mode:

pip install -e .

If using conda, be sure that you always activate the conda environment prior to working with this repo.

For subsequent updates, run git pull and then rerun pip install -e .

Step 2. You must be running a kachery-p2p daemon on the flatiron3 channel.

First create a directory for storing downloaded files and set the KACHERY_STORAGE_DIR environment variable:

mkdir /desired/file/storage/location
export KACHERY_STORAGE_DIR=/desired/file/storage/location
# also add that to .bashrc or wherever you keep your env vars
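
Before starting the daemon, it can help to confirm from Python that the variable is visible and points at a real directory. The following is a minimal sketch; the helper name storage_dir_problem is hypothetical and is not part of this repo or of kachery-p2p.

```python
import os
from pathlib import Path


def storage_dir_problem(env=None):
    """Return a message describing what is wrong with KACHERY_STORAGE_DIR,
    or None if it is set and points at an existing directory.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get("KACHERY_STORAGE_DIR")
    if not value:
        return "KACHERY_STORAGE_DIR is not set"
    if not Path(value).is_dir():
        return f"KACHERY_STORAGE_DIR is not an existing directory: {value}"
    return None


if __name__ == "__main__":
    problem = storage_dir_problem()
    print(problem if problem else "KACHERY_STORAGE_DIR looks fine")
```

Run this in the same shell (and conda environment) in which you will start the daemon, since the variable is only inherited from that environment.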

Then start the daemon:

# You can replace $HOSTNAME with any other label for your node
kachery-p2p-start-daemon --config https://gist.githubusercontent.com/magland/637ad8be96f8bbf5a86ae1f409ab751c/raw/flatiron3.yaml --label $HOSTNAME

Keep this daemon running in a terminal. You may want to use tmux or a similar tool to keep this daemon running even if the terminal is closed.

For more information, see these instructions. The kachery-p2p tool has been tested on Linux and MacOS.

Step 3. Load a sorting into a SpikeInterface sorting extractor using the following example script:

#!/usr/bin/env python3

# You need to be running the kachery-p2p daemon, flatiron3 channel
import neuropixels_data_sep_2020 as nd
import spikeextractors as se

# sorting will be a se.SortingExtractor object
sorting = nd.load_sorting('cortexlab-single-phase-3:curated') # use a sorting ID from table above
unit_ids = sorting.get_unit_ids()
print(f'Num. units in sorting: {len(unit_ids)}')

# load a spike train for a particular unit
unit_id = unit_ids[3]
st = sorting.get_unit_spike_train(unit_id=unit_id)
print(f'Num. events in unit {unit_id}: {len(st)}')

# The output should be:
# Num. units in sorting: 675
# Num. events in unit 8: 6022

See also: ./scripts/load_all_sortings.py
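
Once a spike train is in hand, simple per-unit statistics are easy to compute. The sketch below assumes that spike times are given in sample frames (the spikeextractors convention) and uses a synthetic spike train, a 30 kHz sampling rate, and a 1.5 ms refractory period purely for illustration; in practice the sampling rate and number of frames would come from the matching recording. The function names are hypothetical.

```python
import numpy as np


def firing_rate_hz(spike_train, sampling_frequency, num_frames):
    """Mean firing rate in Hz, given spike times in sample frames."""
    duration_sec = num_frames / sampling_frequency
    return len(spike_train) / duration_sec


def isi_violation_fraction(spike_train, sampling_frequency, refractory_ms=1.5):
    """Fraction of inter-spike intervals shorter than the refractory period."""
    frames = np.sort(np.asarray(spike_train))
    if len(frames) < 2:
        return 0.0
    isis_sec = np.diff(frames) / sampling_frequency
    return float(np.mean(isis_sec < refractory_ms / 1000.0))


# Synthetic example (assumed values, not taken from any dataset above):
# 5 spikes in a 4-second recording sampled at 30 kHz,
# with one pair of spikes only 1 ms apart.
example_train = np.array([0, 30000, 60000, 60030, 90000])
example_rate = firing_rate_hz(example_train, 30000.0, 120000)
example_violations = isi_violation_fraction(example_train, 30000.0)
print(f'Firing rate: {example_rate} Hz')
print(f'ISI violation fraction: {example_violations}')
```

With a real sorting, st from the script above could be passed in place of example_train.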

Step 4. Load a recording into a SpikeInterface recording extractor:

# You need to be running the kachery-p2p daemon, flatiron3 channel (see Step 2)
import neuropixels_data_sep_2020 as nd
import spikeextractors as se

# Replace this with the desired recording ID from above
recording_id = 'cortexlab-single-phase-3-ch0-7.10sec'

# Note: if the files are not already on your machine, then you need
# to run a kachery-p2p daemon on the flatiron3 channel (see Step 2).
# Use download=True to download the entire recording at once
#    download=False means lazy download
recording = nd.load_recording(recording_id, download=False)

# recording is a SpikeInterface recording extractor
# so you can extract information
samplerate = recording.get_sampling_frequency()
num_frames = recording.get_num_frames()
num_channels = len(recording.get_channel_ids())
channel_locations = recording.get_channel_locations()

print(f'Num. channels: {num_channels}')
print(f'Duration: {num_frames / samplerate} sec')

# You can also extract the raw traces.
# This will only download the part of the raw file needed
traces = recording.get_traces(channel_ids=[0, 1, 2, 3], start_frame=0, end_frame=5000)
print(f'Shape of extracted traces: {traces.shape}')

# Or equivalently (using SubRecordingExtractor):
recording_sub = se.SubRecordingExtractor(
    parent_recording=recording,
    channel_ids=[0, 1, 2, 3],
    start_frame=0, end_frame=5000
)
traces2 = recording_sub.get_traces()
print(f'Shape of extracted traces: {traces2.shape}')

Once a dataset is loaded into a SpikeInterface extractor, you can interact with it directly in Python. The data will be lazy-loaded from the peer-to-peer network.
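
Because traces are fetched on demand, a long recording can be processed in fixed-size chunks so that only one chunk is downloaded and held in memory at a time. The following is a minimal sketch of that pattern, computing per-channel RMS. It relies only on get_num_frames and get_traces as used above, and assumes get_traces returns an array of shape (channels, frames). FakeRecording is a hypothetical stand-in so the example runs without the daemon; with real data you would pass the recording object from Step 4.

```python
import numpy as np


def iter_chunks(num_frames, chunk_size):
    """Yield (start_frame, end_frame) pairs covering [0, num_frames)."""
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)


def channel_rms(recording, channel_ids, chunk_size=30000):
    """Per-channel RMS, accumulated one chunk at a time."""
    num_frames = recording.get_num_frames()
    sum_sq = np.zeros(len(channel_ids))
    for start, end in iter_chunks(num_frames, chunk_size):
        chunk = recording.get_traces(
            channel_ids=channel_ids, start_frame=start, end_frame=end
        )
        # Convert to float64 before squaring to avoid int16 overflow
        sum_sq += np.sum(chunk.astype(np.float64) ** 2, axis=1)
    return np.sqrt(sum_sq / num_frames)


class FakeRecording:
    """Hypothetical in-memory stand-in for a recording extractor."""

    def __init__(self, data):
        self._data = data  # shape: (num_channels, num_frames)

    def get_num_frames(self):
        return self._data.shape[1]

    def get_traces(self, channel_ids, start_frame, end_frame):
        return self._data[channel_ids, start_frame:end_frame]


fake = FakeRecording(
    np.array([[3, -3, 3, -3, 3], [4, -4, 4, -4, 4]], dtype=np.int16)
)
print(channel_rms(fake, channel_ids=[0, 1], chunk_size=2))
```

The chunk size trades off the number of network requests against memory use; 30000 frames is an arbitrary default, roughly one second at a 30 kHz sampling rate.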

Downloading the data for use in MATLAB or other languages

If you plan to do your analysis in Python, we recommend you use spikeextractors as a container for passing data around, as illustrated above. Otherwise, you can download the data directly to disk by editing and running scripts/download_recordings.py. In that case, consider first downloading a subset of the data using se.SubRecordingExtractor for testing before downloading the entire files. Similarly, for the curated sortings, see scripts/download_all_sortings.py.
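
As an illustration of what such an export might look like, here is a minimal sketch that writes traces to a flat binary file, chunk by chunk, with channels interleaved within each frame. This is not the repo's download script: write_traces_to_binary and FakeRecording are hypothetical names, and the sketch assumes the traces are int16 and that get_traces returns an array of shape (channels, frames).

```python
import os
import tempfile

import numpy as np


class FakeRecording:
    """Hypothetical in-memory stand-in for a recording extractor."""

    def __init__(self, data):
        self._data = data  # shape: (num_channels, num_frames)

    def get_channel_ids(self):
        return list(range(self._data.shape[0]))

    def get_num_frames(self):
        return self._data.shape[1]

    def get_traces(self, channel_ids, start_frame, end_frame):
        return self._data[channel_ids, start_frame:end_frame]


def write_traces_to_binary(recording, path, chunk_size=30000, dtype=np.int16):
    """Write all channels to a flat binary file, one chunk at a time.

    Layout: frame 0 (all channels), frame 1 (all channels), and so on.
    Returns (num_channels, num_frames), which a reader needs to reshape.
    """
    channel_ids = recording.get_channel_ids()
    num_frames = recording.get_num_frames()
    with open(path, "wb") as f:
        for start in range(0, num_frames, chunk_size):
            end = min(start + chunk_size, num_frames)
            chunk = recording.get_traces(
                channel_ids=channel_ids, start_frame=start, end_frame=end
            )
            # chunk is (channels, frames); transpose so each frame is contiguous
            f.write(np.ascontiguousarray(chunk.T, dtype=dtype).tobytes())
    return len(channel_ids), num_frames


def roundtrip_ok():
    """Write a small synthetic recording and check it reads back unchanged."""
    data = np.arange(12, dtype=np.int16).reshape(3, 4)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "traces.dat")
        num_channels, num_frames = write_traces_to_binary(
            FakeRecording(data), path, chunk_size=3
        )
        restored = np.fromfile(path, dtype=np.int16)
        restored = restored.reshape(num_frames, num_channels).T
    return np.array_equal(restored, data)


print(roundtrip_ok())
```

With this layout, a file can be read in MATLAB with fread(fid, [num_channels, Inf], 'int16=>int16'), which yields a channels-by-frames matrix. The channel count and sampling rate are not stored in the file and must be recorded separately.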