MATLAB-Community-Toolboxes-at-INCF / neuropixels-toolkit

(Under construction) Neuropixel Toolkit is a set of unified Matlab tools for manipulating Neuropixel and Kilosort datasets
MIT License

Getting Started #13

Closed SophiaUCSD16 closed 2 months ago

SophiaUCSD16 commented 2 months ago

Description

This PR introduces end-to-end pipeline testing with a simulated dataset located at /dataset/mearec/mearec_test_10s.h5. The current implementation encompasses the following critical steps:

  1. Preparation
  2. Loading Data and Probe Information
  3. Signal Conditioning
  4. Data Packaging for Automated Spike Sorting
  5. Automatic Spike Sorting
  6. Postprocessing of Spike Sorted Data
  7. Time Alignment with Auxiliary Data Streams
  8. Quality Assessment

Implementation

To get started, execute ./setup_mat_spike_sort.sh to establish the conda environment mat-spike-sort with the necessary supporting packages. Additionally, configure the Python environment in MATLAB using the following command:

pyversion('~/.conda/envs/mat-spike-sort/bin/python');

The end-to-end pipeline is implemented in getting_started.py, with each processing step encapsulated in function calls. These functions will be re-implemented in native MATLAB in later stages of development.
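
As a rough illustration of what "each processing step encapsulated in function calls" can look like, here is a minimal Python sketch. The step labels come from the list in the description; `make_placeholder_step`, `run_pipeline`, and the placeholder step bodies are hypothetical and are not taken from getting_started.py.

```python
from typing import Callable, Dict, List, Tuple

# Step labels taken from the PR description; the callables below are placeholders.
STEP_NAMES = [
    "Preparation",
    "Loading Data and Probe Information",
    "Signal Conditioning",
    "Data Packaging for Automated Spike Sorting",
    "Automatic Spike Sorting",
    "Postprocessing of Spike Sorted Data",
    "Time Alignment with Auxiliary Data Streams",
    "Quality Assessment",
]

State = Dict[str, object]
Step = Tuple[str, Callable[[State], State]]


def make_placeholder_step(name: str) -> Callable[[State], State]:
    """Return a hypothetical step that only records that it ran."""
    def step(state: State) -> State:
        history = list(state.get("history", []))
        history.append(name)
        return {**state, "history": history}
    return step


def run_pipeline(steps: List[Step], state: State) -> State:
    """Run each step in order, passing the state from one step to the next."""
    for name, func in steps:
        print(f"Running step: {name}")
        state = func(state)
    return state


# Wire the eight steps together and run them on an empty starting state.
steps = [(name, make_placeholder_step(name)) for name in STEP_NAMES]
final_state = run_pipeline(steps, {})
```

Keeping each step behind a plain function boundary is one way to let individual steps be swapped for native MATLAB implementations later without changing the order of the pipeline.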

The following command fetches the simulated dataset used to test this processing pipeline:

local_path = si.download_dataset(remote_path='mearec/mearec_test_10s.h5')

The getting_started.m script provides the necessary configurations to run the getting_started.py script end-to-end within the MATLAB environment.

Additional Notes

Running getting_started.m generates the same plot (saved to ./figure) and the same intermediate data file as running getting_started.py or getting_started.ipynb.

stevevanhooser commented 2 months ago

Hi Sophie -

Thanks, I'm starting to work through this.

setup_mat_spike_sort.sh:

This did not work for me on Mac OS. But even for Linux or Windows, some notes:

  1. The script should check for the existence of conda and either offer to install or give instructions to install; importantly, if conda is not found, it should exit. (I don't have conda and the shell script proceeded, and overwrote a lot of packages in my default environment. The conda command failed but the script didn't stop its execution.)

  2. apt-get is not available everywhere; for example, there is no Mac OS version. Maybe we need platform-specific versions of the installation shell script?

  3. The script should prompt the user that a root password is going to be requested before it is requested and tell the user what is going to happen.
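
The checks requested above could look roughly like the following Python sketch. It is not the project's setup script: `require_tool`, `system_package_manager`, and `main` are hypothetical names, and the mapping from operating system to package manager is only illustrative (apt-get applies to Debian-based Linux, not to every distribution).

```python
import platform
import shutil
import sys
from typing import Callable, Optional


def require_tool(name: str, which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = which(name)
    if path is None:
        raise RuntimeError(f"{name} was not found on PATH; install it before running setup.")
    return path


def system_package_manager(system: Optional[str] = None) -> Optional[str]:
    """Map the operating system to a package manager (illustrative mapping only)."""
    system = system or platform.system()
    return {"Linux": "apt-get", "Darwin": "brew"}.get(system)


def main() -> int:
    """Return 0 when setup may proceed, 1 when it must stop."""
    try:
        conda = require_tool("conda")
    except RuntimeError as err:
        print(err, file=sys.stderr)
        return 1  # Stop here so that no later install step runs.
    print(f"Found conda at {conda}")
    manager = system_package_manager()
    if manager is None:
        print("No package manager is mapped for this platform; install system packages manually.")
    else:
        # Announce the privileged step before it happens.
        print(f"System packages would be installed with {manager}; you may be asked for a password.")
    return 0
```

A setup script would call `sys.exit(main())`, so a missing conda ends the run before anything is installed into the default environment.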

Thanks! These are small things but will make a difference to the user experience.

I will make a Mac setup file.

Best Steve

stevevanhooser commented 2 months ago

Also, pip install networkx[default] failed for me; it couldn't find the package with [default] added.
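
One possible cause, which this thread does not confirm: shells such as zsh (the default shell on recent macOS) treat square brackets as a glob pattern, so an unquoted networkx[default] can fail before pip ever sees it. The sketch below sidesteps the shell by passing the requirement as a single argument; `pip_install_command` and `pip_install` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def pip_install_command(requirement: str) -> List[str]:
    """Build a pip command as an argument list, so no shell expands the brackets."""
    return [sys.executable, "-m", "pip", "install", requirement]


def pip_install(requirement: str) -> None:
    """Run pip in the current interpreter's environment; raises on failure."""
    subprocess.run(pip_install_command(requirement), check=True)


# Example (not executed here): pip_install("networkx[default]")
```

From an interactive shell, quoting the requirement has the same effect: pip install "networkx[default]".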

stevevanhooser commented 2 months ago

There are some path issues in the Python code. It can't seem to find git-annex, even when I add /usr/local/bin to the Python shell path. When I run getting_started.py in the Python shell environment (without Matlab) it can find the path and runs successfully until it tries to find kilosort3:

Exception: The sorter kilosort3 is not installed.Please install it with:  

To use Kilosort3 run:

        >>> git clone https://github.com/MouseLand/Kilosort
    and provide the installation path by setting the KILOSORT3_PATH
    environment variables or using Kilosort3Sorter.set_kilosort3_path().

Is this Matlab?

SophiaUCSD16 commented 2 months ago

Hi Steve,

Thank you very much for highlighting the setup discrepancies across different platforms. Based on the setup_macos_mat_spike_sort.README you provided, I have converted setup.sh into setup.README. This update breaks down the setup process into several sections:

  1. Installing Conda
  2. Installing Python Packages
  3. Additional Installations

Once Conda is installed, step 2, which involves installing Python packages with pip install -e ., should be universally applicable across all platforms. However, the additional installation steps will remain specific to each platform.

SophiaUCSD16 commented 2 months ago

> There are some path issues in the Python code. It can't seem to find git-annex, even when I add /usr/local/bin to the Python shell path. When I run getting_started.py in the Python shell environment (without Matlab) it can find the path and runs successfully until it tries to find kilosort3:
>
> Exception: The sorter kilosort3 is not installed.Please install it with:
>
> To use Kilosort3 run:
>
>         >>> git clone https://github.com/MouseLand/Kilosort
>     and provide the installation path by setting the KILOSORT3_PATH
>     environment variables or using Kilosort3Sorter.set_kilosort3_path().
>
> Is this Matlab?

This is the MATLAB package Kilosort, which does the heavy lifting of spike sorting.

I've updated the setup.README to include installation instructions for Kilosort3:

# Install Kilosort3
cd /opt \
    && sudo curl -L -o v3.0.2.zip https://github.com/MouseLand/Kilosort/archive/refs/tags/v3.0.2.zip \
    && sudo unzip v3.0.2.zip \
    && sudo mv /opt/Kilosort-3.0.2 /opt/Kilosort-3 \
    && sudo rm -rf /opt/v3.0.2.zip \
    && sudo matlab -nodesktop -nosplash -r "cd('/opt/Kilosort-3/CUDA'); mexGPUall; addpath(genpath('/opt/Kilosort-3')); savepath; exit;" \
    && sudo chown -R jovyan:jovyan /opt/Kilosort-3
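
The error message quoted earlier in the thread names two ways to tell the sorter where Kilosort lives: the KILOSORT3_PATH environment variable or Kilosort3Sorter.set_kilosort3_path(). Below is a minimal sketch of the environment-variable route; `configure_kilosort3` is a hypothetical helper, and /opt/Kilosort-3 is simply the location used in the commands above.

```python
import os
from pathlib import Path


def configure_kilosort3(install_dir: str) -> Path:
    """Validate the install directory and export it as KILOSORT3_PATH."""
    path = Path(install_dir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Kilosort directory not found: {path}")
    # Assumption: setting this before the sorter module is imported is the
    # safer order, in case the variable is read at import time.
    os.environ["KILOSORT3_PATH"] = str(path)
    return path


# Example (assumes the install location used in setup.README):
# configure_kilosort3("/opt/Kilosort-3")
```

Failing early on a missing directory gives a clearer message than the later "sorter kilosort3 is not installed" exception.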

System requirements for installing and running Kilosort-3 are:

Please let me know if you encounter any issues.

SophiaUCSD16 commented 2 months ago

To resolve the issue with git-annex installation, it needs to be installed directly on your operating system. According to the setup instructions you provided, brew install git-annex should work correctly for macOS users.

If you continue to encounter errors after restarting your session, consider manually downloading the required data to the directory where getting_started.py is located. git-annex is only used by the following lines of code, which fetch the data:

# Download the test dataset only when no local copy exists.
if not os.path.isfile(local_path):
    print(f"The testing dataset is not available at local_path {local_path}, triggering dataset download")
    local_path = si.download_dataset(remote_path='mearec/mearec_test_10s.h5')

The data can be downloaded from https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/mearec
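
The fallback described here (use a manually downloaded copy if present, otherwise download) can be sketched as follows. This is an assumption-laden illustration, not code from getting_started.py: `locate_dataset` is a hypothetical helper, and the `download` argument stands in for a call such as si.download_dataset(remote_path='mearec/mearec_test_10s.h5').

```python
import os
import shutil
from typing import Callable, Optional


def locate_dataset(
    local_path: str,
    download: Callable[[], str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return a usable dataset path, downloading only when necessary."""
    if os.path.isfile(local_path):
        return local_path  # A manually downloaded copy is used as-is.
    if which("git-annex") is None:
        # Fail with an actionable message instead of a failure deep in the download.
        raise RuntimeError(
            "git-annex was not found on PATH; install it or place the "
            f"dataset manually at {local_path}"
        )
    return download()
```

Checking for git-annex on the PATH seen by the running Python process also helps when that PATH differs from the one in an interactive shell, which may be what happens when the script is launched from within MATLAB.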

This is a temporary solution; we plan to introduce a new protocol for sharing the testing data soon.

stevevanhooser commented 2 months ago

Hi Sophia -

Ah, right, Kilosort requires Linux/Windows and an NVIDIA GPU.

Will the final pipeline require the same? Maybe it doesn't make sense to have MacOS setup instructions if everything ultimately depends on KiloSort anyway. What do you think?

Thanks Steve

stevevanhooser commented 2 months ago

We will not worry about Mac versions for the moment because the system ultimately depends on Kilosort anyway (which requires Linux/Windows with NVIDIA GPU). So I will merge this into main!