MouseLand / pykilosort

WIP: Python port of Kilosort2
https://github.com/MouseLand/Kilosort2/
GNU General Public License v2.0

General handling of run parameters #9

Open m-beau opened 4 years ago

m-beau commented 4 years ago

I think that the way parameters are currently handled can be very confusing for general users and should be re-thought.

1) Current state of things: There are currently 2 objects (cyrille's cool dictionaries, called Bunch()) which are being passed: params and probe,

as well as, independently from these, two paths, two integers and a numpy datatype: dat_path, dir_path, n_channels, sample_rate and dtype. This is confusing and redundant.

2) Suggested improvements:

I think that a single dictionary, called params, holding all the relevant settings should be fed to run().

I also think that a list of relevant channel maps, and a script to generate one, should be available within a folder of the pykilosort repo.

The run file would be:


## Imports
from pathlib import Path
import numpy as np
from math import ceil

from pykilosort import add_default_handler, run, Bunch
add_default_handler(level='DEBUG')

## Set parameters
params = Bunch()
probe = Bunch()

# Set paths - the ONLY parameters to edit should be these 3 paths
params.dat_path = Path('path/to/.ap.bin')
params.dir_path = Path('path/to/processing_ssd')
params.chanmap_path = Path('path/to/chanMap/directory')

# Set decoding params (SHOULD BE DEDUCED FROM META FILE!)
params.n_channels = 385
params.dtype = np.int16
params.sample_rate = 3e4

# Edit and attach probe to params
probe.NchanTOT = 385 # SHOULD BE DEDUCED FROM META FILE!
# WARNING: MATLAB channel maps are 1-based, hence consider the -1 (SHOULD BE PROPERLY FORMATTED ALREADY)
probe.chanMap = np.load(params.chanmap_path / 'chanMap.npy').squeeze().astype(np.int64)
probe.xc = np.load(params.chanmap_path / 'xc.npy').squeeze()
probe.yc = np.load(params.chanmap_path / 'yc.npy').squeeze()
probe.kcoords = np.load(params.chanmap_path / 'kcoords.npy').squeeze()
params.probe = probe

# sample rate
params.fs = 30000.

# frequency for high pass filtering (150)
params.fshigh = 150.
params.fslow = None

# minimum firing rate on a "good" channel (0 to skip)
params.minfr_goodchannels = 0

# threshold on projections (like in Kilosort1, can be different for last pass like [10 4])
params.Th = [10, 3]

# how important is the amplitude penalty (like in Kilosort1, 0 means not used,
# 10 is average, 50 is a lot)
params.lam = 10

# splitting a cluster at the end requires at least this much isolation for each sub-cluster (max=1)
params.AUCsplit = 0.9

# minimum spike rate (Hz), if a cluster falls below this for too long it gets removed
params.minFR = 1. / 50

# number of samples to average over (annealed from first to second value)
params.momentum = [20, 400]

# spatial constant in um for computing residual variance of spike
params.sigmaMask = 30

# threshold crossings for pre-clustering (in PCA projection space)
params.ThPre = 8

# danger, changing these settings can lead to fatal errors
# options for determining PCs
params.spkTh = -6  # spike threshold in standard deviations (-6)
params.reorder = 1  # whether to reorder batches for drift correction.
params.nskip = 25  # how many batches to skip for determining spike PCs

# default_params.GPU = 1  # has to be 1, no CPU version yet, sorry
# default_params.Nfilt = 1024 # max number of clusters
params.nfilt_factor = 4  # max number of clusters per good channel (even temporary ones)
params.ntbuff = 64  # samples of symmetrical buffer for whitening and spike detection
# NT (the batch size, set below) must be a multiple of 32 + ntbuff (try decreasing it if out of memory)
params.whiteningRange = 32  # number of channels to use for whitening each channel
params.nSkipCov = 25  # compute whitening matrix from every N-th batch
params.scaleproc = 200  # int16 scaling of whitened data
params.nPCs = 3  # how many PCs to project the spikes into
# default_params.useRAM = 0  # not yet available

params.nt0 = 61
params.nup = 10
params.sig = 1
params.gain = 1

params.templateScaling = 20.0

params.loc_range = [5, 4]
params.long_range = [30, 6]

### careful with parameters below ###
divide_batch_size = 2
params.NT = params.get('NT', (64 // divide_batch_size) * 1024 + params.ntbuff)
params.NTbuff = params.get('NTbuff', params.NT + 4 * params.ntbuff)
params.nt0min = params.get('nt0min', ceil(20 * params.nt0 / 61))
#####################################

run(params)
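For a SpikeGLX recording, the values flagged above as "SHOULD BE DEDUCED FROM META FILE!" could be parsed from the `.meta` file sitting next to the binary. A minimal sketch, assuming the standard key=value format (`params_from_meta` is a hypothetical helper):

```python
# Hypothetical helper: parse a SpikeGLX-style .meta file (key=value lines)
# and derive the three decoding parameters that are set by hand above.
import numpy as np

def params_from_meta(meta_text):
    meta = {}
    for line in meta_text.splitlines():
        if '=' in line:
            key, val = line.split('=', 1)
            meta[key.strip().lstrip('~')] = val.strip()
    return {
        'n_channels': int(meta['nSavedChans']),    # e.g. 385 (384 AP + 1 sync)
        'sample_rate': float(meta['imSampRate']),  # e.g. 30000.0
        'dtype': np.int16,                         # SpikeGLX writes int16
    }

example = "imSampRate=30000\nnSavedChans=385\n"
p = params_from_meta(example)
```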

rossant commented 4 years ago

It kind of makes sense. A few remarks:

  1. I would prefer to pass the raw data using get_ephys_reader(...) directly in the run file. The metadata (n_channels, sample_rate, dtype) would be included in it.
  2. params.fs should be automatically obtained from the raw data object. Same for probe.NchanTOT.

Would you be willing to prepare a PR? Best would be if only main.py could be modified so that we don't have to change all the internal functions.

alexmorley commented 4 years ago

I'm going to work on the parameterization and config objects after setting up the benchmarking w/ MATLAB. I think we can make it even better than the MATLAB version by adding some validation of the parameters (e.g. of their types, some constraints on their values, etc.). This could also ensure that we don't accidentally pass/get the wrong params to the CUDA kernels.
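A lightweight sketch of such validation; the rule table below is illustrative, not a proposed final schema:

```python
# Illustrative parameter validation: check types and value constraints
# before anything reaches the GPU code. Rules are (expected type, predicate).
PARAM_RULES = {
    'fshigh':   ((int, float), lambda v: v > 0),       # high-pass cutoff must be positive
    'AUCsplit': ((int, float), lambda v: 0 <= v <= 1), # isolation score is bounded
    'spkTh':    ((int, float), lambda v: v < 0),       # detection threshold is negative
    'Th':       (list, lambda v: len(v) == 2),         # two passes -> two thresholds
}

def validate(params):
    """Raise ValueError listing every rule violation; return params if clean."""
    errors = []
    for name, (typ, ok) in PARAM_RULES.items():
        if name not in params:
            errors.append(f'missing parameter: {name}')
        elif not isinstance(params[name], typ):
            errors.append(f'{name}: wrong type {type(params[name]).__name__}')
        elif not ok(params[name]):
            errors.append(f'{name}: value {params[name]!r} out of range')
    if errors:
        raise ValueError('; '.join(errors))
    return params

validate({'fshigh': 150., 'AUCsplit': 0.9, 'spkTh': -6, 'Th': [10, 3]})
```

Collecting every violation before raising, rather than failing on the first one, makes it easier to fix a config in one pass.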

Finally, it'd be great to make sure we move all of the config out of code and into (for example) a yaml file. It will make it easier for people to share/edit their configurations, and it's nice to have the separation.
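As a sketch of that code/config separation, using JSON from the standard library to stay dependency-free (with PyYAML the logic would be identical, swapping `json.loads` for `yaml.safe_load`; the defaults shown are a small subset for illustration):

```python
# Sketch: run parameters live in a config file, merged over code defaults.
import json
import tempfile
from pathlib import Path

DEFAULTS = {'fshigh': 150.0, 'Th': [10, 3], 'lam': 10, 'AUCsplit': 0.9}

def load_params(path):
    """Read a user's config file and merge it over the defaults."""
    user = json.loads(Path(path).read_text())
    return {**DEFAULTS, **user}

# A user's file only contains the values they want to override:
cfg = Path(tempfile.mkdtemp()) / 'params.json'
cfg.write_text(json.dumps({'Th': [12, 4]}))
params = load_params(cfg)
```

Because the user file holds only overrides, shared configs stay short and diffs between two people's setups are easy to read.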

I'll post here again when I have a clearer direction but happy to chat about it before if anyone is keen.