Open JRicardo24 opened 2 years ago
All of the PC metrics involve random subsampling of spikes to speed up the calculation. The `np.random` module is initialized with the same seed value on each run, which should ensure that the results are the same each time. But it's possible the seeding is not working as expected.
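To illustrate how seeding makes the subsampling reproducible, here is a minimal sketch. The helper name `subsample_spikes` is hypothetical, not the library's API; it uses a local `RandomState` rather than the global `np.random` state, which avoids interference from any other code that draws random numbers between runs:

```python
import numpy as np

def subsample_spikes(spike_indices, max_spikes, seed=0):
    # Hypothetical helper, not the library's API: reproducibly pick a
    # subset of spikes. A local generator is used so other np.random
    # calls elsewhere in the pipeline cannot shift the result.
    rng = np.random.RandomState(seed)
    if len(spike_indices) <= max_spikes:
        return spike_indices
    return rng.choice(spike_indices, size=max_spikes, replace=False)

spikes = np.arange(35_000)
a = subsample_spikes(spikes, 500, seed=42)
b = subsample_spikes(spikes, 500, seed=42)
assert np.array_equal(a, b)  # identical subsample on every run with the same seed
```

If the seed were set once at import time but consumed by other random draws before the metric runs, the subsample (and hence the metric value) would differ between runs even though a seed was set.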
`isolation_distance` in particular can be quite sensitive to the subsampled spikes, which is why we don't use it for any of our unit-level quality control. In fact, the only PC metric we've found to be generally useful is `nearest_neighbors_hit_rate`. Have you found that one to vary significantly between runs?
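For readers unfamiliar with the metric, a toy sketch of a nearest-neighbor hit rate follows: for each spike of the target unit, count what fraction of its k nearest neighbors in PC space also belong to that unit. This is a simplified illustration, not the library's implementation:

```python
import numpy as np

def nn_hit_rate(pcs_target, pcs_other, k=4):
    # Toy nearest-neighbor hit rate (sketch, not the library code):
    # for each target spike, the fraction of its k nearest neighbors
    # (among all spikes) that belong to the same unit, averaged over spikes.
    X = np.vstack([pcs_target, pcs_other])
    labels = np.array([1] * len(pcs_target) + [0] * len(pcs_other))
    hits = []
    for i in range(len(pcs_target)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the spike itself
        nearest = np.argsort(d)[:k]
        hits.append(labels[nearest].mean())
    return float(np.mean(hits))

rng = np.random.default_rng(0)
well_separated = nn_hit_rate(rng.normal(0.0, 1.0, (100, 3)),
                             rng.normal(10.0, 1.0, (100, 3)))
overlapping = nn_hit_rate(rng.normal(0.0, 1.0, (100, 3)),
                          rng.normal(0.5, 1.0, (100, 3)))
```

Well-separated clusters give a hit rate near 1, overlapping clusters a lower one. Because each spike's score is bounded in [0, 1] and the result is an average, the metric is naturally less sensitive to which spikes end up in the subsample than an extreme-value statistic like `isolation_distance`.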
I understand. Yes, you're right: of all the PC metrics, `nearest_neighbors_hit_rate` is the one whose values are most consistent between runs.
The default number of spikes to subsample when computing PC metrics (`max_spikes_for_unit`) is 500. Could that explain the larger variations in `isolation_distance` and other metrics on units with significantly more spikes? If so, what value would you recommend for a dataset with units ranging from a few dozen spikes all the way up to 35k? @jsiegle
You can try increasing `max_spikes_per_unit` to 2000 or higher. That will increase the computation time, but it should make the values more stable.
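The effect of the subsample size can be demonstrated with a toy version of isolation distance (a sketch, not SpikeInterface's implementation): the squared Mahalanobis distance, under the target unit's covariance, of the N-th closest other-unit spike, where N is the target unit's spike count. The helper `subsampled_id` below is hypothetical and, for simplicity, subsamples only the target unit:

```python
import numpy as np

def isolation_distance(pcs_target, pcs_other):
    # Toy isolation distance (sketch, not the library code): squared
    # Mahalanobis distance, w.r.t. the target unit's mean/covariance,
    # of the N-th closest other-unit spike, N = target spike count.
    mean = pcs_target.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(pcs_target, rowvar=False))
    diff = pcs_other - mean
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    n = len(pcs_target)
    if n > len(pcs_other):
        return float('inf')
    return float(np.sort(d2)[n - 1])

rng = np.random.default_rng(1)
target = rng.normal(0.0, 1.0, (35_000, 3))  # a large, well-isolated unit
other = rng.normal(4.0, 1.0, (40_000, 3))   # all remaining spikes

def subsampled_id(max_spikes, seed):
    # Hypothetical helper mimicking max_spikes_per_unit: subsample the
    # target unit, then recompute the metric.
    r = np.random.default_rng(seed)
    sub = target[r.choice(len(target), size=max_spikes, replace=False)]
    return isolation_distance(sub, other)

small = [subsampled_id(500, s) for s in range(20)]
large = [subsampled_id(2000, s) for s in range(20)]
print("std @ 500 spikes:", np.std(small))
print("std @ 2000 spikes:", np.std(large))
```

Because the metric depends on an estimated mean and covariance plus a single order statistic, a small subsample of a 35k-spike unit can move the result substantially between seeds; larger subsamples generally tighten the estimate at the cost of a more expensive Mahalanobis computation.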
Hello guys, is it normal that when we run the metrics module on exactly the same dataset, the values for isolation distance, l_ratio, d_prime and the two nearest_neighbors metrics change between runs?
For some clusters the values are indeed pretty similar, but for others, like a cluster I have with 35k spikes, the isolation_distance varied from 361 to 556...
The biggest changes come from clusters with more spikes. Any thoughts on that? Is it normal?
Thank you @jsiegle