Nonhuman Primate Reaching Dataset spike_train

morenzoe commented 6 months ago

Hi, I would like to ask about the NHP Reaching Dataset. In my understanding, each element of the spike_train array represents how many spikes happened between some point in time and bin_width seconds after it. However, line 243 of the code below makes each element of spike_train represent only whether there was any spike or not.

https://github.com/NeuroBench/neurobench/blob/968c627d561949f68b5f6c7bcf0a75434d0fcf5c/neurobench/datasets/primate_reaching.py#L231-L243

For example, for the first unit and first channel, there are spikes at index 3019 (at 3360.99321431 seconds) and index 3020 (at 3360.99546708 seconds), which both fall within the same bin. Therefore, the bins array returned by the histogram has a value of 2 at index 269749, which is the index of 3360.991999974485 seconds in new_t. However, spike_train[0,0,269750] has a value of 1, not 2, because of line 243 of the code above. Would you please explain why? Thank you in advance!
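To make the difference concrete, here is a minimal sketch of counting versus presence-only binning. The two spike times are the ones quoted above; the short `bin_edges` array is a hypothetical stand-in for new_t, and the `counts > 0` step is only an assumption about the effect of line 243, not a copy of the repository code.

```python
import numpy as np

# Spike times quoted in this issue (seconds).
spike_times = np.array([3360.99321431, 3360.99546708])

# Hypothetical stand-in for a few consecutive new_t values, 4 ms apart.
bin_edges = 3360.984 + 0.004 * np.arange(5)  # 3360.984 ... 3361.000

# Counting: each bin holds the number of spikes that fall inside it.
counts, _ = np.histogram(spike_times, bins=bin_edges)

# Presence only: each bin holds 1 if at least one spike fell inside it.
binary = (counts > 0).astype(counts.dtype)

print(counts)  # the bin starting at 3360.992 holds 2
print(binary)  # the same bin holds 1
```

Both spikes land in the bin starting at 3360.992, so the count is 2 while the presence-only value is 1.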

In addition, I am wondering why some timestamps in t and most of those in new_t are not exact multiples of 0.004. When I open the .mat file in MATLAB, some timestamps have an extra 0.000000000000001 second. Does this cause a rounding error when the file is read by h5py? Could this rounding error cause some spikes to be placed in the wrong bins? The np.arange() function seems to contribute to the problem too: np.arange(2281.996, 4731.676, 0.004)[269749] still produces the value 3360.991999974485, which is not an exact multiple of 0.004. Thanks again!

jasonlyik commented 6 months ago

@vinniesun Can you help out on a couple of questions on the primate reaching data processing?

For your second question about the timestamps being slightly off, I think this is normal for floating-point arithmetic and shouldn't cause any issues when the discrepancy only appears beyond the seventh decimal place.
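As a rough way to check this, the sketch below compares the np.arange grid from the question with a grid built from integer indices, and reports the largest difference next to the bin width. The start, stop, and step values are the ones quoted in the question; `grid_index` is an illustrative alternative, not something the dataset loader is known to use. NumPy's documentation notes that np.arange with a non-integer step can give inconsistent results and suggests numpy.linspace for such cases.

```python
import numpy as np

# Values quoted in the question.
start, stop, step = 2281.996, 4731.676, 0.004

# Grid built the same way as in the question.
grid_arange = np.arange(start, stop, step)

# Illustrative alternative: compute every timestamp directly from its
# integer index, so each value is rounded independently of the others.
n = len(grid_arange)
grid_index = start + step * np.arange(n)

# Absolute difference between the two grids at every index.
drift = np.abs(grid_arange - grid_index)

print("largest difference between the grids (s):", drift.max())
print("bin width (s):", step)
```

A spike would end up in a different bin only if it lies closer to a bin edge than the difference at that edge, so a difference many orders of magnitude below the 4 ms bin width affects very few spikes, if any.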

vinniesun commented 6 months ago

Sorry for the late reply. Essentially, what we are doing here is just finding if there's any spike activity within the time bin. The time array of the dataset does not follow the sampling rate of the electrode array attached to the monkey, but the sampling rate of cursor_pos, finger_pos and target_pos.

morenzoe commented 6 months ago

@vinniesun No worries! I was wrong to mention bin_width, since that argument is not used at all in the code above. I understand that the variable t from the dataset holds kinematic timestamps, not neural timestamps, but that is exactly why I opened this issue. The sampling rate of the electrode array attached to the monkey is far higher than the sampling rate of cursor_pos etc., so we would lose some spiking events if we are only "finding if there's any spike activity within the time bin", where the time bin comes from the kinematic sampling rate (the new_t variable), not the neural sampling rate. Is this intended? Would you please explain why? Thanks!

@jasonlyik Thank you for your answer!

vinniesun commented 5 months ago

@morenzoe This is intended. I think the easiest way to look at this is the alignment of the spike data with the labels. As our labels/velocity are recorded every 4 ms, we should be aligning our spike data to that 4 ms period as well.
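The alignment can be sketched with synthetic data as follows. Everything here is hypothetical except the 4 ms period: the label timestamps, the velocity array, the spike times, and the convention that bin i covers the 4 ms starting at label i are assumptions made for illustration, not the dataset's actual contents or the loader's exact indexing.

```python
import numpy as np

rng = np.random.default_rng(0)

step = 0.004     # label period in seconds (from this thread)
n_labels = 250   # hypothetical: one second of labels

# Hypothetical label timestamps and 2-D velocity labels.
label_t = step * np.arange(n_labels)
velocity = rng.normal(size=(n_labels, 2))

# Hypothetical spike times for one unit, scattered over the same second.
spike_times = np.sort(rng.uniform(0.0, n_labels * step, size=400))

# One bin per label (assumed convention: bin i starts at label_t[i]).
edges = step * np.arange(n_labels + 1)
counts, _ = np.histogram(spike_times, bins=edges)

# Presence-only version: 1 if any spike fell in the bin, else 0.
binary = (counts > 0).astype(np.int8)

# Spikes that are no longer distinguishable after binarizing.
collapsed = int(counts.sum() - binary.sum())

print("labels:", velocity.shape[0], "spike bins:", binary.shape[0])
print("spikes collapsed by presence-only binning:", collapsed)
```

Each label row gets exactly one spike bin, which is the alignment described above. The `collapsed` value shows the trade-off raised earlier in the thread: whenever several spikes share a 4 ms bin, the presence-only array keeps only one of them.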

morenzoe commented 5 months ago

@vinniesun I see, thank you for your explanation!
