SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
517 stars 186 forks source link

Intan Test Failing #3108

Closed zm711 closed 3 months ago

zm711 commented 3 months ago

@samuelgarcia and @alejoe91

So the issue here is that our test suite makes sure that a "recording" and its "traces" have the same dtype. Recently Heberto added in the ability to get the digital data from Intan in a user-friendly way. But the issue is that the digital data is stored as uint16 so that all digital channels can be packed in one vector, but after demultiplexing the values are just 1 and 0s (so stored as uint8). This causes the test to fail. It is better for memory to just store as uint8, but it does change the dtype between the raw digital and the final digital. Do we want to handle this at the test level or do we want to just store the final digital data as uint16 at the neo level?

cc @h-mayorquin.

Additional info for others:

Intan stores all digital channels in a single vector. This vector can be unpacked with bitwise operations that converts values to 0 and 1 (or True/False). This allows for reduced storage needs, but normally for neo the data should be stored in the same dtype at all levels.

alejoe91 commented 3 months ago

Thanks @zm711

IMO we socouod have a soft layer in the Spike interface constructor that overrides the NEO dtype. How does that sound?

h-mayorquin commented 3 months ago

Which test is failing?

zm711 commented 3 months ago

https://github.com/SpikeInterface/spikeinterface/actions/runs/9744201650

It's a dtype check.

h-mayorquin commented 3 months ago

I think this is a mistake I made at the neo level. We extract the dtype from the signal_channels header in spikinterface:

https://github.com/SpikeInterface/spikeinterface/blob/4539550f72883b3ed2339c8a73a52c6d811647f9/src/spikeinterface/extractors/neoextractors/neobaseextractor.py#L244

And I think the one in neo should match the traces. It is just that we are writing to the header the dtype of the memmap not the memmap of the signal.

what do you think @zm711 ?

zm711 commented 3 months ago

These are such small vectors would it be less prone to mistakes to just use the uint16 for both storage and final values. That way we don't start changing those features and potentially getting out of sync. For consistency's sake that might be the easiest. So the memmap and final array just keep in sync and if people want to cast using the spikeinterface machinery (or on their own) they can make that choice.

This is an edge case and since we should eventually have digital machinery in Neo it is probably best not to do bespoke solutions and instead once there is digital machinery we can have different testing machinery. But that would be my solution. Take the hit on memory footprint to protect the consistency between memmap and trace.

h-mayorquin commented 3 months ago

So you prefer to just return the traces as uint16 as well? Fine by me

zm711 commented 3 months ago

One other point. Isn't one-file-per-channel already stored as 1 and 0s as uint16? So that would be another point of confusion right?

h-mayorquin commented 3 months ago

Yes. But strictly it should not.

But I am fine leaving some performance points on the table if you think this is better. Here is a fix with what you want if I understood well:

https://github.com/NeuralEnsemble/python-neo/pull/1500

zm711 commented 3 months ago

Close tomorrow after we confirm full-tests pass?

h-mayorquin commented 3 months ago

Sure. You can also run the test manually JFYI.