NumpySorting pickle TypeError

florian6973 commented 1 year ago

Hi!

I am using a caching library which saves the data by serializing it using pickle. Unfortunately, it crashes when I use a NumpySorting instance, with the error message:

Traceback (most recent call last): File "pickletest.py", line 13, in elt = pickle.load(f) TypeError: from_dict() missing 1 required positional argument: 'sampling_frequency'

Here is a minimal working example which reproduces the error:

import spikeinterface as si

elt = si.NumpySorting.from_times_labels([350, 1500, 2500], [0, 1, 0], 20000)

# pickle elt
import pickle

with open("elt.pkl", "wb") as f:
    pickle.dump(elt, f)

# unpickle elt
with open("elt.pkl", "rb") as f:
    elt = pickle.load(f)  # TypeError

I am using Python 3.9 and spikeinterface 0.97.1 Edit: The issue might be related to this line https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/base.py#L568

Is there a workaround in the meantime to be able to cache a NumpySorting object? In fact, there is a field is_dumpable=False, so I assume I would not have this issue with a NpzExtractorSorting since data are not in memory?

Thanks!

h-mayorquin commented 1 year ago

Which caching library?

Yes, Numpy extractors are not meant to be pickled as they have a lot of content. I think we could modify them sort of easily to make this possible but I am not sure it is a good idea. They have all the contents in memory so it will be a slow process that will probably induce sub-optimal use of the library. Let's see what @alejoe91 and @samuelgarcia think about it.

Meanwhile, as far as I understand, the workaround is something like this:

sorting = si.NumpySorting.from_times_labels([350, 1500, 2500], [0, 1, 0], 20000)
serializable_sorter = sorting.save()

florian6973 commented 1 year ago

Thanks for reply and these explanations! The workaround works perfectly, thank you very much!

The caching library is flask-caching; for small recordings I was thinking it may be easier to perform operations in memory.

h-mayorquin commented 1 year ago

Are you doing something with recordings in a web app? : O Feel free to share, it is useful for us to know about the difference use cases that people have for the library.

florian6973 commented 1 year ago

Yes indeed, in the same idea as sortingview :) Unfortunately I cannot in the details but if I can contribute somehow to spikeinterface in the future I would be glad ;)

h-mayorquin commented 1 year ago

@alejoe91 @samuelgarcia

To close this issue we need to make a decision: Do we want NumpySorting to be pickable? It is trivial to do it (we need to add _kwargs) and after the improvements in https://github.com/SpikeInterface/spikeinterface/pull/1674 I feel more confident that it will be not horrible performance wise.

If we decided against then I will add a more informative error.

alejoe91 commented 1 year ago

@h-mayorquin I think that NumpySorting can be dumpable, but not NumpyRecording (because traces are always going to be large). What do you think?

h-mayorquin commented 1 year ago

Yeah, part of me wants the symmetry of both NumpyExtractors doing the same but if we can get this hurdle out of our users noses it seems like a good thing. Plus, other SortingExtractors are non-lazy as well (Phy and Mearec for example).

So, OK, I will do two things.

1) NumpySorting should be pickable. 2) NumpyRecording should throw an informative error if someone calls pickle dump on them.

samuelgarcia commented 1 year ago

Hi Ramon, about NumpySorting you can make small patch to change the flag but do nto make deep refactor on this because I will make deeper change on mem sharable numpysorting with more change quite soon

samuelgarcia commented 1 year ago

And maybe we could also allow to make numpyrecording dumpable with a warning when the size is too big.

h-mayorquin commented 1 year ago

This should be solved now that we have separated the flags for json serialization and pickling:

https://github.com/SpikeInterface/spikeinterface/pull/1775

@florian6973 let us know if a similar problem arises.

SpikeInterface / spikeinterface

NumpySorting pickle TypeError #1621