SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
500 stars 187 forks source link

Improve handling of `coeff` in `filter.py` #3243

Open JoeZiminski opened 2 months ago

JoeZiminski commented 2 months ago

Just a small thing, leaving as a reminder for myself to open a PR after #3172. At present coeff in filter.py takes tuple[np.array] | list[np.array] | np.array | None. This needs reflecting in the docstring, and the numpy array are cast to list in the class constructer but I don't think its necessary, the np.array can be passed directly to the scipy functions.

In general, if an argument can take list[x] | tuple[x] what shall we do? It seems verbose to include both in docstring. Shall we always set to tuple (unless the variable is meant to be mutated in place, which I think is not likely in most cases)?

zm711 commented 2 months ago

In general, if an argument can take list[x] | tuple[x] what shall we do? It seems verbose to include both in docstring. Shall we always set to tuple (unless the variable is meant to be mutated in place, which I think is not likely in most cases)?

That was a huge debate earlier because there are three options (not including yours): 1) Sequence which would be the typing.Sequence type. This is the least verbose most accurate, but it is a bit computer-sciency and since we are catering to neuroscientists we decided not to do that to encourage explicit to help programming beginners 2) SpikeInterface_Input (we never settled on a name) which would go into one of the util files and just be (input = 'list[x]'|'tuple[x]'|etc so that we keep the explicitness, but don't have to write it everywhere. @h-mayorquin was strongly against that because it hides the types from the developer ( I think he can correct me if I'm misstating his opinion)-- and I agree with him that I don't like this 3) Just be explicit and include all possibilities. More work for us, but it lets the end-user, especially the beginner intuit what a sequence is by listing the sequences that we take in.

I think we also briefly discussed just putting list in there and assuming people would get the vibe (so your idea), but we decided against that.

h-mayorquin commented 2 months ago

The previous discussion was about ids which we allow list and array, then @DradeAW likes to use tuples and he gave some arguments to add tuple. I honestly don't remember where this was discussion took place so I can't link it. Probably slack?

Sequence which would be the typing.Sequence type. This is the least verbose most accurate, but it is a bit computer-sciency and since we are catering to neuroscientists we decided not to do that to encourage explicit to help programming beginners

Sequences does not work because a single string is of type sequence but we don't take that.

There is some recent debate among the python type developers about this. See this: https://discuss.python.org/t/removing-sequence-str-as-base-class-of-str/56907/4

h-mayorquin commented 2 months ago

I think that something from numpy typing module could work.

zm711 commented 2 months ago

Sequences does not work because a single string is of type sequence but we don't take that.

You're right. I always forget that str is a sequence. Which has led to issues in the library itself.

I think that something from numpy typing module could work.

I think NumPy typing is even more confusing for new people.

Their typing for "dtype" is DTypeLike, which is okay I guess, but I'm just not fan of.

h-mayorquin commented 2 months ago

Why do you tink that numpy is more confusing?

ArrayLike seems to encanpsulate what we are after and seems like a intuitive enough for me.

https://numpy.org/devdocs/reference/typing.html#numpy.typing.ArrayLike

What I am missing or how you are thinking about it?

zm711 commented 2 months ago

No I think it is clear for us, but I like to think of the mental burden of brand new people. ArrayLike sounds computer-y and for basic scientists coming into this just learning python and working on their scripts they will learn list in their beginning class and eventually np.array, but jumping to ArrayLike is tricky. So it depends on type hints for us or for the user and if for the user are they for the beginner (where I would prefer to reduce cognitive load) or the power-user (who will have no problem with ArrayLike). I tend to think of type hints as for the beginning user, but if we don't as group then I'm fine with numpy typing.

h-mayorquin commented 2 months ago

Don't most scientific programmers learn arrays (matlab or numpy) before learning list, tuples and the native data structures of python? I thought that arrays would be more natural to them (i.e. vector like) than an explicit enumeration of the types.

SpikeInterface_Input (we never settled on a name) which would go into one of the util files and just be (input = 'list[x]'|'tuple[x]'|etc so that we keep the explicitness, but don't have to write it everywhere. @h-mayorquin was strongly against that because it hides the types from the developer ( I think he can correct me if I'm misstating his opinion)-- and I agree with him that I don't like this

I did not answer to this. I am not sure I believe this. For programmers I think it can be useful to centralize but I did not want to centralize/generalize too early. For newbies we need to find a good name but then we come back to the problem with ArrayLike which we are discussing above.

zm711 commented 2 months ago

I am not sure I believe this.

Again I don't want to misquote you, but I roughly remember you saying you guys did this at neuroconv where you had something like ArrayLike, but then it was hard to remember what that actually incapsulated and it was annoying to have to go to the imports and go to the file where definition was stored. But I could be totally wrong and just want an ally in this fight.

Don't most scientific programmers learn arrays (matlab or numpy) before learning list, tuples and the native data structures of python?

I don't think people learn tuple at all. I would say list would still be relatively common. I was helping some people with the intro to computing in neuro class here at Harvard and it was a lot of native python moving onto numpy structures later. Part of the problem is that arrays are called vectors or matrices in Matlab so the inherent link between the array object/structure and the mathematical entity is not necessary clear when learning Matlab. Even Numpy originally had matrix stuff which was deprecated because really everything is an array. But I would opine that even scientific programmers may not make the link right away between array and matrix. But again I'm fine being overruled. I probably tend to assume too little knowledge in users.

h-mayorquin commented 2 months ago

but then it was hard to remember what that actually incapsulated and it was annoying to have to go to the imports and go to the file where definition was stored. But I could be totally wrong and just want an ally in this fight.

No worries. I think the neuroconv example is that where we encapsulated FolderPath like for str|Path and also FilePath for str | Path. In that case it is too few options for this to make sense to me and the cost of having to learn where to import this (as I needed to build functions with that signature) did not seem to be worth the cost. But in this case, there are many things (not just Path | str) and it is kind of obscure what it means so I think it makes sense.

I was helping some people with the intro to computing in neuro class here at Harvard and it was a lot of native python moving onto numpy structures later.

Fair enough, this is evidence against my belief.

So to summarize, I agree to encapsulate if there are many types (like with ArrayLike) but we really need a good name. If ArrayLike does not do it for this case and the ids I don't really anything better to offer than being explicit.

JoeZiminski commented 1 month ago

(Apologies both, will comment at some point on the above discussion!)

When taking care of these documented changes on filter.py, it would also be a good time to check the defaults as described here