SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
533 stars 187 forks source link

Remnant issues from #3304; adding more type annotations #3373

Open tabedzki opened 2 months ago

tabedzki commented 2 months ago

Remnant check boxes from #3304:

After ensuring most documented functions offer return values, I think there are instances where defining input type hints can be useful as the input type definitions are sometimes implicitly used in the return type annotation. For example, if we annotate CurationSorting to have "BaseSorting" as part of the input in def __init__(self, sorting: "BaseSorting", make_graph=False, properties_policy="keep"):, then the return values of .sorting() and .current_sorting() are automatically provided instead of Any. The importance of this may not be high as these are internal functions rather than external functions.

tabedzki commented 2 months ago

@zm711 for input type annotation, we talked about how to mark the combination of list, tuple, array of floats. Would using Iterable[float] address this issue without being overly permissive?

zm711 commented 2 months ago

Sorry for the delay in responding to this! Yeah I think that would actually work unless it can also contain ints. So maybe something like Iterable[float|int] there's also a type hint for numbers in general, but I don't see it too often in the wild. So I would tend not to use that. What do you think @h-mayorquin ?

tabedzki commented 2 months ago

@zm711 According to PEP 484, anytime float is used, int is considered to be consistent with float which means you wouldn't have to do float|int

Rather than requiring that users write import numbers and then use numbers.Float etc., this PEP proposes a straightforward shortcut that is almost as effective: when an argument is annotated as having type float, an argument of type int is acceptable; similar, for an argument annotated as having type complex, arguments of type float or int are acceptable. This does not handle classes implementing the corresponding ABCs or the fractions.Fraction class, but we believe those use cases are exceedingly rare.

h-mayorquin commented 2 months ago

@zm711 According to PEP 484, anytime float is used, int is considered to be consistent with float which means you wouldn't have to do float|int

I think that's fine but see https://discuss.python.org/t/clarifying-the-float-int-complex-special-case/54018 for some problems with that.

One problem with the proposed approach is that, again, it is not narrow enough:

from typing import Iterable
from typeguard import check_type

example_set = {1, 2, 3}

# Check if the object is an Iterable of int
try:
    check_type(example_set, Iterable[int])
    print("The object is an Iterable[int]")
except TypeError as e:
    print(f"Type mismatch: {e}")

In most of our cases we want things were the order matters so I guess we want Sequence[float] or Sequence[int]

Also relevant:

https://discuss.python.org/t/removing-sequence-str-as-base-class-of-str/56907

tabedzki commented 2 months ago
  1. Am I understanding this correctly: the issue with float over float|int comes from the type checkers and not Python itself and that the type checkers may not always understand when something should be treated as an int vs a float?
  2. I think the part about using Sequence[float] instead of Interable is a meaningful improvement and that should be used.
  3. I don't understand how the conversation about sequence[str] relates to this. What am I missing? Is it that the unexpected behavior of not being explicitly defined within the __mro__ leads to unexpected results and doing just float would lead to some unexpected behavior?
h-mayorquin commented 2 months ago

Am I understanding this correctly: the issue with float over float|int comes from the type checkers and not Python itself and that the type checkers may not always understand when something should be treated as an int vs a float?

Sometimes the distinction between those two numeric types is meaningful so they should not be typed together. floats have some methods that int do not so passing int as float would be wrong. Note that this also might relevant in questions of computational efficiency. I was sharing it as a curiosity though.

I don't understand how the conversation about sequence[str] relates to this. What am I missing?

That was the previous typing that we had at some point and that's why I added here to the discussion. It was not related or relevant to the points that you were making.

tabedzki commented 2 months ago

Sometimes the distinction between those two numeric types is meaningful so they should not be typed together. floats have some methods that int do not so passing int as float would be wrong. Note that this also might relevant in questions of computational efficiency. I was sharing it as a curiosity though.

Ahhh understood. By typed together, you mean that we shouldn't just use float, correct?

That was the previous typing that we had at some point and that's why I added here to the discussion. It was not related or relevant to the points that you were making.

oooh! That was the issue Zach had been referring to before. Now things make sense.

zm711 commented 2 months ago

Ahhh understood. By typed together, you mean that we shouldn't just use float, correct?

I think this will end up being contextual. In an ideal scenario something should either be a float or an int, but shouldn't be both even though you could cast an int -> float like 1 -> 1.0. Things like ids are either ints | str. Durations would be floats even though if a user gave an int it wouldn't be a problem. So this is where we run into issue with typing. There's a lot of flexibility and no one right way. So I tend to at least just try to give the best "hint" for the user rather than try to nail the perfect type hint. That's why I'm okay with people that put np.dtype instead of np.typing.DTypeLike.

h-mayorquin commented 2 months ago

Yes, as Zach said, we should be contextual. But I think your proposal for having Sequence[something] of yours is an improvement over what we have. I can't find an argument against it.

tabedzki commented 2 months ago

Zach and Heberto, referring back to the issue of type hinting and avoiding bringing in a package just to get a class definition available for type hints, what are your thoughts on do something like what is suggested here, that is using something like

from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .basesorting import BaseSorting

def function(B: BaseSorting):
    pass
zm711 commented 2 months ago

I think you would have a hard time convincing Sam that that is okay. He already hates typing so creating a new typing convention might be a step too far for him.

tabedzki commented 2 months ago

I can understand that. It could be used on only the files where one would have to import the file to get the type hints since leaving it as "BaseSorting" itself does not provide IntelliSense hints. It comes down to whether we think the IntelliSense hints provide enough value to include that block.

h-mayorquin commented 2 months ago

I would be in favor of a solution that uses the typing module to differentiate between annotation context and runtime out of self-interest. It would make navigation easier for me.

But as Zach mentions I have the feeling Sam would be against it. We can try and move slowly.

Something that has kept me from moving in this direction is the whole mess with PEPs 563 and 649. The implementation has been delayed for a couple of python versions. There seems to be a renewed interest on moving this forward though: https://peps.python.org/pep-0749/

tabedzki commented 1 month ago

Since it seems that you two work closer with Sam than I do, would you be interested in broaching these typing format to him, or should we just let it lie for now? I think there were a couple instances where I had loaded in a package into the module just for typing annotations so if there is a way forward that achieves better type annotations while removing some of the overhead implemented via my last PR, I am all ears.