AlexandreDecan / portion

portion, a Python library providing data structure and operations for intervals.
GNU Lesser General Public License v3.0
476 stars 34 forks source link

Add typing info #27

Open mattmezza opened 4 years ago

mattmezza commented 4 years ago

Hello and thanks for the great library!

I think many programmers would enjoy this library even more if it had typing information, especially for the exposed api.

Are you planning to add any info with regards to type hints? Would you be willing to merge any PR related to that?

Cheers 🍻

AlexandreDecan commented 4 years ago

Hello,

I tried several time to include type annotations but unfortunately, mypy (and Python's typing module) do not support f-bounded polymorphism (see https://www.python.org/dev/peps/pep-0484/#type-variables-with-an-upper-bound) making it impossible to properly define (parametric sub-)types for intervals. Since we cannot declare an interval whose bounds are of type T where T = Union[Inf,X] and X is a subtype of Comparable (Inf is already such a type), that means that the type system cannot prevent things such as P.closed('a', 1) or even P.closed(1, 2) | P.closed('a', 'b'). This is not the only issue I got when I tried to add type annotations (for instance, having bound: Union[Inf, X] with X being Comparable for bounds led mypy to complain since Inf is also Comparable, and mypy cannot infer what the type is when P.Inf is passed for bound).

If we cannot type-check this, there is unfortunately little value to have type annotations, since most of the time, we will have a (non-generic) Interval for intervals, and Comparable (protocol-based) for values.

This is related to many open issues in mypy:

TL;DR: Using Comparable as an upper-bound makes the checker complaining when using infinities, and using Union[Inf, Comparable] confuses the type-checker ;-)

Btw, as soon as mypy supports this, I'll be happy to add type-annotations (as stub files, since otherwise it is likely we will have circular imports).

I think many programmers would enjoy this library even more if it had typing information, especially for the exposed api.

Are there specific parts of the API that are somewhat confusing? I can try to improve the documentation and docstrings accordingly.

AlexandreDecan commented 4 years ago

I'm temporarily closing this issue since we are waiting for "upstream" features.

ariebovenberg commented 4 years ago

Hi @AlexandreDecan,

I have some experience working around the limitations of Python typing. In these cases you can try to replace a Union with base-class/sub-class. This works pretty decent for sum types.

Here is what I came up with for Interval and Bound. Would something like this work?

type definitions:

# range.py
import abc
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

from typing_extensions import Protocol

T = TypeVar("T")

class Comparable(Protocol):
    def __ge__(self: T, other: T) -> bool:
        ...

    def __gt__(self: T, other: T) -> bool:
        ...

    def __le__(self: T, other: T) -> bool:
        ...

    def __lt__(self: T, other: T) -> bool:
        ...

Tcomp = TypeVar("Tcomp", bound=Comparable)

class Bound(Generic[Tcomp]):

    @abc.abstractmethod
    def __gt__(self: T, other: T) -> bool:
        ...

    @abc.abstractmethod
    def __lt__(self: T, other: T) -> bool:
        ...

@dataclass(frozen=True)
class Closed(Bound[Tcomp]):
    "An inclusive bound"
    value: Tcomp

    def __gt__(self, other: Bound[Tcomp]) -> bool:
        if isinstance(other, Closed):
            return self.value > other.value
        elif isinstance(other, Open):
            return self.value >= other.value
        else:
            assert isinstance(other, Unbounded)
            return False

    def __lt__(self, other: Bound[Tcomp]) -> bool:
        if isinstance(other, Closed):
            return self.value < other.value
        elif isinstance(other, Open):
            return self.value <= other.value
        else:
            assert isinstance(other, Unbounded)
            return False

@dataclass(frozen=True)
class Open(Bound[Tcomp]):
    "An exclusive bound"
    value: Tcomp

    def __gt__(self, other: Bound[Tcomp]) -> bool:
        if isinstance(other, Closed):
            return self.value > other.value
        elif isinstance(other, Open):
            return self.value >= other.value
        else:
            assert isinstance(other, Unbounded)
            return False

    def __lt__(self, other: Bound[Tcomp]) -> bool:
        if isinstance(other, Closed):
            return self.value < other.value
        elif isinstance(other, Open):
            return self.value <= other.value
        else:
            assert isinstance(other, Unbounded)
            return False

@dataclass(frozen=True)
class Unbounded(Bound[Tcomp]):
    "An infinite endpoint"
    def __gt__(self, other: Bound[Tcomp]) -> bool:
        return True

    def __lt__(self, other: Bound[Tcomp]) -> bool:
        return True

INF: Unbounded[Any] = Unbounded()

@dataclass(frozen=True)
class Interval(Generic[Tcomp]):
    start: Bound[Tcomp]
    end: Bound[Tcomp]

    @classmethod
    def closedopen(cls, start: Tcomp, end: Tcomp) -> "Interval[Tcomp]":
        return cls(Closed(start), Open(end))

    @classmethod
    def openclosed(cls, start: Tcomp, end: Tcomp) -> "Interval[Tcomp]":
        return cls(Open(start), Closed(end))

    # add other helper constructors below... 
    # closed-closed, closed-unbounded, etc.

# an example operation
def hull(a: Interval[Tcomp], b: Interval[Tcomp]) -> Interval[Tcomp]:
    "take the extreme bounds from each interval"
    return Interval(min(a.start, b.start), max(a.end, b.end))

example usage:

# example.py
from datetime import datetime

from range import Interval, Bound, Closed, Open, INF, hull

# using one of the simplified constructors
r1: Interval[int] = Interval.closedopen(4, 6)

r1_start: Bound[int] = r1.start
assert r1_start == Closed(4)
r1_end: Bound[int] = r1.end
assert r1_end == Open(6)

# using explicit constructor
r2: Interval[datetime] = Interval(Open(datetime(2020, 1, 1)), INF)

r2_start: Bound[datetime] = r2.start
assert r2_start == Open(datetime(2020, 1, 1))
r2_end: Bound[datetime] = r2.end
assert r2_end == INF

assert r2_end > r2_start

# some valid, typechecked operations
assert hull(r1, Interval(INF, Open(5))) == Interval(INF, Open(6))
assert hull(r1, Interval.openclosed(4, 6)) == Interval(Closed(4), Closed(6))
assert hull(r1, Interval(INF, INF)) == Interval(INF, INF)

# statements below are prevented by mypy
# r1_not_ok = Interval.closedopen(5, datetime(2020, 1, 1))  # mixed types
# r2_not_ok = Interval.closedopen({}, {'foo': 'bla'})  # non-comparable types
# r2_start < r1_start  # different bound types
# hull(r1, r2)  # different interval types
AlexandreDecan commented 4 years ago

Hello @ariebovenberg

Thanks for this trick to get around the limitations of mypy/typing! I hadn't considered this approach, to be honest ;-)

I haven't yet looked in details at the code you're proposing (by the way, thanks also for this example, which already seems quite complete ;-)). I'm going to read it carefully (and play with it a bit) and I'll come back here :-))

AlexandreDecan commented 4 years ago

I looked at this example, and I'm unsure this can be applied to portion "as-is" since it requires bounds to be wrapped. Currently, each (atomic) interval corresponds to exactly 4 values (left, lower, upper, right) where left and right are either OPEN or CLOSED (constants) and lowerand upper are exactly the values provided by the user. Wrapping them on-the-fly will slightly decrease performances and add a bit of complexity to the code (and I would like to avoid as much as possible making changes to the code base "just to support" type hinting).

However, since I got several requests that could be addressed by wrapping/unwrapping bounds on-the-fly (see e.g. #30), I guess it could be interesting to start thinking about a uniform way to address all of theses (e.g. having a "generic interval" class that supports any comparable objects, but also an easy factory to create specialized intervals for a given type of comparable objects, allowing the users to benefit from type checking and to fine-tune comparisons/equalities to fit their needs).

ariebovenberg commented 4 years ago

@AlexandreDecan cool! Let me know if there is anything I can help with, or if you'd rather tackle it yourself.

AlexandreDecan commented 4 years ago

ATM, if you see any drop-in solution to support type hinting (through typesheds), you can let me know ;-) By "solution", I mean something that could prevent [1, 'a'] or [1, 2] | ['a', 'b'] for instance, without having to change the internal structure of portion (or at least, with no change in the public API, and as little changes as possible in the code base ;-))

For the wrap/unwrap of bounds on-the-fly, I need to think a bit more about many possible use-cases before deciding whether I'll do this or not. Currently, I'm considering changing some of portion internals to ease subclassing the existing Interval class. For example, instead of creating new intervals by calling Interval() from the various methods that need to create new intervals (e.g. union, intersection, etc.), it would be based on self.__class__() allowing subclasses to automatically and implicitly create objects of the same class without having to override the many methods offered by Interval. Bounds can then be wrapped in the from_atomic method (used by all indirect constructors) and unwrapped in the lower and upper properties.

AlexandreDecan commented 4 years ago

Btw, by "drop-in solution" I didn't mean a full-fledge solutions, but mainly "ideas" to support type hinting without having to change the whole code base ;-)