ggoretkin-bdai opened 1 year ago
Hey @ggoretkin-bdai!
If you would like the best performance, the simplest way is to use only so-called faithful types. I'm actually thinking that `Val` is a faithful type, so simply setting `Val.__faithful__ = True` could solve this without correctness issues?!
Here's a small benchmark:
In [20]: from plum import dispatch, Val, clear_all_cache
In [21]: @dispatch
    ...: def f(x: Val[1]):
    ...:     pass
    ...:
In [22]: x = Val(1)
In [23]: %timeit f(x) # Not faithful...
5.12 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [24]: Val.__faithful__ = True
In [25]: clear_all_cache()
In [26]: %timeit f(x) # Faithful!
554 ns ± 9.35 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Would you be able to try this on your end, to see if this works and things still run correctly?
That would be great! I will give this a shot and report back.
For now, I have a few questions. `typing.Literal` is an immediate example of a non-faithful type, and `Val` seems similar to `typing.Literal`. But the definition of "faithful type" does not seem to give a definite answer for either `Literal` or `Val`. I understand the definition to be:
def is_faithful(type_, for_all_x):
    for x in for_all_x:
        if isinstance(x, type_) == issubclass(type(x), type_):
            pass
        else:
            return False
    return True
In [22]: is_faithful(int, [1, 42, 1234])
Out[22]: True
In [23]: is_faithful(typing.Literal[1], [1, 42, 1234])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[23], line 1
----> 1 is_faithful(typing.Literal[1], [1, 42, 1234])
Cell In[21], line 3, in is_faithful(type_, for_all_x)
1 def is_faithful(type_, for_all_x):
2 for x in for_all_x:
----> 3 if isinstance(x, type_) == issubclass(type(x), type_):
4 pass
5 else:
File /usr/lib/python3.10/typing.py:994, in _BaseGenericAlias.__instancecheck__(self, obj)
993 def __instancecheck__(self, obj):
--> 994 return self.__subclasscheck__(type(obj))
File /usr/lib/python3.10/typing.py:997, in _BaseGenericAlias.__subclasscheck__(self, cls)
996 def __subclasscheck__(self, cls):
--> 997 raise TypeError("Subscripted generics cannot be used with"
998 " class and instance checks")
TypeError: Subscripted generics cannot be used with class and instance checks
In [24]: is_faithful(plum.Val(1), [1, 42, 1234])
plum.parametric.Val[1]()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[24], line 1
----> 1 is_faithful(plum.Val(1), [1, 42, 1234])
Cell In[21], line 3, in is_faithful(type_, for_all_x)
1 def is_faithful(type_, for_all_x):
2 for x in for_all_x:
----> 3 if isinstance(x, type_) == issubclass(type(x), type_):
4 pass
5 else:
TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
Perhaps the definition is better understood in terms of natural language. Consider, for example, `Literal[1]`. Then `isinstance(x, Literal[1])` holds if and only if `x == 1`. (Note that `isinstance(x, Literal[1])` won't actually run, because `isinstance` doesn't work with generics, but let's suppose that it would.) On the other hand, for `x = 1`, we have `type(x) == int`, so `issubclass(type(x), Literal[1]) == issubclass(int, Literal[1])`, which is false, because `int` is more general than `Literal[1]`. (Again, note that `issubclass(int, Literal[1])` won't actually run, but let's suppose that it would check whether `int` is a subtype of `Literal[1]`.) Therefore, `isinstance(x, Literal[1])` and `issubclass(type(x), Literal[1])` are unequal for `x = 1`, so `Literal[1]` is unfaithful.
What faithfulness measures is whether taking `type` of `x` retains enough information to still check whether `isinstance(x, Literal[1])` holds. Since `type(x) == int`, if we know only `type(x)`, we only know that `x` is an integer. However, to check whether `isinstance(x, Literal[1])` holds, we need to know specifically that `x` is `1`, not just that `x` is an integer! That is what goes wrong in the above paragraph.
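The loss of information can be made concrete with only the standard library. Here `matches_literal` is a hypothetical stand-in for the `isinstance(x, Literal[...])` check that Python refuses to run:

```python
from typing import Literal, get_args

def matches_literal(x, lit):
    # Hypothetical stand-in for isinstance(x, Literal[...]):
    # x matches iff it equals one of the literal values.
    return x in get_args(lit)

x = 1
assert matches_literal(x, Literal[1])            # needs the *value* of x
assert type(x) is int                            # type(x) forgets the value
assert not matches_literal(type(x), Literal[1])  # knowing only "int" is not enough
```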
The case of `Val` is different, because it uses Plum's `parametric` machinery under the hood. In that case, we have that `type(Val(1)) == Val[1]`. Hence, if `x = Val(1)` and we know just `type(x)`, then we know that `x` is of the type `Val[1]`, so `x` must have been `Val(1)`. Therefore, `Val` is a faithful type. `Val`, however, is not a free lunch: constructing `Val(1)` is relatively expensive compared to a lightweight wrapper object.
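The parametric idea can be sketched with plain classes (this is a minimal illustration, not plum's actual implementation): give each parameter its own subclass, so `type()` alone recovers the parameter, which is exactly what makes `Val` faithful.

```python
# Cache of per-parameter classes; each parameter gets its own class.
_val_classes = {}

def make_val(p):
    # Hypothetical sketch of a parametric Val: build (and memoize) a
    # dedicated class per parameter, then instantiate it.
    if p not in _val_classes:
        _val_classes[p] = type(f"Val[{p!r}]", (), {"parameter": p})
    return _val_classes[p]()

a, b = make_val(1), make_val(1)
assert type(a) is type(b)                # same parameter, same class
assert type(a) is not type(make_val(2))  # different parameter, different class
assert type(a).parameter == 1            # the parameter survives in the type
```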
Thanks for the explanation, @wesselb .
I gave your suggestion a try, and I did not notice a significant performance difference. I extracted a minimal example here: https://github.com/ggoretkin-bdai/investigate-plum-performance/blob/2a9675cf340fe4f1f3adec42924cb154d453fe96/test/test_zoo.py#L8-L10
In [1]: from plum import Val
In [2]: Val.__faithful__
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 Val.__faithful__
AttributeError: type object 'Val' has no attribute '__faithful__'
In [3]: %run test/test_zoo.py
In [4]: %timeit workload_multiple_dispatch()
1.27 s ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: Val.__faithful__ = True
In [6]: %timeit workload_multiple_dispatch()
1.23 s ± 4.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit workload_elif_chain()
34.1 ms ± 975 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Hey @ggoretkin-bdai! Ah... I think the benchmark that I posted above is incorrect. By default, `Val` is actually assumed to be faithful. That's my bad! What appears to be going wrong is that `Val` is pretty expensive to construct:
In [1]: from plum import dispatch, Val, clear_all_cache
In [2]: @dispatch
   ...: def f(x: Val[1]):
   ...:     pass
   ...:
In [3]: x = Val(1)
In [4]: %timeit f(x)
571 ns ± 7.19 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [5]: %timeit f(Val(1))
38.2 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [6]: %timeit Val(1)
36.5 µs ± 539 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Currently, `parametric` isn't optimised for performance, so it might be possible to make constructing `Val(1)` faster, if that is what you are after. Alternatively, you could use caching, in this way:
In [7]: val = {1: Val(1)}
In [8]: %timeit f(val[1])
663 ns ± 36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
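The dictionary trick generalises to any expensive constructor: build at most one instance per parameter and reuse it. A minimal sketch (with a stand-in `Expensive` class, not plum's `Val`):

```python
class Expensive:
    # Stand-in for a costly constructor such as Val(p).
    def __init__(self, p):
        self.p = p

_instances = {}

def get_cached(p):
    # Construct at most once per parameter; later calls are a dict hit.
    if p not in _instances:
        _instances[p] = Expensive(p)
    return _instances[p]

assert get_cached(1) is get_cached(1)  # built once, reused thereafter
```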
The ultimate use case I (currently) care about is the one in the benchmark: defining `convert` methods. If there were a way to dispatch directly on the type via something like https://docs.julialang.org/en/v1/base/base/#Core.Type , that would feel the most natural to me, and would not require constructing a `Val`. However, this would be a new feature in Plum, and it might have other performance implications.
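As a rough illustration of that wish (this is not plum's API), dispatching on the type object itself can be emulated with a registry keyed on the class, so no wrapper value is ever constructed:

```python
_registry = {}

def register(t):
    # Decorator: associate a conversion function with a target type object.
    def deco(f):
        _registry[t] = f
        return f
    return deco

def convert(t, x):
    # Dispatch directly on the type object, akin to Julia's Type{T}.
    return _registry[t](x)

@register(float)
def _to_float(x):
    return float(x)

assert convert(float, 3) == 3.0
```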
How about something like this?
In [1]: from plum import dispatch, Val
In [2]: _cache = {}
In [3]: def convert(t, x):
   ...:     try:
   ...:         val = _cache[t]
   ...:     except KeyError:
   ...:         _cache[t] = Val(t)
   ...:         val = _cache[t]
   ...:     return _convert(val, x)
   ...:
In [4]: class A:
   ...:     def __init__(self, x):
   ...:         self.x = x
   ...:
In [5]: class B:
   ...:     def __init__(self, x):
   ...:         self.x = x
   ...:
In [6]: @dispatch
   ...: def _convert(t: Val[A], x: B):
   ...:     return A(x.x)
   ...:
In [7]: @dispatch
   ...: def _convert(t: Val[B], x: A):
   ...:     return B(x.x)
   ...:
In [8]: convert(A, B(1))
Out[8]: <__main__.A at 0x7f5cc6728b50>
In [9]: %timeit convert(A, B(1))
1.88 µs ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [10]: %timeit A(B(1).x)
592 ns ± 26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
You might be able to squeeze out a little more performance, but this looks pretty good to me already.
With the caching you suggest, for the benchmark I was using, Plum is now only 7x slower, versus 50x slower: https://github.com/ggoretkin-bdai/investigate-plum-performance/commit/e07f6580d1f44499f95508f35fdb01fbdd4c9768
A couple of comments:
1. Are there any correctness issues with memoizing the result of `is_type`?
2. I switched to the `plum.convert` functionality, which side-steps the issue of `Val` creation entirely. This improves performance to a 4x penalty! https://github.com/ggoretkin-bdai/investigate-plum-performance/commit/bf27b3ad6188cab5bc86ecf7efa15a1778fe8a52

I am now trying to see how I can avoid annotating the from/to types twice:
@conversion_method(type_from=SpatialMath_SE3, type_to=BD_SE3Pose) # type: ignore[no-redef]
def convert_whatever(from_: SpatialMath_SE3) -> BD_SE3Pose: # noqa: F811
return BD_SE3Pose(from_.data5)
I need them annotated in the type hints, but would like to avoid annotating them in the decorator, as is achieved via: https://github.com/beartype/plum/blob/80b614f3c820f3107b0a67aed63d066e6d8f0b25/plum/function.py#L260

    signature (:class:`.signature.Signature`, optional): Signature.
        If it is not given, it will be derived from `f`.
Here is an attempt that appears to work: https://github.com/ggoretkin-bdai/investigate-plum-performance/commit/3af609f78d3cc08b3a908e3b7be10039802fe8b6 by defining:

def conversion_method_from_signature(f):
    """Decorator to add a conversion method to convert an object from one
    type to another.

    Like `plum.conversion_method`, but the arguments:
        type_from (type): Type to convert from.
        type_to (type): Type to convert to.
    are extracted from the type annotations.
    """
    signature = plum.extract_signature(f)
    [type_from] = signature.types
    type_to = signature.return_type
    plum.promotion.add_conversion_method(type_from, type_to, f)
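The annotation extraction this relies on can be illustrated with the standard library alone. Here `extract_from_to` is a hypothetical helper, not plum's API, recovering the same `(type_from, type_to)` pair from a one-argument function's annotations:

```python
import typing

def extract_from_to(f):
    # Read the annotations of a one-argument function: the return
    # annotation is the target type, the sole parameter is the source.
    hints = typing.get_type_hints(f)
    type_to = hints.pop("return")
    [type_from] = hints.values()
    return type_from, type_to

def to_str(x: int) -> str:
    return str(x)

assert extract_from_to(to_str) == (int, str)
```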
@ggoretkin-bdai, to answer your questions:
1. Yes, ideally more caching should happen to optimise performance. I think that `is_type` can be safely cached, though I should think this over more carefully. If you'd like to submit a PR, that would be more than welcome. :) Otherwise, I'll put it on the list of TODOs.
2. Great!! 4x is getting closer. `plum.convert` definitely isn't yet optimised for performance, so perhaps something can still be squeezed out there.

Your `conversion_method_from_signature` looks great. :) That's exactly what I would've suggested.
I am interested in reducing the performance penalty of using `plum` in my application vs dispatching "manually" with an `if/elif isinstance` chain. Profiling (flamegraph below) reveals a majority of time spent in `is_type`: https://github.com/beartype/plum/blob/b38a1625c30a235d4e7a20d8a5828280fdb0fba7/plum/parametric.py#L282-L298 . For my specific microbenchmark, `is_type` is only ever called with ONE value, a `plum.parametric.Val[example_py_multiple_dispatch.zoo.MyType]()`. Any ideas for improving the performance? Are there any correctness issues with memoizing the result of `is_type`? I see that there is memoization happening at lower levels via https://github.com/beartype/beartype/blob/49560b8d4d788d279a02283d8da995895440554c/beartype/_util/cache/utilcachecall.py .