ggoretkin-bdai opened 1 year ago
Hey @ggoretkin-bdai!
If you would like the best performance, the simplest way is to use only so-called faithful types. I'm actually thinking that `Val` is a faithful type, so simply setting `Val.__faithful__ = True` could solve this without correctness issues?!
Here's a small benchmark:
In [20]: from plum import dispatch, Val, clear_all_cache
In [21]: @dispatch
    ...: def f(x: Val[1]):
    ...:     pass
    ...:
In [22]: x = Val(1)
In [23]: %timeit f(x) # Not faithful...
5.12 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [24]: Val.__faithful__ = True
In [25]: clear_all_cache()
In [26]: %timeit f(x) # Faithful!
554 ns ± 9.35 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Would you be able to try this on your end, to see if this works and things still run correctly?
That would be great! I will give this a shot and report back.
For now, I have a few questions. `typing.Literal` is an immediate example of a non-faithful type, and `Val` seems similar to `typing.Literal`. But the definition of "faithful type" does not seem to give a definite answer for either `Literal` or `Val`. I understand the definition to be:
def is_faithful(type_, for_all_x):
    for x in for_all_x:
        if isinstance(x, type_) == issubclass(type(x), type_):
            pass
        else:
            return False
    return True
In [22]: is_faithful(int, [1, 42, 1234])
Out[22]: True
In [23]: is_faithful(typing.Literal[1], [1, 42, 1234])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[23], line 1
----> 1 is_faithful(typing.Literal[1], [1, 42, 1234])
Cell In[21], line 3, in is_faithful(type_, for_all_x)
1 def is_faithful(type_, for_all_x):
2 for x in for_all_x:
----> 3 if isinstance(x, type_) == issubclass(type(x), type_):
4 pass
5 else:
File /usr/lib/python3.10/typing.py:994, in _BaseGenericAlias.__instancecheck__(self, obj)
993 def __instancecheck__(self, obj):
--> 994 return self.__subclasscheck__(type(obj))
File /usr/lib/python3.10/typing.py:997, in _BaseGenericAlias.__subclasscheck__(self, cls)
996 def __subclasscheck__(self, cls):
--> 997 raise TypeError("Subscripted generics cannot be used with"
998 " class and instance checks")
TypeError: Subscripted generics cannot be used with class and instance checks
In [24]: is_faithful(plum.Val(1), [1, 42, 1234])
plum.parametric.Val[1]()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[24], line 1
----> 1 is_faithful(plum.Val(1), [1, 42, 1234])
Cell In[21], line 3, in is_faithful(type_, for_all_x)
1 def is_faithful(type_, for_all_x):
2 for x in for_all_x:
----> 3 if isinstance(x, type_) == issubclass(type(x), type_):
4 pass
5 else:
TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
Perhaps the definition is better understood in terms of natural language. Consider, for example, `Literal[1]`. Then `isinstance(x, Literal[1])` holds if and only if `x == 1`. (Note that `isinstance(x, Literal[1])` won't actually run, because `isinstance` doesn't work with generics, but let's suppose that it would.) On the other hand, for `x = 1`, we have `type(x) == int`, so `issubclass(type(x), Literal[1]) == issubclass(int, Literal[1])`, which is false, because `int` is more general than `Literal[1]`. (Again, note that `issubclass(int, Literal[1])` won't actually run, but let's suppose that it would check whether `int` is a subtype of `Literal[1]`.) Therefore, `isinstance(x, Literal[1])` and `issubclass(type(x), Literal[1])` are unequal for `x = 1`, so `Literal[1]` is unfaithful.
What faithfulness measures is whether taking `type` of `x` retains enough information to still check whether `isinstance(x, Literal[1])` holds. Since `type(x) == int`, if we know only `type(x)`, we only know that `x` is an integer. However, to check whether `isinstance(x, Literal[1])` holds, we need to know specifically that `x` is `1`, not just that `x` is an integer! That is what goes wrong in the above paragraph.
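The loss of information can be made concrete with only the standard library. Here `matches_literal` is a hypothetical stand-in for the `isinstance(x, Literal[...])` check that Python refuses to run:

```python
from typing import Literal, get_args

def matches_literal(x, lit):
    # Hypothetical stand-in for isinstance(x, Literal[...]):
    # x matches iff it equals one of the literal values.
    return x in get_args(lit)

x = 1
assert matches_literal(x, Literal[1])            # needs the *value* of x
assert type(x) is int                            # type(x) forgets the value
assert not matches_literal(type(x), Literal[1])  # knowing only "int" is not enough
```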
The case of `Val` is different, because it uses Plum's `parametric` machinery under the hood. In that case, we have that `type(Val(1)) == Val[1]`. Hence, if `x = Val(1)` and we know just `type(x)`, then we know that `x` is of the type `Val[1]`, so `x` must have been `Val(1)`. Therefore, `Val` is a faithful type. `Val`, however, is not a free lunch: constructing `Val(1)` is relatively expensive compared to a lightweight wrapper object.
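The parametric idea can be sketched with plain classes (this is a minimal illustration, not plum's actual implementation): give each parameter its own subclass, so `type()` alone recovers the parameter, which is exactly what makes `Val` faithful.

```python
# Cache of per-parameter classes; each parameter gets its own class.
_val_classes = {}

def make_val(p):
    # Hypothetical sketch of a parametric Val: build (and memoize) a
    # dedicated class per parameter, then instantiate it.
    if p not in _val_classes:
        _val_classes[p] = type(f"Val[{p!r}]", (), {"parameter": p})
    return _val_classes[p]()

a, b = make_val(1), make_val(1)
assert type(a) is type(b)                # same parameter, same class
assert type(a) is not type(make_val(2))  # different parameter, different class
assert type(a).parameter == 1            # the parameter survives in the type
```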
Thanks for the explanation, @wesselb .
I gave your suggestion a try, and I did not notice a significant performance difference. I extracted a minimal example here: https://github.com/ggoretkin-bdai/investigate-plum-performance/blob/2a9675cf340fe4f1f3adec42924cb154d453fe96/test/test_zoo.py#L8-L10
In [1]: from plum import Val
In [2]: Val.__faithful__
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 Val.__faithful__
AttributeError: type object 'Val' has no attribute '__faithful__'
In [3]: %run test/test_zoo.py
In [4]: %timeit workload_multiple_dispatch()
1.27 s ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [5]: Val.__faithful__ = True
In [6]: %timeit workload_multiple_dispatch()
1.23 s ± 4.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit workload_elif_chain()
34.1 ms ± 975 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Hey @ggoretkin-bdai! Ah... I think the benchmark that I posted above is incorrect. By default, `Val` is actually assumed to be faithful. That's my bad! What appears to be going wrong is that `Val` is pretty expensive to construct:
In [1]: from plum import dispatch, Val, clear_all_cache
In [2]: @dispatch
   ...: def f(x: Val[1]):
   ...:     pass
   ...:
In [3]: x = Val(1)
In [4]: %timeit f(x)
571 ns ± 7.19 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [5]: %timeit f(Val(1))
38.2 µs ± 411 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [6]: %timeit Val(1)
36.5 µs ± 539 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Currently, `parametric` isn't optimised for performance, so it might be possible to make constructing `Val(1)` faster, if that is what you are after. Alternatively, you could use caching, in this way:
In [7]: val = {1: Val(1)}
In [8]: %timeit f(val[1])
663 ns ± 36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
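The dictionary trick generalises to any expensive constructor: build at most one instance per parameter and reuse it. A minimal sketch (with a stand-in `Expensive` class, not plum's `Val`):

```python
class Expensive:
    # Stand-in for a costly constructor such as Val(p).
    def __init__(self, p):
        self.p = p

_instances = {}

def get_cached(p):
    # Construct at most once per parameter; later calls are a dict hit.
    if p not in _instances:
        _instances[p] = Expensive(p)
    return _instances[p]

assert get_cached(1) is get_cached(1)  # built once, reused thereafter
```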
The ultimate use case I (currently) care about is the one in the benchmark: defining `convert` methods. If there were a way to dispatch directly on the type via something like https://docs.julialang.org/en/v1/base/base/#Core.Type , that would feel the most natural to me, and would not require constructing a `Val`. However, this would be a new feature in Plum, and it might have other performance implications.
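As a rough illustration of that wish (this is not plum's API), dispatching on the type object itself can be emulated with a registry keyed on the class, so no wrapper value is ever constructed:

```python
_registry = {}

def register(t):
    # Decorator: associate a conversion function with a target type object.
    def deco(f):
        _registry[t] = f
        return f
    return deco

def convert(t, x):
    # Dispatch directly on the type object, akin to Julia's Type{T}.
    return _registry[t](x)

@register(float)
def _to_float(x):
    return float(x)

assert convert(float, 3) == 3.0
```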
How about something like this?
In [1]: from plum import dispatch, Val
In [2]: _cache = {}
In [3]: def convert(t, x):
   ...:     try:
   ...:         val = _cache[t]
   ...:     except KeyError:
   ...:         _cache[t] = Val(t)
   ...:         val = _cache[t]
   ...:     return _convert(val, x)
   ...:
In [4]: class A:
   ...:     def __init__(self, x):
   ...:         self.x = x
   ...:
In [5]: class B:
   ...:     def __init__(self, x):
   ...:         self.x = x
   ...:
In [6]: @dispatch
   ...: def _convert(t: Val[A], x: B):
   ...:     return A(x.x)
   ...:
In [7]: @dispatch
   ...: def _convert(t: Val[B], x: A):
   ...:     return B(x.x)
   ...:
In [8]: convert(A, B(1))
Out[8]: <__main__.A at 0x7f5cc6728b50>
In [9]: %timeit convert(A, B(1))
1.88 µs ± 65.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [10]: %timeit A(B(1).x)
592 ns ± 26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
You might be able to squeeze out a little more performance, but this looks pretty good to me already.
With the caching you suggest, for the benchmark I was using, Plum is now only 7x slower, versus 50x slower: https://github.com/ggoretkin-bdai/investigate-plum-performance/commit/e07f6580d1f44499f95508f35fdb01fbdd4c9768
A couple of comments:
1. Are there any correctness issues with memoizing the result of `is_type`?
2. I switched to the `plum.convert` functionality, which side-steps the issue of `Val` creation entirely. This improves performance to a 4x penalty! https://github.com/ggoretkin-bdai/investigate-plum-performance/commit/bf27b3ad6188cab5bc86ecf7efa15a1778fe8a52

I am now trying to see how I can avoid annotating the from/to types twice:
@conversion_method(type_from=SpatialMath_SE3, type_to=BD_SE3Pose) # type: ignore[no-redef]
def convert_whatever(from_: SpatialMath_SE3) -> BD_SE3Pose: # noqa: F811
return BD_SE3Pose(from_.data5)
I need them annotated in the type hints, but would like to avoid annotating them in the decorator, as is achieved via: https://github.com/beartype/plum/blob/80b614f3c820f3107b0a67aed63d066e6d8f0b25/plum/function.py#L260

    signature (:class:`.signature.Signature`, optional): Signature.
        If it is not given, it will be derived from `f`.
Here is an attempt that appears to work: https://github.com/ggoretkin-bdai/investigate-plum-performance/commit/3af609f78d3cc08b3a908e3b7be10039802fe8b6 by defining:

def conversion_method_from_signature(f):
    """Decorator to add a conversion method to convert an object from one
    type to another.

    Like `plum.conversion_method`, but the arguments:
        type_from (type): Type to convert from.
        type_to (type): Type to convert to.
    are extracted from the type annotations.
    """
    signature = plum.extract_signature(f)
    [type_from] = signature.types
    type_to = signature.return_type
    plum.promotion.add_conversion_method(type_from, type_to, f)
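The annotation extraction this relies on can be illustrated with the standard library alone. Here `extract_from_to` is a hypothetical helper, not plum's API, recovering the same `(type_from, type_to)` pair from a one-argument function's annotations:

```python
import typing

def extract_from_to(f):
    # Read the annotations of a one-argument function: the return
    # annotation is the target type, the sole parameter is the source.
    hints = typing.get_type_hints(f)
    type_to = hints.pop("return")
    [type_from] = hints.values()
    return type_from, type_to

def to_str(x: int) -> str:
    return str(x)

assert extract_from_to(to_str) == (int, str)
```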
@ggoretkin-bdai, to answer your questions:
1. Yes, ideally more caching should happen to optimise performance. I think that `is_type` can be safely cached, though I should think this over more carefully. If you'd like to submit a PR, that would be more than welcome. :) Otherwise, I'll put it on the list of TODOs.
2. Great!! 4x is getting closer. `plum.convert` definitely isn't yet optimised for performance, so perhaps something can still be squeezed out there.

Your `conversion_method_from_signature` looks great. :) That's exactly what I would've suggested.
I am interested in reducing the performance penalty of using `plum` in my application vs dispatching "manually" with an `if/elif isinstance` chain. Profiling (flamegraph below) reveals a majority of time spent in `is_type`: https://github.com/beartype/plum/blob/b38a1625c30a235d4e7a20d8a5828280fdb0fba7/plum/parametric.py#L282-L298 . For my specific microbenchmark, `is_type` is only ever called with ONE value, a `plum.parametric.Val[example_py_multiple_dispatch.zoo.MyType]()`. Any ideas for improving the performance? Are there any correctness issues with memoizing the result of `is_type`? I see that there is memoization happening at lower levels via https://github.com/beartype/beartype/blob/49560b8d4d788d279a02283d8da995895440554c/beartype/_util/cache/utilcachecall.py .