Suor / funcy

A fancy and practical functional tools
BSD 3-Clause "New" or "Revised" License
3.37k stars 144 forks source link

Allow specifying an error handler for memoize #52

Closed richard-engineering closed 7 years ago

richard-engineering commented 7 years ago

We ran into an issue where we'd still like an exception to be memoized and return the exception when using the memoize decorator. A way to specify an error handler function would be appreciated. Or provide a list of exception types which should be cached.

richard-engineering commented 7 years ago

Here's the code we are using to allow caching and reraising certain exception types. I also added some other convenience options along the way. It's kind of hard to write unit tests for memoize on our end, I had to manually go through the debugger step by step. Anyway if anyone is interested in improving this code to be included in funcy feel free to do so. :)

class SkipMemoization(Exception):
    pass

def memoize(cache_and_reraise=None,
               existing_memory=None,
               key_transform_function=None):
    """
    IMPORTANT: You need to call the decorator when using via ()
    Behaves similar to funcy's memoize function with extra features
    :param cache_and_reraise: An iterable of Exception types to catch, cache, and reraise.
    :param existing_memory:  Existing memory to use to prefill the cache
    :param key_transform_function: Transformation functions on the keys if necessary.
    Use to deal with equality issues (eg convert to string)
    :return: value calculated by the function this decorator is on or cached.
    """
    memory = existing_memory or {}
    cache_and_reraise = tuple(cache_and_reraise) if cache_and_reraise else tuple()

    @decorator
    def wrapper(call):
        args = call._args
        kwargs = call._kwargs

        if kwargs:
            key = args + tuple(sorted(kwargs.items()))
        else:
            key = args
        if key_transform_function:
            key = key_transform_function(key)
        if key in memory:
            stored_value = memory[key]
            if isinstance(stored_value, cache_and_reraise):
                # TODO Probably can cache the execinfo as well
                raise stored_value
            return stored_value
        else:
            try:
                value = memory[key] = call()
                return value
            except SkipMemoization as e:
                return e.args[0] if e.args else None
            except cache_and_reraise as e:
                memory[key] = e
                raise
    return wrapper

memoize.skip = SkipMemoization
Suor commented 7 years ago

Looks like you are tangling up things here. This could be decomposed into several decorators:

# Transforming args looks like a better idea than transforming key
@decorator
def transform_args(call, transform):
    args, kwargs = transform(*call._args, **call._kwargs)
    return call._func(*args, **kwargs)

@decorator
def store_errors(call, *errors):
    try:
        return call()
    except errors as e:
        return e

@decorator
def reraise_errors(call, *errors)
    result = call()
    if isinstance(result, errors):
        raise result
    return result

To manipulate memory we do need assistance from @memoize, and the best way would probably be just exposing memory via func attribute:

@memoise
def your_func(...):
    # ...

your_func.memory.update({...})

Then you can implement your complex memoize function:

def complex_memoize(cache_and_reraise=None,
                    existing_memory=None,
                    args_transform=None):
    decos = keep([
        transform_args(args_transform) if args_transform else None,
        reraise_errors(cache_and_reraise) if cache_and_reraise else None,
        memoize,
        store_errors(cache_and_reraise) if cache_and_reraise else None,
    ])
    if not existing_memory:
        return compose(*decos)

    def decorator(func):
        decorated = compose(*decos)(func)
        if existing_memory:
            decorated.memory.update(existing_memory)
        return decorated
    return decorator

But you should probably stick with:

def memoise_with_errors(*errors):
    return compose(
        reraise_errors(errors)
        memoize,
        store_errors(errors)
    )

And use @transform_args() and .memory.update() separately.

Suor commented 7 years ago

I will think about implementing .memory, the rest stays out for now.

Discussion is welcome though.

Suor commented 7 years ago

Exposed memory in c44cd7bf9185c8b7929ac3d3ccd7141c87ab7d6a

richard-engineering commented 7 years ago

Hey Suor, thanks for the quick response.

I read your breakout of my code which is helpful to understand how to implement more complex decorators.

Regarding:

def memoise_with_errors(*errors):
    return compose(
        reraise_errors(errors)
        memoize,
        store_errors(errors)
    )

I'm failing to see how it will store to memory errors (probably my lack of understanding of the flow of how it passes things from one decorator to another). If you would maybe detail how memoize will store errors to the memory, or how store_errors would do it would be appreciated. store_errors seems to just return errors? Does compose work in reverse order?

I know that usually with memoize you wouldn't want to store errors to the cache. However for my particular use case I actually want to store certain errors to the cache so the call isn't re-evaluated (it's a very time consuming call and it doesn't hit external endpoints so there's no point to re-evaluate if it failed).

Second regarding your transform_args that wouldn't actually do what I want. During debugging I found that if certain objects were passed to the function call it wouldn't hit the memoize cache. These would be objects where __eq__ was not implemented even though the objects represented the same thing (but different instances). That was the primary idea for adding a transformation function used by memoize. I only want to transform the keys in the cache for args and kwargs such that the cache misses are avoided. In these cases I would still want to pass the original args and kwargs to the call() itself.


(As an aside would I use it via something like the following?

@transform_args(my_function)
@memoize
def function_to_memoize(*args, **kwargs):
    pass

Your trick with existing memory is neat and is better than what I had in mind :)

Suor commented 7 years ago

memoize_with_errors() is equivalent to:

@reraise_errors(Err1, Err2)
@memoize
@store_errors(Err1, Err2)
def func(...):
    # ...

Here results flow func -> store_errors -> memoize -> reraise_errors, any error from the list is captured by @store_errors() and returned as ordinary result, same as you do in your code, then @memoize caches is, since it's just another value now, then @reraise_errors() looks at results and if that is an exception from the list raises it. So it should work the same as in your code.

Suor commented 7 years ago

On args and keys, looks like you need a custom key func, this makes sense, it should be especially helpful when passing non-hashable args.

Suor commented 7 years ago

Also, can you please tell more about how you use key func? I am struggling with a good example for docs.

richard-engineering commented 7 years ago

So an example of needing to use the key transform function is: (note ma_memoize is the combined memoize function I wrote above)

@ma_memoize(cache_and_reraise=(RuntimeError,))
def dummy_ma_memoize_error(obj, **kwargs):
    ret_value = kwargs.get('ret_value', 0)
    if isinstance(ret_value, BaseException):
        raise ret_value
    return ret_value

def test_ma_memoize_error(self):
    with pytest.raises(RuntimeError) as error:
        dummy_ma_memoize_error('a', ret_value=RuntimeError('Hello World'))
    with pytest.raises(RuntimeError) as error:
        dummy_ma_memoize_error('a', ret_value=RuntimeError('Hello World'))

In the second iteration it won't actually detect that the same call is stored in memory due to different error class instances not being equal (might be due to stacktrace). One solution is to declare the error once and pass the same instance. The other is to use a transform function eg: @ma_memoize(key_transform_function=lambda k: unicode(k))

This is just a very simple transform function as an example.

Another potentially better example would be to have dictionaries as part of the arguments (which is not hashable):

    some_function('a', input_data={'1': 2})

In which case something like @memoize(key_transform_func=lambda k: json.dumps(k)) could work. We do tend to use dictionaries fairly often in our codebase to pass around intermediary data (mostly to avoid having a million kwargs)

richard-engineering commented 7 years ago

Thanks for explaining how memoize_with_errors works :)

Suor commented 7 years ago

The issue is in passing exception as argument, which you normally don't do. And, yes, that could be solved with key_func.