JuliaCollections / Memoize.jl

@memoize macro for Julia

Add support for method-wise cache invalidation. #71

Closed · willow-ahrens closed 3 years ago

willow-ahrens commented 3 years ago

As requested, this is part 1 of ? of my previous too-big PR. I believe this patch allows for the precompilation-safe, method-wise cache invalidation needed in #59. Cache lookup works as follows:

Use Julia's method lookup to find a method based on a target signature. Then look in the module where the method was defined for a global dictionary named __memories__. If the dictionary exists, look up the cache by the structure of the method's signature rather than by the Method object itself (for some reason, the hash identity of Methods seems to change during precompilation).
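
To make that concrete, here is a minimal sketch of the lookup under the assumptions above (find_cache is a hypothetical helper name, and __memories__ is assumed to be keyed by signature type):

# Hypothetical sketch of the cache lookup described above.
function find_cache(sig::Type{<:Tuple})
    m = which(sig)                          # Julia's method lookup on the target signature
    mod = m.module                          # module in which the matching method was defined
    isdefined(mod, :__memories__) || return nothing
    memories = getfield(mod, :__memories__)
    # key on the structure of the signature, not on the Method object itself,
    # since Method hash identity appears to change across precompilation
    return get(memories, m.sig, nothing)
end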

We determine whether we are overwriting a method by looking up the signature in an earlier world age, because the function name we are looking for may not yet be defined. Note that finding a method with which is not enough to know we have overwritten it: the two signatures must be ==.
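
A minimal sketch of that check, assuming a helper like this PR's _which(signature, world) that returns the matching Method in the given world age, or nothing if there is none:

# Sketch only; _which(sig, world) is assumed to behave as described above.
function overwrites(sig::Type{<:Tuple}, world_before::UInt)
    m = _which(sig, world_before)   # look up in the world age *before* this definition
    m === nothing && return false   # the name may not even be defined yet
    return m.sig == sig             # a mere match is not enough; the signatures must be ==
end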

I have tested the effects of precompilation manually; I'm not sure if there's a way to test this automatically.

willow-ahrens commented 3 years ago

@cstjean Alright, I've added a few tests for precompilation, so I think this is ready for review, unless you'd like me to rebase first. Method caches are stored in the module that defined the method. Methods will always have a cache in the defining module even after precompilation. If the method is called in a module during precompilation, the result will be preserved after precompilation. If, during precompilation, we call a memoized method defined in a different module, that result will not be preserved, since caches are stored in the defining module, not the calling module. I believe this is also a limitation of the current design on master.
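
A small sketch of that scoping rule under this PR's design (module names are hypothetical):

# Hedged illustration: the cache lives with the definition, not with the caller.
module Defines
    using Memoize
    @memoize f(x) = x + 1
end

Defines.f(1)   # the memoized result is stored in Defines (the defining module),
               # regardless of which module makes the call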

willow-ahrens commented 3 years ago

This PR can be simplified by using the which(f, args) form included in Base. The reason I included the custom _which(signature) function was to support callable objects in the future. If we have a callable type Foo, I wanted to be able to determine whether there was a matching method for a Foo without constructing an instance foo. Note that which(Foo, args...) asks for methods of the type Foo itself rather than of an object of type Foo.
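
For illustration (Foo here is a hypothetical callable type, not part of this PR):

struct Foo
    scale::Int
end
(f::Foo)(x::Int) = f.scale * x   # instances of Foo are callable

which(Tuple{Foo, Int})   # finds the call method defined for *instances* of Foo
which(Foo, (Int,))       # finds the constructor Foo(::Int), a method of the *type* Foo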

I also used the custom version of _which so that I could give it a world age. This lets me define an anonymous function to obtain its name, and then check whether any methods with the same name existed before I defined the function.

However, since this PR doesn't apply to anonymous functions or callable types, I can simplify it if the reviewers would prefer that.

willow-ahrens commented 3 years ago

I find myself unsure of the desired semantics of cache invalidation and scoping for closures. While this PR makes a lot of sense for global functions, I think that closures might want to use local scope to store their caches. Consider the following example:

function currypow(n)
    @memoize function innerpow(x)
        return x^n
    end
end

In this example, each call to currypow will invalidate the cache of the last call to currypow, even though that closure might still be callable. Interestingly, I think it will still allocate a separate dictionary for each closure, so each one is memoized separately. It's hard to imagine a case where one would want the results of an inner function to be memoized globally, but also want the function to remain an inner function.

We can access closed-over variables via getproperty, so I think it might be possible to solve this by checking all of the closure's fieldnames that were gensymed by Memoize to see whether one matches the appropriate type signature.
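
A small, hedged illustration of the underlying mechanism (not tied to Memoize): variables captured by a closure become fields of its compiler-generated type, reachable via fieldnames and getproperty.

adder = let y = 2
    x -> x + y
end
fieldnames(typeof(adder))   # (:y,) — the captured variable
getproperty(adder, :y)      # 2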

cstjean commented 3 years ago

This PR can be simplified by using the which(f, args) form included in Base. The reason I included the custom _which(signature) function was to support callable objects in the future. If we have a callable type Foo, I wanted to be able to determine whether there was a matching method for a Foo without constructing an instance foo. Note that which(Foo, args...) asks for methods of the type Foo itself rather than of an object of type Foo.

Thank you for clearing that up; I had wondered. Supporting callables is neat, but I'm not sure it warrants a custom which. Let's leave it in the PR for now...

In this example, each call to currypow will invalidate the cache of the last call to currypow, even though that closure might still be callable.

Yes, that's not ideal. A similar problem is the following:

using LinearAlgebra, Memoize

function foo()
    @memoize bar(z) = det(z)
    return bar(randn(1000, 1000))
end

This should not leak memory (i.e., it shouldn't hold on to the big matrix after foo() has returned).

With https://github.com/JuliaCollections/Memoize.jl/pull/70#issuecomment-754876641, it wouldn't be an issue, because the memo Dict for @memoize innerpow would just be a local variable. As you say,

I think that closures might want to use local scope to store their caches

👍

willow-ahrens commented 3 years ago

Supporting local scope is trickier than I thought, but I think it is accomplished in https://github.com/peterahrens/Memoize.jl/pull/1. The trickiest part of method overwriting, etc., in local scope is that a method is sometimes callable before its definition has been executed and its cache initialized. Consider

function foo()
    function bar()
        return 1
    end
    # this call already dispatches to the *second* definition of bar below,
    # whose captured qux has not been assigned yet
    bar()
    qux = 2
    function bar()
        return qux
    end
end

which, when called, gives UndefVarError: qux not defined. I think it would be fair to disallow calling memoized functions before their definitions are evaluated.

cstjean commented 3 years ago

Calling a local function before defining it is bad style as far as I'm concerned, so I wouldn't worry about it.