c0fec0de / anytree

Python tree data library
Apache License 2.0

Caching file tree (Feature suggestion) #265

Open Day0Dreamer opened 1 month ago

Day0Dreamer commented 1 month ago

So, there I was, trying to build a Google Drive tree file structure using "anytree" as the backbone.

You get a list of files from Google and have to guess what is what and what belongs where. The list is not ordered, so sometimes you get a file from a subfolder before the subfolder itself.

Each file (folders are also files there) has a parent in the form of an ID.

The search function doesn't cache anything, and on big trees it takes hours to find the parent of yet another millionth file.

The caching approach suggested in the documentation (and the suggestion seems to be getting outdated too) employs an LRU cache.

The problem: it caches the None response for a parent when Google gives us the child first and we have not yet created the parent. So when the parent arrives and we rerun the search for the orphan file, the stale None is returned from the cache.
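To make the stale None concrete, here is a minimal sketch using only functools. The `known_nodes` dict and `find_parent` function are hypothetical stand-ins for the tree and the search call, not anytree API:

```python
from functools import lru_cache

# Hypothetical stand-in for the tree: file ID -> node, filled in as nodes get created.
known_nodes = {}

@lru_cache(maxsize=None)
def find_parent(parent_id):
    # Returns None while the parent has not been created yet.
    return known_nodes.get(parent_id)

first = find_parent("folder-1")          # parent missing, so None is cached
known_nodes["folder-1"] = "Folder node"  # the parent arrives later
second = find_parent("folder-1")         # still None, served from the cache
find_parent.cache_clear()                # drop every cached entry
third = find_parent("folder-1")          # now the parent is found
```

The second lookup never reaches the function body, which is why the orphan stays an orphan until the cache is cleared.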

So I ended up with the following code chunk:

from anytree import Node, RenderTree, search
from functools import lru_cache, wraps

def cache_non_none(func):
    cached_func = lru_cache(maxsize=None)(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        result = cached_func(*args, **kwargs)
        if result is None:
            # lru_cache cannot evict a single entry, so clear the entire cache
            # to make sure the None is not served again later
            cached_func.cache_clear()
            return None
        return result

    return wrapper

@cache_non_none
def find_by_attribute(node, value, name="name", maxlevel=None):
    return search.find_by_attr(node, value, name=name, maxlevel=maxlevel)

It throws away the whole cache whenever None is found (lru_cache has no way to drop a single entry), and keeps the cached result when something was indeed found.
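Since `cache_clear()` empties everything, each miss also throws away the hits collected so far. A possible variant, sketched here with a plain dict instead of `lru_cache`, is to never store None in the first place. The name `cache_unless_none` and the `lookup`/`store` demo are hypothetical, and like `lru_cache` this assumes all arguments are hashable:

```python
from functools import wraps

def cache_unless_none(func):
    """Memoize func, but never store a None result (hypothetical helper)."""
    cache = {}

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Build a hashable key from positional and keyword arguments.
        key = (args, tuple(sorted(kwargs.items())))
        if key in cache:
            return cache[key]
        result = func(*args, **kwargs)
        if result is not None:
            cache[key] = result  # only successful lookups are remembered
        return result

    wrapper.cache = cache  # exposed so callers can clear it manually
    return wrapper

# Small demo with a stand-in lookup instead of a real tree search.
calls = []
store = {}

@cache_unless_none
def lookup(key):
    calls.append(key)
    return store.get(key)

r1 = lookup("a")   # None, not cached
store["a"] = 1
r2 = lookup("a")   # found and cached
r3 = lookup("a")   # served from the cache, lookup body not called again
```

The same decorator could wrap `find_by_attribute` in place of `cache_non_none`. One caveat: cached hits can still go stale if nodes are moved or deleted later, so `wrapper.cache.clear()` would be needed after such changes.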

If anybody feels like making a pull request out of this, I've got nothing against it. I just wanted to share my corner solution to a corner case of mine.

Love <3