Perfomance: reuse compiled format templates

The question arose recently whether our funky AST compiler in the functemplate module is actually worth it. I wrote this compiler some time ago hoping for a dramatic speedup in beet list time, but I don't think I ever documented evidence either way about the speedup, and I had a vague inkling that it wasn't having the effect we wanted. Since it's giving @jrobeson trouble with the Python 3 port, I decided to investigate whether the code was worth keeping at all.

TL;DR: The compiler can yield a 1.6x speedup on my machine! But we're currently using it in a way that negates all the benefits.

I used time beet ls > /dev/null to measure the time it takes to list all the items in my 15265-track library on my really old Core 2 Duo server. The time was 41.5 ± 0.3 seconds (mean and standard error over 4 runs). Not great.

Then I tried disabling the compiler (the functemplate module also includes a straightforward interpreter). Shockingly, the time actually got slightly better: 37.8 ± 0.3 seconds.

I investigated and found that, as you might have guessed, the code is written to invoke the compiler every time---so instead of compiling the template once, beets was compiling it 15 thousand times. :cry:

I patched the list command as a stopgap to use the compiler correctly:

diff --git a/beets/ui/commands.py b/beets/ui/commands.py
index 867a473..7c5b454 100644
--- a/beets/ui/commands.py
+++ b/beets/ui/commands.py
@@ -1057,12 +1057,14 @@ def list_items(lib, query, album, fmt=''):
     """Print out items in lib matching query. If album, then search for
     albums instead of single items.
     """
+    from beets.util.functemplate import Template
+    tmpl = Template(fmt)
     if album:
         for album in lib.albums(query):
-            ui.print_(format(album, fmt))
+            ui.print_(album.evaluate_template(tmpl))
     else:
         for item in lib.items(query):
-            ui.print_(format(item, fmt))
+            ui.print_(item.evaluate_template(tmpl))

and got much better time: 25.5 ± 0.2 seconds. Hot dang!

So the conclusions are:

The compiler actually does address a meaningful bottleneck!
We're completely missing out on the potential benefits!
We need to reuse compiled templates, at least for the list command if not for all commands that use the standard formats from the config file.

Here are the raw timings:

# Status quo.
$ time beet ls > /dev/null
beet ls > /dev/null  42.01s user 1.02s system 99% cpu 43.131 total
$ time beet ls > /dev/null
beet ls > /dev/null  41.83s user 1.02s system 99% cpu 42.961 total
$ time beet ls > /dev/null
beet ls > /dev/null  41.78s user 1.18s system 99% cpu 43.055 total
$ time beet ls > /dev/null
beet ls > /dev/null  40.72s user 1.27s system 99% cpu 42.114 total

# Compiler disabled.
$ time beet ls > /dev/null
beet ls > /dev/null  37.18s user 0.89s system 99% cpu 38.143 total
$ time beet ls > /dev/null
beet ls > /dev/null  37.45s user 1.10s system 99% cpu 38.781 total
$ time beet ls > /dev/null
beet ls > /dev/null  38.08s user 1.00s system 99% cpu 39.153 total
$ time beet ls > /dev/null
beet ls > /dev/null  38.33s user 1.12s system 99% cpu 39.527 total

# Compiler enabled, `list` command modified to actually pre-compile.
$ time beet ls > /dev/null
beet ls > /dev/null  25.54s user 1.06s system 99% cpu 26.671 total
$ time beet ls > /dev/null
beet ls > /dev/null  25.68s user 1.00s system 99% cpu 26.732 total
$ time beet ls > /dev/null
beet ls > /dev/null  25.75s user 0.95s system 99% cpu 26.745 total
$ time beet ls > /dev/null
beet ls > /dev/null  25.02s user 0.96s system 98% cpu 26.336 total

Maybe LRU-caching beets.util.functemplate.Template could be a solution? I'm not sure the builtin @functools.lru_cache can cache class instances. Google reveals the memento package as an alternative.

Is this worth an additional dependency? Otherwise, applying @lru_cache to functemplate._parse and functemplate.Template.translate might look a little less clean, but have the same effect (I didn't test any of this, just an idea that came to mind). This would probably require Template.translate to become a classmethod.

Yep, caching is one option. I don't think caching Template objects should be a problem for @functools.lru_cache.

But it's worth also considering explicit reuse. That is, in commands like list where we know we'll reuse the same template many times, just keeping around the Template object and reusing it could work just as well. Some considerations:

Pro-explicit: Caching means hidden module state, which can lead to testing and debugging problems. It also comes with costs for maintaining the cache. And the performance benefit may only matter in a few cases, like the list command, so caching everywhere might be overkill.
Pro-cache: No code necessary in the client.

I did some further testing yesterday, times for a beet list on a ~8000 song library were

real 12.186 +- 0.452
user 10.613 +- 0.365
sys 0.710 +- 0.048

before modifications.

Explicit reuse

Basically does the same as @sampsyo's patch above (which I think did not have the intended effect, though, as it compiles the empty template u'' ~~which will in turn prevent LibModel.__format__ from applying the default template~~ and bypasses the default template handling in LibModel.__format__).

diff --git a/beets/library.py b/beets/library.py
index 3d18a1b..a531a71 100644
--- a/beets/library.py
+++ b/beets/library.py
@@ -323,6 +323,16 @@ class LibModel(dbcore.Model):
     _format_config_key = None
     """Config key that specifies how an instance should be formatted.
     """
+    _format_template = None
+    """Caches the default template specified by `_format_config_key`.
+    """
+
+    @classmethod
+    def _get_format_template(cls):
+        if not cls._format_template:
+            cls._format_template = Template(
+                beets.config[cls._format_config_key].as_str())
+        return cls._format_template

     def _template_funcs(self):
         funcs = DefaultTemplateFunctions(self, self._db).functions()
@@ -343,7 +353,7 @@ class LibModel(dbcore.Model):

     def __format__(self, spec):
         if not spec:
-            spec = beets.config[self._format_config_key].as_str()
+            spec = self._get_format_template()
         result = self.evaluate_template(spec)
         if isinstance(spec, bytes):
             # if spec is a byte string then we must return a one as well
diff --git a/beets/ui/commands.py b/beets/ui/commands.py
index 6ea9872..b78a121 100644
--- a/beets/ui/commands.py
+++ b/beets/ui/commands.py
@@ -35,6 +35,7 @@ from beets import plugins
 from beets import importer
 from beets import util
 from beets.util import syspath, normpath, ancestry, displayable_path
+from beets.util.functemplate import Template
 from beets import library
 from beets import config
 from beets import logging
@@ -1059,6 +1060,11 @@ def list_items(lib, query, album, fmt=u''):
     """Print out items in lib matching query. If album, then search for
     albums instead of single items.
     """
+    if fmt:
+        # cache fmt if it's non-empty (i.e. not the default but user-provided)
+        # doing this for fmt==u'' would overwrite the default format
+        fmt = Template(fmt)
+
     if album:
         for album in lib.albums(query):
             ui.print_(format(album, fmt))

real 10.098 +- 0.698
user 8.338 +- 0.504
sys 0.770 +- 0.091

Caching

diff --git a/beets/util/functemplate.py b/beets/util/functemplate.py
index cdccc49..ab9b98d 100644
--- a/beets/util/functemplate.py
+++ b/beets/util/functemplate.py
@@ -505,11 +505,26 @@ def _parse(template):
 class Template(object):
     """A string template, including text, Symbols, and Calls.
     """
+
+    CACHE = dict()
+
+    def __new__(cls, template):
+        if template in cls.CACHE:
+            return cls.CACHE[template]
+        else:
+            return super(Template, cls).__new__(cls, template)
+
     def __init__(self, template):
-        self.expr = _parse(template)
-        self.original = template
-        if six.PY2:  # FIXME Compiler is broken on Python 3.
-            self.compiled = self.translate()
+        # Will also be called when __new__ returns a cached instance
+        if template not in self.CACHE:
+            # TODO: Is it necessary to make this a lru cache or is the number
+            # of templates low enough for all of commands to remember them all?
+            self.CACHE[template] = self
+
+            self.expr = _parse(template)
+            self.original = template
+            if six.PY2:  # FIXME Compiler is broken on Python 3.
+                self.compiled = self.translate()

     def __eq__(self, other):
         return self.original == other.original

real 10.230 +- 0.611
user 8.555 +- 0.446
sys 0.758 +- 0.068

Result

So there's really only this one Template causing problems in the list command. However, I needed to introduce some hidden state in LibModel even in what I called explicit reuse in order to keep this class managing the default template. If this is the only command where caching matters, I'm not sure which approch to choose, but if it turns out there're more, I'm in favor of doing it directly in Template. My impression is that otherwise, we'd need code for reusing templates in many places which would be error-prone by itself. Of course some way to clear the cache in between tests would need to be devised.

Or we could remove the default template from LibModel, but that might mean moving it to a large number of places.

Great! Thanks for confirming the performance with both strategies is about the same.

We should definitely do one of these, but I don't have a strong leaning either way.

Centralizing the cache in the template module is fine and resembles the standard library re module. (We should indeed make the cache LRU; otherwise we're in for weird problems in the future if we accidentally use too many templates.)
If we do use a cache, perhaps we should introduce a functemplate.template() function or something, equivalent to re.compile, that does the caching, just to keep that separate from the class constructor.
We could consider putting the cache in dbcore.db.Model too (but that doesn't seem to change much).
The default-template-only approach is also quite clean but might get messy later, as you point out.

beetbox / beets