beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

Performance: Common operations are slow. #2388

Open icefoxen opened 7 years ago

icefoxen commented 7 years ago

Just started using beets and I love it, but holy cats is it slow sometimes, even for very simple things like basic queries and listings. I only have ~10k songs and it's still rather a pain.

So, I'd like to make it faster, and welcome advice on how to do so. The first suspect is database calls. Compare how long it takes to list items for even a small library, vs. just reading the information from the database:

$ time beet ls | wc  
1.01user 0.12system 0:03.15elapsed 36%CPU (0avgtext+0avgdata 29912maxresident)k
0inputs+0outputs (0major+9671minor)pagefaults 0swaps
    385    2725   15198
$ time sqlite3 musiclibrary.blb 'select * from items;' | wc
0.11user 0.00system 0:00.11elapsed 97%CPU (0avgtext+0avgdata 5456maxresident)k
0inputs+0outputs (0major+697minor)pagefaults 0swaps
  11437  193901 3919008

(Also I'm not sure why my library of 385 songs has 11k items in it; that's something else to investigate.)

Some profiling suggests that this is a likely candidate:

$ python3 -m cProfile -s cumtime /usr/local/bin/beet ls | less
...
         703753 function calls (689872 primitive calls) in 3.696 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1159    2.377    0.002    2.377    0.002 {method 'execute' of 'sqlite3.Connection' objects}
      770    0.083    0.000    0.297    0.000 db.py:173(_awaken)
    34677    0.074    0.000    0.119    0.000 types.py:93(from_sql)

So yes, it does spend most of its time talking to the DB, and adding more items to the library seems to add 3 more sqlite3.Connection.execute() calls per library item. So, the first goal is to do more intelligent batching of database calls. If I can make ls faster, then at best it will make a lot of other things faster as well, and at worst I'll be able to start attacking other problems.

Other performance issues to look into: https://github.com/beetbox/beets/issues/2370 https://github.com/beetbox/beets/issues/2281 https://github.com/beetbox/beets/issues/1795


sampsyo commented 7 years ago

Hi! Thanks for looking into this—we could really use some help with digging into performance issues exactly like this.

I also agree that starting with beet ls, which really should be fast but isn't, is a good idea. Here, I think the issue is the way that we've designed the query and lookup process in beets.dbcore, which is simple and elegant but terrible for performance. Specifically, here's how it goes:

In each of these cases, we could make better choices about where to be lazy and where to be eager. In particular, one big win might be to fetch all the flexible attributes at once using a hefty JOIN clause—that would require more up-front work from the database but far fewer one-at-a-time, back-and-forth trips from Python to SQLite.
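
To make the JOIN idea concrete, here is a rough sketch. The table and column names (items, item_attributes, entity_id) are my reading of the beets schema and should be double-checked; this illustrates the shape of the query, not a patch.

import sqlite3
from collections import defaultdict

conn = sqlite3.connect("musiclibrary.blb")

# One round trip: every item row joined with its flexible attributes.
rows = conn.execute(
    "SELECT items.*, item_attributes.key, item_attributes.value "
    "FROM items LEFT JOIN item_attributes "
    "ON item_attributes.entity_id = items.id"
)

# Re-group in Python: the fixed columns repeat once per flexible attribute,
# so collect them by item id along with a per-item dict of flex attrs.
fixed = {}
flex = defaultdict(dict)
for row in rows:
    *item_columns, key, value = row
    item_id = item_columns[0]            # assuming id is the first column
    fixed[item_id] = item_columns
    if key is not None:                  # items with no flex attrs still appear
        flex[item_id][key] = value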

I hope that helps! I'm happy to explain other stuff in more detail too if it would be helpful. But even just doing this profiling work to find the bottlenecks in ls is really helpful, so please keep it coming!

As an aside, I recently discovered this tool while doing a little profiling work: https://github.com/jiffyclub/snakeviz

It renders a nice interactive, graphical representation of Python profiling data.
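
For reference, the usual workflow (assuming snakeviz has been installed from PyPI) is to dump the cProfile stats to a file and then open them in the browser:

$ python3 -m cProfile -o beet.prof /usr/local/bin/beet ls > /dev/null
$ snakeviz beet.prof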

icefoxen commented 7 years ago

It helps a lot! Thank you. My initial attempt was going to be to just make everything eager and see what that does. This would mean storing a lot more stuff in memory, but minimizing the number of db calls that have to be made. The main problem with this is that it makes me start to worry about memory consumption for particularly large libraries, but I'm not sure how to avoid that without trying to restructure everything to be a streaming operation, and I certainly don't want to do something that drastic. The next step would be to try to make the db calls more clever so that it only eagerly pulls out what data is actually necessary for whatever the user is trying to do, a la those JOIN statements you mentioned. This might involve building up a query piecemeal and then turning it into SQL and executing it all as one big chunk at the end.
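
As a sketch of the "build it up piecemeal, execute once" idea (all names here are hypothetical, not the existing dbcore API):

class BatchQuery:
    """Collects WHERE clauses and parameters, then emits one SQL statement."""

    def __init__(self, table):
        self.table = table
        self.clauses = []
        self.params = []

    def where(self, clause, *params):
        self.clauses.append(clause)
        self.params.extend(params)
        return self

    def to_sql(self):
        sql = "SELECT * FROM " + self.table
        if self.clauses:
            sql += " WHERE " + " AND ".join(self.clauses)
        return sql, self.params

# Hypothetical usage: accumulate conditions, then hit the database once.
# query = BatchQuery("items").where("artist = ?", "Maria Callas").where("year >= ?", 1950)
# sql, params = query.to_sql()
# rows = connection.execute(sql, params).fetchall()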

There might be a few pure-code things that can be done as well, but the database will definitely be the low-hanging fruit. I'll try out snakeviz; I've never seen it before! Thanks.

ghost commented 7 years ago

we're also missing indexes.
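
For example, something along these lines could help the per-item flexible-attribute lookups (the table and column names are my guess at the schema, so check the actual .blb file before relying on this):

import sqlite3

conn = sqlite3.connect("musiclibrary.blb")
# The flexible-attribute query filters on entity_id; IF NOT EXISTS makes
# this safe to run repeatedly.
conn.execute(
    "CREATE INDEX IF NOT EXISTS item_attributes_entity_id "
    "ON item_attributes (entity_id)"
)
conn.commit()
conn.close()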

icefoxen commented 7 years ago

Initial impressions:

Some poking has made listing items 60% faster (on my machine), mostly by taking things away. See https://github.com/icefoxen/beets/commit/d12a1d2a82fe1cda4386fa3cd948ce538358d1ef . However, it also breaks a lot of things, so no pull request yet.

sampsyo commented 7 years ago

Initial responses:

Anyway, as you say, it would be interesting to explore a version that does everything eagerly and start from there. That change could be easily confined to the way we construct Results so that client code would be totally unaware. You're right that there's a memory/performance trade-off, but I think it's worth building and measuring the actual memory overhead—and how it scales with library size—to see if it creates a meaningful problem.
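
One way to picture "confined to Results" (a hypothetical class, not the real dbcore one): keep the public iteration interface, but pay the whole database cost up front in the constructor.

class EagerResults:
    """Materializes every model object at construction time; iteration and
    len() then behave like the lazy version, so callers don't change."""

    def __init__(self, fetch_rows, make_model):
        # All database work happens here, once, instead of per item later.
        self._objects = [make_model(row) for row in fetch_rows()]

    def __iter__(self):
        return iter(self._objects)

    def __len__(self):
        return len(self._objects)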

icefoxen commented 7 years ago

Eagerly constructing Results is basically what the test change in my branch does. It does work! Haven't checked the memory performance though.

sampsyo commented 7 years ago

Got it, that makes sense. So you're saying, for example, that beet list would ideally bypass the ORMesque layer entirely—it wouldn't even bother constructing Item and Album objects at all. Instead, you would take a command like:

$ beet ls -f '$foo $bar'

and just issue the query SELECT foo, bar FROM items, and then fill the template with each returned row. No overhead required!
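
As a sketch of that extreme, using string.Template as a stand-in for beets' own template engine, and title/artist as example fields that happen to be plain item columns (flexible attributes would not work this way):

import sqlite3
from string import Template

conn = sqlite3.connect("musiclibrary.blb")
fields = ["title", "artist"]              # parsed out of the -f format string
template = Template("$title - $artist")

# One query, one pass over the rows, no Item objects constructed at all.
for row in conn.execute("SELECT {} FROM items".format(", ".join(fields))):
    print(template.substitute(dict(zip(fields, row))))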

I totally agree that this would be the ideal speed. The game, of course, is finding a way to do that—or something close to it—while maintaining abstractions. I don't have any great ideas, but here are some bad ideas:

icefoxen commented 7 years ago

Not really. Abstraction is good. But each database call has quite a lot of overhead compared to the amount of actual work done, and we do multiple database calls to fetch all the information required for each item. The cost of actually constructing the Item and Album objects is... well, not quite trivial, but very small in comparison. We need an easier way to handle Items and Albums in batches, so that we do the same work with fewer calls into the database.

Right now if you want to construct 1000 Item objects based on a query, it does 1 database query to match the query, 1000 queries to get the attributes for each item, and 1000 queries to get the flexible attributes for each item. And then if you print them out, it does at least another 1000 queries in the Formatter objects for some reason (not sure what it's doing there yet, something about album info). Each of those database queries has a lot of overhead compared to the amount of work done; they're basically selecting a single row. It is much much faster to be able to say "oh we're going to be getting all of these objects at once anyway", do two queries (one to get attributes for all the items, one to get flexible attributes; you could even combine them together into a single call with a join) and construct all your Items from that info at once.
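
Concretely, the two-query version might look roughly like this (raw sqlite3 with guessed table and column names, purely for illustration; the real change would have to live inside dbcore):

import sqlite3
from collections import defaultdict

conn = sqlite3.connect("musiclibrary.blb")

# Query 1: every matching item in one shot.
items = conn.execute("SELECT * FROM items").fetchall()
item_ids = [row[0] for row in items]      # assuming id is the first column

# Query 2: the flexible attributes for all of those items at once.
flex = defaultdict(dict)
if item_ids:
    placeholders = ", ".join("?" for _ in item_ids)
    for entity_id, key, value in conn.execute(
        "SELECT entity_id, key, value FROM item_attributes "
        "WHERE entity_id IN ({})".format(placeholders),
        item_ids,
    ):
        flex[entity_id][key] = value

# Each Item could then be built from its row plus flex[item_id]:
# two execute() calls total instead of 2N + 1.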

sampsyo commented 7 years ago

Yep, that sounds right. Maybe a useful exercise would be to hand-write the queries we wish we were executing, and then design a usable Python API around that?

RollingStar commented 7 years ago

I don't know whether this belongs in a separate issue or not. It may fall under general performance, just like this issue.

Problem

Same setup as #2446. Different issue: if a user wants to look up a giant boxset, it shouldn't take as long as it does. Is there any kind of optimization or parallelization that can be done? My CPU only has 20% use while this operation runs. What is beets actually doing when it's stuck looking up a big boxset? From my verbose log, it looks like it's calculating distance between the local files and candidate albums on the web.

ghost commented 6 years ago

I do not code, so I am not the best person to enter this discussion, but based on my experience compiling and testing stuff, I guess the main reason for beets' slowness is that it is written entirely in Python.

If beets itself were written in a compiled language (C, C++, Ada, etc.) and then made use of the existing Python libraries (py-musicbrainzngs, py-pyacoustid, etc.), it would certainly be quite a bit faster.

Comparing apples and oranges, I know, but Picard is moderately fast in general, yet slow when performing actions that depend on (more or less) the same Python libraries beets depends on.

I have been using beets for a while and I really enjoy it, but I was shocked when it took more than 36 hours to import a Maria Callas box set with about 1400 tracks. :-O

It takes about 2 seconds just to display "beet -h" here. :-)

Just my 2c.

RollingStar commented 6 years ago

That's a good idea, but porting some or most of the code to a compiled language would be a huge undertaking. Similar gains may be possible just by optimizing the Python code.

NoahTheDuke commented 5 years ago

The issue isn't with Python, the issue is with beets falling prey to the classic N+1 SELECT problem:

Right now if you want to construct 1000 Item objects based on a query, it does 1 database query to match the query, 1000 queries to get the attributes for each item, and 1000 queries to get the flexible attributes for each item.

The clever way to do such a series of queries would be to group all of the ids for each successive query and then join them together into a single WHERE statement and iterate over them in Python. This is what you meant by batching, right @icefoxen?

So if the old way was selected_albums = "select * from db.albums where album_name = $foo" followed by for song_id in selected_albums: song = "select * from db.songs where song_id = $song_id" and so on, you'd instead run selected_albums = "select * from db.albums where album_name = $foo", then song_ids = [song.id for song in selected_albums]; songs = "select * from db.songs where song_id in (" + ", ".join(song_ids) + ")", which grabs only the song_ids you actually want, but gets them all in a single call.

I don't know anything about how beets works under the hood, so I don't know if that's easily done with the Object model y'all are working with, but it's a well-trod problem and can be done.
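
One practical footnote to that, reusing the hypothetical songs/song_id names from the example above: binding the ids as parameters instead of string-joining them sidesteps quoting issues, and chunking keeps each statement under SQLite's limit on bound variables (historically 999 per statement).

import sqlite3

def fetch_songs_by_id(conn, song_ids, chunk_size=900):
    """Fetch all rows for song_ids in a handful of IN (...) queries."""
    rows = []
    for i in range(0, len(song_ids), chunk_size):
        chunk = song_ids[i:i + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        rows.extend(conn.execute(
            "SELECT * FROM songs WHERE song_id IN ({})".format(placeholders),
            chunk,
        ))
    return rows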

icefoxen commented 5 years ago

@NoahTheDuke Yeah that's more or less the problem, as I recall it. It was just too large a problem for me to be able to handle, since it seemed to me that it would take a substantial amount of redesigning to fix.

k6Q28k5w5eWuRuuzTg9j7GrKCbjt7 commented 5 years ago

Importing a parent folder with many albums at once, versus looping over the subfolders and invoking beet import on each subfolder/album, makes an immense difference in import time. In fact, the loop is the only way I'm able to bulk-import big sets (I just use a simple bash one-liner). So I'm guessing the biggest gain will come from reflecting that kind of invocation in the code.
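
For anyone who wants to reproduce that workaround, a minimal version of such a loop (with a placeholder path) is:

$ for dir in /path/to/parent/*/ ; do beet import "$dir" ; done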

SimonTeixidor commented 5 years ago

I decided to look into this a bit today. Beets is not painfully slow for me, but still quite a bit slower than it should be.

@NoahTheDuke: I disagree that N+1 SELECT is a problem here. While it would have been a problem in a traditional server/client DBMS, it doesn't really apply to sqlite. The latency is just much lower as sqlite does not do any network communication. In fact, the documentation even encourages the N+1 pattern [1].

I imagine that the indices in the DB were created after this issue was opened, because the sqlite calls no longer take much time according to the profiler.

Running beet ls > /dev/null on my library of 14190 songs takes about 15 seconds. Profiling it shows that most of the effort is spent formatting and printing the output. As an experiment, I applied the following patch:

diff --git a/beets/ui/commands.py b/beets/ui/commands.py
index c89dbb6d..a9a6ec5c 100755
--- a/beets/ui/commands.py
+++ b/beets/ui/commands.py
@@ -1064,7 +1064,7 @@ def list_items(lib, query, album, fmt=u''):
             ui.print_(format(album, fmt))
     else:
         for item in lib.items(query):
-            ui.print_(format(item, fmt))
+            ui.print_(item.title)

 def list_func(lib, opts, args):

This brings the runtime down to 4 seconds. At this point, the output of cProfile looks like this:

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    192/1    0.013    0.000    4.908    4.908 {built-in method builtins.exec}
        1    0.000    0.000    4.908    4.908 beet:18(<module>)
        1    0.000    0.000    4.674    4.674 __init__.py:1261(main)
        1    0.000    0.000    4.674    4.674 __init__.py:1218(_raw_main)
        1    0.000    0.000    4.589    4.589 commands.py:1070(list_func)
        1    0.066    0.066    4.589    4.589 commands.py:1058(list_items)
    14192    0.101    0.000    2.165    0.000 db.py:660(_get_objects)
    14190    0.810    0.000    1.906    0.000 db.py:720(_make_model)
    14190    0.075    0.000    1.214    0.000 __init__.py:121(print_)
        6    0.000    0.000    0.845    0.141 db.py:820(query)
        2    0.000    0.000    0.844    0.422 library.py:1350(_fetch)
        2    0.000    0.000    0.844    0.422 db.py:1019(_fetch)
        1    0.000    0.000    0.844    0.844 library.py:1392(items)
    14190    0.012    0.000    0.819    0.000 __init__.py:84(_out_encoding)
    14190    0.037    0.000    0.807    0.000 __init__.py:90(_stream_encoding)

It seems like we spend 2 seconds translating db records to models (_make_model). That sounds like a lot, and might be possible to optimize as well.

I haven't yet looked into how much effort it would be to optimize these two functions (printing, and model creation), but it seems realistic to get beet ls down to under 5 seconds on a relatively large library without any major architectural changes.
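
On the printing side, a generic illustration (not beets' actual ui code) of one way to cut per-line overhead: format everything first, then encode and write in one call rather than ~14k times.

import sys

def print_lines_buffered(lines):
    # A single join + write replaces thousands of per-item print calls,
    # so per-call costs (encoding lookup, stream handling) stop dominating.
    sys.stdout.write("\n".join(lines) + "\n")

# Hypothetical usage inside list_items:
# print_lines_buffered(format(item, fmt) for item in lib.items(query))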

[1] https://sqlite.org/np1queryprob.html

sampsyo commented 5 years ago

Awesome question! This is a super useful investigation.

I am particularly interested that you found such a high cost for formatting. Some time ago, I did a somewhat ridiculous thing to try to optimize template formatting—I wrote a compiler that translates template strings into proper Python functions for evaluation. After I did so, I vaguely remember concluding that it didn't have a large impact—and I thought it was because template formatting didn't matter all that much. And perhaps it doesn't—the problem might be formatting-adjacent—but it really looks like this piece of the puzzle is worth a closer look.

SimonTeixidor commented 5 years ago

Thanks for the hint! It looks like the template is actually recompiled over and over, which you actually discovered yourself in #2030 some time ago :)

As an experiment, I applied the following patch:

diff --git a/beets/dbcore/db.py b/beets/dbcore/db.py
index 97a4a7ce..d218141e 100755
--- a/beets/dbcore/db.py
+++ b/beets/dbcore/db.py
@@ -23,6 +23,7 @@ from collections import defaultdict
 import threading
 import sqlite3
 import contextlib
+import functools

 import beets
 from beets.util.functemplate import Template
@@ -36,6 +37,10 @@ else:
     from collections.abc import Mapping

+@functools.lru_cache(maxsize=None)
+def create_template(template_str):
+    return Template(template_str)
+
 class DBAccessError(Exception):
     """The SQLite database became inaccessible.

@@ -597,7 +602,7 @@ class Model(object):
         """
         # Perform substitution.
         if isinstance(template, six.string_types):
-            template = Template(template)
+            template = create_template(template)
         return template.substitute(self.formatted(for_path),
                                    self._template_funcs())

With this, beet ls runs in around 10 seconds, down from 15. There may be room for further improvement with just local optimizations in Template, as well. I'll investigate this further another day.
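
As a side note, functools.lru_cache exposes hit/miss counters, which gives a quick sanity check that the template really is compiled only once per distinct format string (the numbers below are illustrative, not measured):

print(create_template.cache_info())
# e.g. CacheInfo(hits=14189, misses=1, maxsize=None, currsize=1)
# i.e. one compile, and every other item reuses the cached Template.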

SimonTeixidor commented 5 years ago

I investigated the templating logic further today and created PR #3258 with my findings.

sampsyo commented 5 years ago

Thanks for the hint! It looks like the template is actually recompiled over and over, which you actually discovered yourself in #2030 some time ago :)

I am shocked but not surprised at my terrible memory. :smiley: Thank you for continuing this investigation!! :rocket: