bookieio / Bookie

Python based delicious.com replacement
GNU Affero General Public License v3.0
633 stars 138 forks source link

bookie query kills mysql server #540

Open anarcat opened 9 years ago

anarcat commented 9 years ago

we sometimes see the mysql server freak out because of bookie mysql users:

+-------+--------+-----------+--------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| Id    | User   | Host      | db     | Command | Time | State          | Info                                                                                                 |
+-------+--------+-----------+--------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
| 43855 | bookie | localhost | bookie | Sleep   |    4 |                | NULL                                                                                                 |
| 86402 | bookie | localhost | bookie | Query   |  119 | Sorting result | SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('r', '%')) AND (EXISTS (S |
| 86403 | bookie | localhost | bookie | Sleep   |   54 |                | NULL                                                                                                 |
| 87131 | bookie | localhost | bookie | Query   |  103 | Sorting result | SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('f', '%')) AND (EXISTS (S |
| 87132 | bookie | localhost | bookie | Query   |  291 | Sorting result | SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('f', '%')) AND (EXISTS (S |
| 87217 | bookie | localhost | bookie | Query   |    0 | NULL           | show processlist                                                                                     |
+-------+--------+-----------+--------+---------+------+----------------+------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

The MySQL server would completely hog a CPU per such query, during a few minutes.

The complete query is:

SELECT DISTINCT tags.name AS name
FROM tags
WHERE (tags.name LIKE concat('r', '%')) AND (EXISTS (SELECT 1
FROM bmark_tags, bmarks
WHERE tags.tid = bmark_tags.tag_id AND bmarks.bid = bmark_tags.bmark_id AND bmarks.bid IN (SELECT bmarks.bid AS bmarks_bid
FROM bmarks
WHERE bmarks.username = 'anarcat' AND (EXISTS (SELECT 1
FROM bmark_tags, tags
WHERE bmarks.bid = bmark_tags.bmark_id AND tags.tid = bmark_tags.tag_id AND tags.tid IN (SELECT tags.tid AS tags_tid
FROM tags
WHERE tags.name IN ('debian', 'o') GROUP BY tags.tid))) GROUP BY bmarks.bid)))

EXPLAIN doesn't outline anything very obvious, other than filesort issues, which maybe means we need to ramp up the sort_buffer variable...

mysql> explain SELECT DISTINCT tags.name AS name FROM tags WHERE (tags.name LIKE concat('r', '%')) AND (EXISTS (SELECT 1 FROM bmark_tags, bmarks WHERE tags.tid = bmark_tags.tag_id AND bmarks.bid = bmark_tags.bmark_id AND bmarks.bid IN (SELECT bmarks.bid AS bmarks_bid FROM bmarks WHERE bmarks.username = 'anarcat' AND (EXISTS (SELECT 1 FROM bmark_tags, tags WHERE bmarks.bid = bmark_tags.bmark_id AND tags.tid = bmark_tags.tag_id AND tags.tid IN (SELECT tags.tid AS tags_tid FROM tags WHERE tags.name IN ('debian', 'o') GROUP BY tags.tid))) GROUP BY bmarks.bid)));
+----+--------------------+------------+--------+----------------+----------+---------+----------------------------+------+------------------------------------------+
| id | select_type        | table      | type   | possible_keys  | key      | key_len | ref                        | rows | Extra                                    |
+----+--------------------+------------+--------+----------------+----------+---------+----------------------------+------+------------------------------------------+
|  1 | PRIMARY            | tags       | range  | name           | name     | 258     | NULL                       |   77 | Using where; Using index                 |
|  2 | DEPENDENT SUBQUERY | bmark_tags | ref    | PRIMARY,tag_id | tag_id   | 4       | bookie.tags.tid            |    9 | Using where; Using index                 |
|  2 | DEPENDENT SUBQUERY | bmarks     | eq_ref | PRIMARY        | PRIMARY  | 4       | bookie.bmark_tags.bmark_id |    1 | Using index                              |
|  3 | DEPENDENT SUBQUERY | bmarks     | ref    | username       | username | 257     | const                      | 2811 | Using where; Using index; Using filesort |
|  4 | DEPENDENT SUBQUERY | bmark_tags | ref    | PRIMARY,tag_id | PRIMARY  | 4       | bookie.bmarks.bid          |    1 | Using where; Using index                 |
|  4 | DEPENDENT SUBQUERY | tags       | eq_ref | PRIMARY        | PRIMARY  | 4       | bookie.bmark_tags.tag_id   |    1 | Using index                              |
|  5 | DEPENDENT SUBQUERY | tags       | range  | name           | name     | 258     | NULL                       |    2 | Using where; Using index; Using filesort |
+----+--------------------+------------+--------+----------------+----------+---------+----------------------------+------+------------------------------------------+
7 rows in set (0.00 sec)

I was wondering if this query shouldn't be optimised in some way... That sure looks like a lot of subqueries...

anarcat commented 9 years ago

i am not sure, but this certainly looks like a degenerate case of the completion code in bookie.models.TagMgr.complete(). i wonder if we shouldn't put a minimum on the size of completion requests. for example, it looks like in the above query, it looks at all bookmarks that match o or debian. I believe this happened because i typed o rganisation and bookie parsed that as two tags, then submitted that to the completion engine trying to complete rganisation.

there are multiple ways of doing this. one is in the UI:

--- a/bookie/static/js/bookie/tagcontrol.js
+++ b/bookie/static/js/bookie/tagcontrol.js
@@ -417,6 +417,10 @@ YUI.add('bookie-tagcontrol', function (Y) {
          */
         _fetch_suggestions: function (qry, callback) {
             var tags;
+            // don't autcomplete short terms which may overload the database
+            if (qry.length <= 3) {
+                return '';
+            }
             this.ac.api = new Y.bookie.Api.route.TagComplete(
                 this.get('api_cfg')
             );

... but i don't think that's the right approach, personnally, because a malicious API user could still blow up the database. besides, the above patch only limits the size of rganisation and not o in the above example.

an alternative would be in the API itself:

diff --git a/bookie/views/api.py b/bookie/views/api.py
index 17c1172..54aebf7 100644
--- a/bookie/views/api.py
+++ b/bookie/views/api.py
@@ -624,11 +624,11 @@ def tag_complete(request):
         return _api_response(request, {})

     if 'current' in params and params['current'] != "":
-        current_tags = params['current'].split()
+        current_tags = [ x for x in params['current'].split() if len(x) > 3 ]
     else:
         current_tags = None

-    if 'tag' in params and params['tag']:
+    if 'tag' in params and params['tag'] and len(params['tag']) > 3:
         tag = params['tag']

         tags = TagMgr.complete(tag,

I would suggest working on the latter, the number 3 being arbitrary. It should obviously be a config variable of some sort, but before digging any deeper, i'd like to hear some feedback.

We'll try using the latter in production now.