Query plan
EXPLAIN ANALYZE SELECT tags.tid AS tags_tid, tags.name AS tags_name, anon_1.bmarks_bid AS anon_1_bmarks_bid, anon_1.bmarks_hash_id AS anon_1_bmarks_hash_id, anon_1.bmarks_description AS anon_1_bmarks_description, anon_1.bmarks_extended AS anon_1_bmarks_extended, anon_1.bmarks_stored AS anon_1_bmarks_stored, anon_1.bmarks_updated AS anon_1_bmarks_updated, anon_1.bmarks_clicks AS anon_1_bmarks_clicks, anon_1.bmarks_is_private AS anon_1_bmarks_is_private, anon_1.bmarks_inserted_by AS anon_1_bmarks_inserted_by, anon_1.bmarks_username AS anon_1_bmarks_username, anon_1.bmarks_tag_str AS anon_1_bmarks_tag_str, url_hash_1.hash_id AS url_hash_1_hash_id, url_hash_1.url AS url_hash_1_url, url_hash_1.clicks AS url_hash_1_clicks
FROM (SELECT bmarks.bid AS bmarks_bid, bmarks.hash_id AS bmarks_hash_id, bmarks.description AS bmarks_description, bmarks.extended AS bmarks_extended, bmarks.stored AS bmarks_stored, bmarks.updated AS bmarks_updated, bmarks.clicks AS bmarks_clicks, bmarks.is_private AS bmarks_is_private, bmarks.inserted_by AS bmarks_inserted_by, bmarks.username AS bmarks_username, bmarks.tag_str AS bmarks_tag_str
FROM bmarks
WHERE bmarks.is_private = false ORDER BY bmarks.stored DESC
LIMIT 50 OFFSET 0) AS anon_1 LEFT OUTER JOIN (bmark_tags AS bmark_tags_1 JOIN tags ON tags.tid = bmark_tags_1.tag_id) ON anon_1.bmarks_bid = bmark_tags_1.bmark_id LEFT OUTER JOIN url_hash AS url_hash_1 ON url_hash_1.hash_id = anon_1.bmarks_hash_id ORDER BY anon_1.bmarks_stored DESC;
Hash Left Join (cost=30824.30..33021.81 rows=191 width=1303) (actual time=3110.720..3319.032 rows=192 loops=1)
  Hash Cond: (bmarks.bid = bmark_tags_1.bmark_id)
  -> Nested Loop Left Join (cost=12231.42..12643.51 rows=50 width=1290) (actual time=735.022..736.260 rows=50 loops=1)
     -> Limit (cost=12231.42..12231.55 rows=50 width=150) (actual time=734.949..735.023 rows=50 loops=1)
        -> Sort (cost=12231.42..12578.63 rows=138883 width=150) (actual time=734.945..734.974 rows=50 loops=1)
           Sort Key: bmarks.stored
           Sort Method: top-N heapsort Memory: 28kB
           -> Seq Scan on bmarks (cost=0.00..7617.83 rows=138883 width=150) (actual time=0.185..316.871 rows=138891 loops=1)
              Filter: (NOT is_private)
     -> Index Scan using url_hash_pkey on url_hash url_hash_1 (cost=0.00..8.22 rows=1 width=75) (actual time=0.021..0.022 rows=1 loops=50)
        Index Cond: ((url_hash_1.hash_id)::text = (bmarks.hash_id)::text)
  -> Hash (cost=12738.02..12738.02 rows=327589 width=17) (actual time=2374.826..2374.826 rows=341138 loops=1)
     Buckets: 4096 Batches: 16 Memory Usage: 794kB
     -> Hash Join (cost=823.86..12738.02 rows=327589 width=17) (actual time=82.827..1629.571 rows=341138 loops=1)
        Hash Cond: (bmark_tags_1.tag_id = tags.tid)
        -> Seq Scan on bmark_tags bmark_tags_1 (cost=0.00..4952.89 rows=327589 width=8) (actual time=0.005..406.872 rows=341138 loops=1)
        -> Hash (cost=470.05..470.05 rows=28305 width=13) (actual time=82.791..82.791 rows=28447 loops=1)
           Buckets: 4096 Batches: 1 Memory Usage: 934kB
           -> Seq Scan on tags (cost=0.00..470.05 rows=28305 width=13) (actual time=0.003..36.796 rows=28447 loops=1)
Total runtime: 3320.102 ms
(20 rows)
I've listed the current indexes on the tables bmarks, tags, and url_hash.
Hi,
Taking a stab at this, though I'm far from an expert.
It would be good to get a listing of the indexes that these tables have from the db.
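For reference, one way to pull that listing straight from the database (a sketch, assuming everything lives in the default public schema) is to query pg_indexes:

-- list every index on the tables involved in the query
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename IN ('bmarks', 'bmark_tags', 'tags', 'url_hash')
ORDER BY tablename, indexname;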
Picking apart the query plan from the inside out:
Seq Scan on bmarks (cost=0.00..7617.83 rows=138883 width=150) (actual time=0.185..316.871 rows=138891 loops=1)
  Filter: (NOT is_private)
So it's doing a seq_scan on bmarks, hitting every row. That can probably be improved, but it isn't the largest component of the query.
Sort (cost=12231.42..12578.63 rows=138883 width=150) (actual time=734.945..734.974 rows=50 loops=1)
  Sort Key: bmarks.stored
It's then sorting the result of the above, with a limit.
This section can probably be improved with an index on (stored, is_private), as it would perhaps make this an index scan of 50ish rows.
All the combining of those results with the url_hash lookups is fairly fast.
The slow part is the two joins through to the tags table.
In particular there are two seq_scans for this, and then a very expensive hash join that hits 341138 rows.
Presumably not all of those rows are needed, so there needs to be better selection there, but I'm not immediately sure how to do that.
I think the first thing to try is some indexes to see what happens then.
You can try this on any representative db that you have, and you can pretty safely add indexes to production and delete them if you don't want to keep them.
The indexes I think would maybe help:
bmarks (stored, is_private)
bmark_tags (bmark_id, tag_id)
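Spelled out as DDL, those would be something like this (a sketch; the index names are made up and the column names are assumed from the query above):

-- lets the "WHERE is_private = false ORDER BY stored DESC LIMIT 50" step become an index scan
CREATE INDEX bmarks_stored_is_private_idx ON bmarks (stored, is_private);
-- lets the planner find the tag links for a bookmark without scanning all of bmark_tags
CREATE INDEX bmark_tags_bmark_id_tag_id_idx ON bmark_tags (bmark_id, tag_id);
-- on a busy production table, CREATE INDEX CONCURRENTLY avoids blocking writes while it builds,
-- and DROP INDEX removes either one again if it doesn't help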
You can see how that changes the query at that point. This may be enough to avoid the seq_scans and huge hash join, but we may have to restructure the query to do that.
Those two indexes have helped a small amount. I've got the updated index list and query plan here:
http://paste.ubuntu.com/8292632/
Looks like it's cut about a half second off of it.
I tried to add an index on hash_id which actually hurt things.
The big thing is definitely the tags join hitting over 300k rows with a seq scan. I can't seem to figure out what this line is referring to:
Hash (cost=13162.17..13162.17 rows=341162 width=17) (actual time=2545.008..2545.008 rows=341162 loops=1)
right above the Hash Join. It seems that if we could limit that from 300k down to a reasonable set we'd be in good shape and greatly simplify this. Any idea what 300k rows it's hitting?
Yeah, the index on (stored, is_private) moved it from a seq_scan to an index scan. It's a backwards scan, and the planner expects it to be a little more expensive than it is:
Index Scan Backward using bmarks_stored_is_private_idx on bmarks (cost=0.00..28690.92 rows=138925 width=150) (actual time=0.020..0.124
But I don't think that's a problem. You could make the index DESC on stored, but I don't think it would change this query at all.
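If you did want to try it, the DESC variant would be something like the following (hypothetical index name; you'd drop it again if it makes no difference):

-- descending variant of the same index
CREATE INDEX bmarks_stored_desc_is_private_idx ON bmarks (stored DESC, is_private);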
The index on bmark_tags doesn't help at all, since it just duplicates the existing index: the (bmark_id, tag_id) pair is the PK for that table.
You are right that the real cost is the scanning of the tags. I think the 300k rows are something like every tag on every bookmark?
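A quick sanity check on that guess (just a sketch) is to count the association rows directly; the total should be close to the ~341k rows the hash join reports:

-- total tag links, and how many bookmarks actually have tags
SELECT count(*) AS total_tag_links,
       count(DISTINCT bmark_id) AS bookmarks_with_tags
FROM bmark_tags;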
It needs to be able to limit that scan so that it only extracts the tags for the bookmarks you are interested in.
One slightly odd thing about the query is that it is using a subquery. You can experiment by running similar queries to see if something changes the shape of the plan significantly. Something like:
EXPLAIN ANALYZE SELECT * FROM bmarks LEFT OUTER JOIN (bmark_tags AS bmark_tags_1 JOIN tags ON tags.tid = bmark_tags_1.tag_id) ON bmarks.bid = bmark_tags_1.bmark_id LEFT OUTER JOIN url_hash AS url_hash_1 ON url_hash_1.hash_id = bmarks.hash_id WHERE bmarks.is_private = false ORDER BY bmarks.stored DESC LIMIT 50 OFFSET 0;
Dropping the url_hash join:
EXPLAIN ANALYZE SELECT * FROM bmarks LEFT OUTER JOIN (bmark_tags AS bmark_tags_1 JOIN tags ON tags.tid = bmark_tags_1.tag_id) ON bmarks.bid = bmark_tags_1.bmark_id WHERE bmarks.is_private = false ORDER BY bmarks.stored DESC LIMIT 50 OFFSET 0;
You can also try just selecting the relevant tags with something like:
EXPLAIN ANALYZE SELECT * FROM bmark_tags JOIN tags ON tags.tid = bmark_tags.tag_id WHERE bmark_tags.bmark_id = ANY (VALUES ($ID), ($ID));
inserting 50 values for $ID extracted by just running
SELECT bid FROM bmarks WHERE bmarks.is_private = false ORDER BY bmarks.stored DESC LIMIT 50 OFFSET 0;
This last one will tell you a baseline for extracting the tags for 50 bookmarks. If that is slow then the other query can never be fast. If it is fast then we need to figure out how to shape the query to use the same plan.
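Filled in, the baseline would look something like this (123, 456, and 789 are just placeholder IDs standing in for the 50 real bids):

EXPLAIN ANALYZE SELECT *
FROM bmark_tags
JOIN tags ON tags.tid = bmark_tags.tag_id
WHERE bmark_tags.bmark_id = ANY (VALUES (123), (456), (789));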
I may have some of the syntax wrong, and you may want to check that the results look sensible without EXPLAIN.
Ah, I've been playing around with a local instance and found the reason for the subquery: the LIMIT needs to apply just to the bookmarks, rather than to the rows of the joined query, otherwise you will get incomplete data.
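To illustrate, a naive version without the subquery, something like the sketch below, applies the LIMIT to bookmark+tag rows rather than to bookmarks, so a bookmark with three tags uses up three of the 50 rows and the last bookmarks' tags get silently truncated:

-- broken on purpose: LIMIT counts joined rows, not bookmarks
SELECT *
FROM bmarks
LEFT OUTER JOIN bmark_tags ON bmark_tags.bmark_id = bmarks.bid
LEFT OUTER JOIN tags ON tags.tid = bmark_tags.tag_id
WHERE bmarks.is_private = false
ORDER BY bmarks.stored DESC
LIMIT 50 OFFSET 0;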
Toying with the broken queries I suggested shows that it will use the indexes. However, I expect that when using the subquery it can't inspect the data as well, and so does table scans expecting to have to process more data or something.
I'm going to sleep on it, and try some more queries tomorrow.
Thanks james. I really appreciate the second set of eyes.
Wow, this just might be a hardware issue. I pulled down the db locally and on my laptop the query is around 100ms. Now my laptop is on an SSD running pgsql 9.3. It might be time to upgrade the db server and try out an SSD cloud instance. Wow.
What's the query plan on your laptop? The SSD probably helps, but also the query plan may be better due to 9.3.
I've not gone through it with a fine-tooth comb, but at a glance it's definitely dealing with a smaller set of rows.
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=12362.06..13587.48 rows=192 width=238) (actual time=179.214..197.042 rows=125 loops=1)
  -> Nested Loop Left Join (cost=12361.35..12775.93 rows=50 width=225) (actual time=179.143..189.336 rows=50 loops=1)
     -> Limit (cost=12360.93..12361.06 rows=50 width=150) (actual time=178.446..178.476 rows=50 loops=1)
        -> Sort (cost=12360.93..12707.01 rows=138432 width=150) (actual time=178.444..178.457 rows=50 loops=1)
           Sort Key: bmarks.stored
           Sort Method: top-N heapsort Memory: 37kB
           -> Seq Scan on bmarks (cost=0.00..7762.32 rows=138432 width=150) (actual time=0.541..118.752 rows=138431 loops=1)
              Filter: (NOT is_private)
     -> Index Scan using url_hash_pkey on url_hash url_hash_1 (cost=0.42..8.28 rows=1 width=75) (actual time=0.215..0.215 rows=1 loops=50)
        Index Cond: ((hash_id)::text = (bmarks.hash_id)::text)
  -> Nested Loop (cost=0.71..16.19 rows=4 width=17) (actual time=0.039..0.152 rows=2 loops=50)
     -> Index Only Scan using bmark_tags_pkey on bmark_tags bmark_tags_1 (cost=0.42..14.91 rows=4 width=8) (actual time=0.007..0.007 rows=2 loops=50)
        Index Cond: (bmark_id = bmarks.bid)
        Heap Fetches: 122
     -> Index Scan using tags_pkey on tags (cost=0.29..0.31 rows=1 width=13) (actual time=0.058..0.058 rows=1 loops=122)
        Index Cond: (tid = bmark_tags_1.tag_id)
Total runtime: 197.176 ms
(17 rows)
Note, that was a local copy without the stored/is_private index on it, so it's got a seq scan of bmarks in there that I think would go away.
Yeah, the bmarks side is different because of the lack of that index. The interesting part is that the plan for the tags part is very different:
-> Nested Loop (cost=0.71..16.19 rows=4 width=17) (actual time=0.039..0.152 rows=2 loops=50)
   -> Index Only Scan using bmark_tags_pkey on bmark_tags bmark_tags_1 (cost=0.42..14.91 rows=4 width=8) (actual time=0.007..0.007 rows=2 loops=50)
      Index Cond: (bmark_id = bmarks.bid)
      Heap Fetches: 122
   -> Index Scan using tags_pkey on tags (cost=0.29..0.31 rows=1 width=13) (actual time=0.058..0.058 rows=1 loops=122)
      Index Cond: (tid = bmark_tags_1.tag_id)
So it only expects 4 rows, and uses 2 index scans (1 index only).
So yes, the SSD will probably help, but I suspect this different plan is also a big difference. My local tests were also postgres 9.1, so I suspect this may well be a 9.3 improvement.
Indeed, moving my test db to 9.3 leads to the same query plan as yours, with the index scans for the tags. I suspect that trying an upgrade to 9.3 will improve things significantly on production.
Yep, moved to a new SSD-based EC2 server and 9.3, and the default data now comes in at 200ms or less vs the 1.9s it was previously. Now that's a nice boost!
Closing this out, we've got bigger performance fish to fry than this now.
Thanks so much for the help james, really nice advice and I couldn't have done it without you.
You are welcome, thanks for helping me sort out combo-loading!
The query we need to analyze and optimize is the one shown in the query plan above.
In the code, it's this function that builds the query based on default values:
https://github.com/bookieio/Bookie/blob/develop/bookie/models/__init__.py#L413
Need to check whether we're missing any indexes that would help with this.