Currently the `get_intersection_by_model` code does a `HAVING COUNT()`.
This regularly results in a very bad/slow query plan for large databases when
using PostgreSQL (not sure about other databases, but it's slow there).
Example query:
SELECT "entity_entity"."id"
FROM "entity_entity",
"tagging_taggeditem"
WHERE "tagging_taggeditem".content_type_id = 12
AND "tagging_taggeditem".tag_id IN ( 26, 13407, 992, 1848, 573 )
AND "entity_entity"."id" = "tagging_taggeditem".object_id
GROUP BY "entity_entity"."id"
HAVING COUNT("entity_entity"."id") = 5
Example query plan:
HashAggregate (cost=33573.98..33766.62 rows=12843 width=4) (actual time=971.732..972.503 rows=2 loops=1)
Filter: (count(entity_entity.id) = 5)
-> Hash Join (cost=30450.93..33509.76 rows=12843 width=4) (actual time=872.651..947.297 rows=13565 loops=1)
Hash Cond: (tagging_taggeditem.object_id = entity_entity.id)
-> Bitmap Heap Scan on tagging_taggeditem (cost=260.84..3014.65 rows=12843 width=4) (actual time=1.931..25.149 rows=13565 loops=1)
Recheck Cond: (tag_id = ANY ('{26,13407,992,1848,573}'::integer[]))
Filter: (content_type_id = 12)
-> Bitmap Index Scan on tagging_taggeditem_tag_id (cost=0.00..257.63 rows=12843 width=0) (actual time=1.584..1.584 rows=13565 loops=1)
Index Cond: (tag_id = ANY ('{26,13407,992,1848,573}'::integer[]))
-> Hash (cost=26392.82..26392.82 rows=303782 width=4) (actual time=870.620..870.620 rows=292905 loops=1)
-> Seq Scan on entity_entity (cost=0.00..26392.82 rows=303782 width=4) (actual time=0.007..428.755 rows=292905 loops=1)
Total runtime: 975.058 ms
An alternative to using having is simply doing an inner join between all the
tags:
SELECT entity_entity.id
FROM entity_entity
JOIN tagging_taggeditem t1
ON t1.object_id = entity_entity.id
AND t1.tag_id = 26
AND t1.content_type_id = 12
JOIN tagging_taggeditem t2
ON t2.object_id = entity_entity.id
AND t2.tag_id = 13407
AND t2.content_type_id = 12
JOIN tagging_taggeditem t3
ON t3.object_id = entity_entity.id
AND t3.tag_id = 992
AND t3.content_type_id = 12
JOIN tagging_taggeditem t4
ON t4.object_id = entity_entity.id
AND t4.tag_id = 1848
AND t4.content_type_id = 12
JOIN tagging_taggeditem t5
ON t5.object_id = entity_entity.id
AND t5.tag_id = 573
AND t5.content_type_id = 12
GROUP BY entity_entity.id
And the query plan:
HashAggregate (cost=229.27..229.28 rows=1 width=4) (actual time=2.788..2.790 rows=2 loops=1)
-> Nested Loop (cost=102.17..229.26 rows=1 width=4) (actual time=2.386..2.778 rows=2 loops=1)
-> Nested Loop (cost=102.17..220.92 rows=1 width=20) (actual time=2.374..2.747 rows=2 loops=1)
-> Nested Loop (cost=102.17..212.58 rows=1 width=16) (actual time=2.361..2.720 rows=2 loops=1)
-> Nested Loop (cost=102.17..204.24 rows=1 width=12) (actual time=2.345..2.688 rows=2 loops=1)
-> Hash Join (cost=102.17..195.92 rows=1 width=8) (actual time=2.326..2.649 rows=2 loops=1)
Hash Cond: (t3.object_id = t5.object_id)
-> Bitmap Heap Scan on tagging_taggeditem t3 (cost=4.48..97.38 rows=25 width=4) (actual time=0.085..0.466 rows=176 loops=1)
Recheck Cond: (tag_id = 992)
Filter: (content_type_id = 12)
-> Bitmap Index Scan on tagging_taggeditem_tag_id (cost=0.00..4.48 rows=25 width=0) (actual time=0.058..0.058 rows=176 loops=1)
Index Cond: (tag_id = 992)
-> Hash (cost=97.38..97.38 rows=25 width=4) (actual time=1.934..1.934 rows=450 loops=1)
-> Bitmap Heap Scan on tagging_taggeditem t5 (cost=4.48..97.38 rows=25 width=4) (actual time=0.164..1.261 rows=450 loops=1)
Recheck Cond: (tag_id = 573)
Filter: (content_type_id = 12)
-> Bitmap Index Scan on tagging_taggeditem_tag_id (cost=0.00..4.48 rows=25 width=0) (actual time=0.111..0.111 rows=450 loops=1)
Index Cond: (tag_id = 573)
-> Index Scan using entity_entity_pkey on entity_entity (cost=0.00..8.31 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=2)
Index Cond: (entity_entity.id = t3.object_id)
-> Index Scan using tagging_taggeditem_tag_id_key on tagging_taggeditem t4 (cost=0.00..8.33 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=2)
Index Cond: ((t4.tag_id = 1848) AND (t4.content_type_id = 12) AND (t4.object_id = entity_entity.id))
-> Index Scan using tagging_taggeditem_tag_id_key on tagging_taggeditem t2 (cost=0.00..8.33 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=2)
Index Cond: ((t2.tag_id = 13407) AND (t2.content_type_id = 12) AND (t2.object_id = entity_entity.id))
-> Index Scan using tagging_taggeditem_tag_id_key on tagging_taggeditem t1 (cost=0.00..8.33 rows=1 width=4) (actual time=0.008..0.009 rows=1 loops=2)
Index Cond: ((t1.tag_id = 26) AND (t1.content_type_id = 12) AND (t1.object_id = entity_entity.id))
The query is slightly more complex, but the runtime is 3ms instead of 980ms
(average, I've tested the query 10 times)
I will write the patch if you consider implementing this.
Original issue reported on code.google.com by Wolphie on 24 Dec 2010 at 2:42
Original issue reported on code.google.com by
Wolphie
on 24 Dec 2010 at 2:42