Open GoogleCodeExporter opened 8 years ago
I am a little concerned by this. The plan is to move these queries into the
query
sets using the new queryset extra() support in django 1.0. I fear that the
resulting
SQL will have the bad performance you describe.
Original comment by doug.nap...@gmail.com
on 16 Sep 2008 at 4:50
I solved this problem by extracting inner query (it's not depend on main
query). So
this function works in two queries but much faster.
I did it on my modified version of tagging-app so can't give patch, but if
somebody
very needs it - I can try to repeat this on trunk version.
Original comment by n.le...@gmail.com
on 12 Nov 2008 at 1:59
n.leush -- can you please post your fix/code?
related_for_model works fine on sqlite even with 50,000 tagged items but on my
production server, on mysql, it is unacceptably slow. I have already
implemented mstepniowski's suggestion, which definitely sped it up so that the
query doesn't ALWAYS timeout, but its still much too slow and on the bigger
tags times out!
Original comment by ilias.ka...@gmail.com
on 7 Aug 2010 at 9:48
def related_for_model2(self, tags, model, counts=False, min_count=None, filters=None,
sort_by_count=False, limit=None):
"""
Obtain a list of tags related to a given list of tags - that
is, other tags used by items which have all the given tags.
If ``counts`` is True, a ``count`` attribute will be added to
each tag, indicating the number of items which have it in
addition to the given list of tags.
If ``min_count`` is given, only tags which have a ``count``
greater than or equal to ``min_count`` will be returned.
Passing a value for ``min_count`` implies ``counts=True``.
"""
if filters is None: filters = {}
if min_count is not None: counts = True
tags = get_tag_list(tags)
tag_count = len(tags)
if tag_count == 0:
return []
queryset = model._default_manager.filter()
queryset.query.add_q(Q(**filters))
extra_joins = ' '.join(queryset.query.get_from_clause()[0][1:])
where, params = queryset.query.where.as_sql()
if where:
extra_criteria = 'AND %s' % where
else:
extra_criteria = ''
tagged_item_table = qn(TaggedItem._meta.db_table)
tag_name = qn(self.model._meta.db_table)
model_table = qn(model._meta.db_table)
model_pk = '%s.%s' % (model_table, qn(model._meta.pk.column))
query = """
SELECT %(tagged_item)s.object_id
FROM %(tagged_item)s
INNER JOIN %(tag)s
ON %(tagged_item)s.tag_id = %(tag)s.id
INNER JOIN %(model)s
ON %(tagged_item)s.object_id = %(model_pk)s
%%s
WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
%%s
AND %(tag)s.id IN (%(tag_id_placeholders)s)
GROUP BY %(tagged_item)s.object_id%%s
HAVING COUNT(%(tagged_item)s.object_id) = %(tag_count)s
""" % {
'tag': tag_name,
'model': model_table,
'model_pk': model_pk,
'tagged_item': tagged_item_table,
'content_type_id': ContentType.objects.get_for_model(model).pk,
'tag_id_placeholders': ','.join(['%%s'] * tag_count),
'tag_count': tag_count,
}
params_tags = [tag.pk for tag in tags]
params.extend(params_tags)
# TODO this is hardcode bugfix for some complicated query
if extra_joins.find("INNER JOIN `comment_nodes`") != -1:
extra_group = ", comment_nodes.id"
else:
extra_group = ""
cursor = connection.cursor()
cursor.execute(query % (extra_joins, extra_criteria, extra_group), params)
object_ids = []
for row in cursor.fetchall():
object_ids.append(row[0])
if len(object_ids) == 0:
return []
query = """
SELECT %(tag)s.id, %(tag)s.name, %(tag)s.slug%(count_sql)s
FROM %(tagged_item)s INNER JOIN %(tag)s ON %(tagged_item)s.tag_id = %(tag)s.id
WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
AND %(tagged_item)s.object_id IN
(%(object_id_placeholders)s)
AND %(tag)s.id NOT IN (%(tag_id_placeholders)s)
GROUP BY %(tag)s.id, %(tag)s.name
%(min_count_sql)s
ORDER BY %(order_sql)s
%(limit_sql)s""" % {
'tag': tag_name,
'model': model_table,
'count_sql': counts and ', COUNT(%s.object_id) as c' % tagged_item_table or '',
'tagged_item': tagged_item_table,
'content_type_id': ContentType.objects.get_for_model(model).pk,
'tag_id_placeholders': ','.join(['%s'] * tag_count),
'object_id_placeholders': ','.join(['%s'] * len(object_ids)),
'min_count_sql': min_count is not None and ('HAVING COUNT(%s.object_id) >= %s' % tagged_item_table) or '',
'order_sql': sort_by_count and ('c DESC') or ('%s.name ASC' % tag_name),
'limit_sql': limit and ('LIMIT %s' % limit) or ''
}
params = []
params.extend(object_ids)
params.extend(params_tags)
if min_count is not None:
params.append(min_count)
cursor = connection.cursor()
cursor.execute(query, params)
related = []
for row in cursor.fetchall():
tag = self.model(*row[:3])
if counts is True:
tag.count = row[3]
related.append(tag)
if sort_by_count:
sort_tags_by_name(related)
return related
Original comment by n.le...@gmail.com
on 7 Aug 2010 at 11:16
Any news on this? We're plagued by this exact problem.
Some more info, tho. After talking to a friend, he came up with an altered
query that ran a lot faster (from 15-20 seconds to 0.2-0.4 seconds).. The
approach creates a temporary table for the subquery, and then deletes it after
the query.
RAW sql:
------------------------------------
CREATE TEMPORARY TABLE tmpsame (key (object_id)) SELECT object_id FROM
`tagging_taggeditem`, `tagging_tag` WHERE `tagging_taggeditem`.content_type_id
= 14 AND `tagging_tag`.id = `tagging_taggeditem`.tag_id AND `tagging_tag`.id IN
(19,12,108,131,189) GROUP BY `tagging_taggeditem`.object_id HAVING
COUNT(`tagging_taggeditem`.object_id) = 5;
SELECT `tagging_tag`.id, `tagging_tag`.name,
COUNT(`tagging_taggeditem`.object_id) FROM `tagging_taggeditem` INNER JOIN
`tagging_tag` ON `tagging_taggeditem`.tag_id = `tagging_tag`.id
INNER JOIN tmpsame ON `tagging_taggeditem`.object_id = tmpsame.object_id AND
`tagging_tag`.id NOT IN (19,12,108,131,189) GROUP BY `tagging_tag`.id,
`tagging_tag`.name ORDER BY `tagging_tag`.name ASC;
DROP TEMPORARY TABLE tmpsame;
------------------------------------
Of course, with this approach the temp table's name should be randomized. It
might also be a solution to get the results of the first queries in python, and
then pass it to the next query (and thus not needing a temp table).
Original comment by TheTerra...@gmail.com
on 6 Apr 2011 at 10:58
Original issue reported on code.google.com by
mstepnio...@gmail.com
on 9 Sep 2008 at 11:24