Open jeremystretch opened 3 years ago
Hi
It's definitely possible to query these values with SQL directly. But, the CTE starts from the roots of the tree and recursively fetches the children so adding the cumulative behavior (which sums totals from the leaves to the root) isn't straightforward.
I briefly looked at the netbox repository. It seems your trees are too large to load the whole tree into RAM and sum up related entries in Python code?
I think I'd attempt solving this with a LATERAL JOIN (if on Postgres) and continue from there. On Postgres it should be relatively easy and safe since it supports arrays out of the box. However, if you have hundreds of thousands of nodes in your tree building it up for each query may be too expensive. There's always the possibility to use a materialized view, see https://schinckel.net/2016/01/19/postgres-tree-shootout-part-3%3A-adjacency-list-using-views/ With tens of thousands of nodes it may still work well according to the blogpost linked above.
Or, you could have a look at https://github.com/django-treebeard/django-treebeard/ – maybe you wouldn't even have to change much since treebeard supports nested sets, a different name for mptt.
I hope this helps :)
Oops, wrong button.
Awesome info @matthiask, thank you! We're still very much exploring options so I'm going to dig into each of your suggestions some more.
It seems your trees are too large to load the whole tree into RAM and sum up related entries in Python code?
I think that's accurate, yes. In instances where there exist many thousands of nested objects it introduces a huge performance penalty, so we need to keep everything in the database.
In case someone needs, here is a method to do add_related_count adapted from MPTT. It seems to work in testing but hasn't had real-world usage:
def add_related_count(
self,
queryset,
rel_model,
rel_field,
count_attr,
cumulative=False,
extra_filters={},
):
"""
Adds a related item count to a given ``QuerySet`` using its
``extra`` method, for a ``Model`` class which has a relation to
this ``Manager``'s ``Model`` class.
Arguments:
``rel_model``
A ``Model`` class which has a relation to this `Manager``'s
``Model`` class.
``rel_field``
The name of the field in ``rel_model`` which holds the
relation.
``count_attr``
The name of an attribute which should be added to each item in
this ``QuerySet``, containing a count of how many instances
of ``rel_model`` are related to it through ``rel_field``.
``cumulative``
If ``True``, the count will be for each item and all of its
descendants, otherwise it will be for each item itself.
``extra_filters``
Dict with aditional parameters filtering the related queryset.
"""
if cumulative:
SQL = """
SELECT count(*)
FROM (
SELECT (__tree.tree_path) AS "tree_path"
FROM "{table1}" T3 JOIN __tree ON T3."{rel_field}_id" = __tree.tree_pk
WHERE ("{table2}"."id" = ANY(tree_path))
) _count
"""
params = {
"table1": rel_model._meta.db_table,
"table2": rel_model._meta.get_field(rel_field).remote_field.model._meta.db_table,
"rel_field": rel_field,
}
return queryset.with_tree_fields().annotate(**{count_attr: RawSQL(SQL.format(**params), {})})
else:
current_rel_model = rel_model
for rel_field_part in rel_field.split("__"):
current_tree_field = current_rel_model._meta.get_field(rel_field_part)
current_rel_model = current_tree_field.related_model
tree_field = current_tree_field
if isinstance(tree_field, ManyToManyField):
field_name = "pk"
else:
field_name = tree_field.remote_field.field_name
subquery_filters = {
rel_field: OuterRef(field_name),
}
subquery = rel_model.objects.filter(**subquery_filters, **extra_filters).values("pk")
return queryset.with_tree_fields().annotate(**{count_attr: SQCount(subquery)})
Hi, we're looking at migrating from django-mptt and I was wondering if there's a recommended strategy for replacing its
add_related_count()
TreeManager method. We rely on this to provide a cumulative count of related objects assigned to a nested model.For example, suppose we have Region instances representing countries, states, and cities, and we have Site instances which can be assigned to any Region. We'd call something like this to annotate a cumulative count of sites for each region:
Replicating MPTT's approach directly doesn't seem to be feasible, since its querying against concrete database fields: Django's Subquery class doesn't understand
tree_path
generated by CTE. (It yields aFieldError: Cannot resolve keyword 'tree_path' into field.
) However I'll admit this is challenging my own skill with SQL.Has anyone come up with a similar solution using CTE?