feincms / django-tree-queries

Adjacency-list trees for Django using recursive common table expressions. Supports PostgreSQL, sqlite, MySQL and MariaDB.
https://django-tree-queries.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
409 stars 26 forks source link

How to get ancestors per node for subquery? #54

Open jokiefer opened 1 year ago

jokiefer commented 1 year ago

In django-mptt i can subquery all ancestors per node by using the following code snippet:


 def get_ancestors_per_node(self, include_self: bool = False):
        return self.get_queryset().filter(
            tree_id=OuterRef("tree_id"),
            lft__lte=OuterRef("lft") if include_self else OuterRef(
                "lft") - 1,
            rght__gte=OuterRef(
               f"rght") if include_self else OuterRef("rght") + 1
        )
def something(self):
   return self.get_queryset().annotate(
       anchestors=ArraySubquery(self.get_ancestors_per_node())
   )

In django-tree-queries i thought i can do it like:

def ancestors_per_node(self, of, include_self=True):
        return self.get_queryset().ancestors(of=of, include_self=include_self)
def something(self):
  return self.get_queryset().annotate(
       anchestors=ArraySubquery(self.get_ancestors_per_node(of=OuterRef("pk")))
   )

But that results in an ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

Logically this can't work, cause the ancestors method of the TreeQuerySet does the lookup if the passed of has no tree_path attribute:

if not hasattr(of, "tree_path"):
      of = self.with_tree_fields().get(pk=pk(of))

Do you have any ideas, how i can construct an object from the OuterRef to pass it in the ancestors method; Or any other way to get the ancestors per node for subquerys?

matthiask commented 1 year ago

Hey

That's interesting. I didn't even know about ArraySubquery. The basic problem is that django-tree-queries is a bit of a hack (a well working hack, but still a hack) which doesn't really use much of Django's built-in querying support and instead works around it. An implementation which uses more of the ORM such as https://github.com/dimagi/django-cte would probably compose better.

Regarding your use case: I'm not sure what you're trying to do. Are you only trying to fetch the primary keys of the ancestors or are you also fetching the model instances of all ancestors? Or asked differently, does the ancestors attribute return model instances or just primary keys? If it's the latter: The tree_path attribute already contains all those. If it's the former: I'd probably fall back to fetching the nodes first, and fetching all ancestors at once after OR-ing together the set(node.tree_path) of all those nodes. That requires two database round trips instead of only one but that may be alright...?

Fetching the descendants of all nodes would be easier with ANY (https://github.com/matthiask/django-tree-queries/blob/f0f431358cf0a302142ff131dedd7482de6103e6/tree_queries/query.py#L108), I don't immediately see a way to solve this without subqueries for ancestors and django-tree-queries doesn't really support subqueries all that well... I'm sorry to say.

jokiefer commented 1 year ago

The 'ArraySubquery was just to explain, that it needs to run inside a subquery, cause of the 'OuterRef'.

The ancestors per node does return juat a common django 'QuerySet' whith that you can work in the common django way.

In my use case i got a tree of ogc wms layers. And there is some inherite magic, where for example the maximum scale of a layer is inherited by the parent, if it is null on the current node. So i need to lookup the scale value from the ancestors.

In my usecase i need a high performing query, which does no extra hits on the db.

Take a look to my manager

There is the queryset which returns the ancestors per node and many other querys which are based on that to return just the inherited values.

matthiask commented 1 year ago

So if you are inheriting values from ancestors it may make sense to write the SQL for the recursive CTE yourself. It's not that hard, especially if you do not have to support multiple databases and more so if you only have to support PostgreSQL. This is probably the "official" way to go since the Django ORM doesn't support CTEs at all yet. There's probably not a good alternative right now anyway if you are sure you need the performance and everything.