fabiocaccamo / django-treenode

:deciduous_tree: probably the best abstract model/admin for your tree based stuff.
MIT License
676 stars 33 forks source link

Implement closure-tree to improve performance. #34

Open TimurKady opened 3 years ago

TimurKady commented 3 years ago

Hello! I want to share the experience of using the module. Today I see many of its advantages and two main disadvantages. One of them I want to discuss. This is control over the order of elements.

Description of the problem. The module does not have a transparent and understandable mechanism for ordering elements in the tree. In practice, the most common are two modes: alphabetical sorting and strict order set by the user. The logic dictates that the tn_priority field provided by the default form should do this. If it is set to 0 for all elements, then they must be ordered alphabetically. If it is specified, then the field value determines the order.

But alas, this is not the case.

Another method of establishing a coercive order of elements, which is intuitively prompted by experience, is also not suitable. This is an attempt to import data with the tn_order field set.

It would be nice if you made it easier to manage the order of items in tree.

PS. The main trouble is that with all my attachment to this module, without a mechanism for intelligible order management, I have to refuse to use it. And it just tears my soul to shreds :-(

Funding

Fund with Polar

fabiocaccamo commented 3 years ago

@TimurKhan thank you for reporting your experience with this library.

The ordering problem was really hard to solve because the objective was to sort an N levels data structure which is stored in a flat data-structure (just 1 db table) and at the same time each object must respect the order of its ancestors.

I understand that the solution is not very transparent but it solves the problem well, if you have any idea on how to obtain the same result in a better way feel free to share it.

The tn_order field is used internally for the final ordering, its value is calculated based on the tn_priority value.

Please try to import your data using the tn_priority instead of tn_order and I think that it will work correctly (let me know).

TimurKady commented 3 years ago

I will think about how to solve the problem. What is obvious now (as a statement of the task) is that the tn_priority parameter must be abandoned. At least its purpose should be to set the position of the element in the list of children of the node. At one time (until I found your library) I had some developments. I will try to present something.

I've tried all the options. First, I assigned the tn_priority a order number in the list of node . Secondly, I tried to duplicate the id in the same field. Nothing helped. All the time, sorting occurs in the reverse order of id.

TimurKady commented 3 years ago

I have a quick idea. It doesn't require a lot of code change.

  1. If the user needs an alphabetical ordering, then we leave the decision to him. Let's enter a CharField field, will create an index (db_index=True) in it, will indicate this field as a sorting field in the Meta class.

  2. We must change the type of the tn_order field to BinaryField. We store the Materialized Path to the tree node in it. We allocate two bytes (word or 65536 child nodes) for the binary representation of tn_priority of each node along the path. We must create an index. We don't change anything in the Meta class.

In this case, sorting will occur correctly when the ordinal number of the node in the children of this node is specified in the tn_priority field.

This will slow down the already not very fast insertion of the node, but I think it will be also useful for issuing the Materialized Path without traversing the tree.

fabiocaccamo commented 3 years ago

This doesn't seem to be more a more transparent solution :)

Anyway, before the current implementation, I stored the ordering string in a db field, but it could become very long in case of deep structures, especially if the pk is a UUIDField, and there is not real need to store it, so I decided to compute the ordering string just in memory and use it to calculate the final order using integer field.

I agree with the fact that tn_order and tn_priority field could be merged into a single field, that would surely reduce confusion.

TimurKady commented 3 years ago

Yes: of course! The combined field must in any case unambiguously specify the order of the element in the list of the node's children.

fabiocaccamo commented 2 years ago

@TimurKhan have you any update on this?

TimurKady commented 2 years ago

Befor NY I will sent u vertion. Now in testing

TimurKady commented 2 years ago

Hi Fabio! After testing various ideas, we settled on this variation, with which we use your module:

    def __get_node_order_str(self):

        return '.'.join(
            '{0:04d}'.format(item)
            for item in self.get_breadcrumbs(attr='tn_priority')
        )

    def save(self, *args, **kwargs):
        def sort_key(obj):
            return obj.tn_priority

        super().save(*args, **kwargs)
        siblings = sorted(self.get_siblings(), key=sort_key)
        index = 1
        for leaf in siblings:
            if leaf.tn_priority >= self.tn_priority:
                leaf.tn_priority = self.tn_priority + index
                leaf.save()
                index += 1

It doesn't seem to work badly. The main thing is that it allows you to explicitly specify the order of the entries in the tree. Perhaps you will not like this solution, or seem too straightforward. But... :-)

fabiocaccamo commented 2 years ago

@TimurKhan thank you for the feedback, I will test it and let you know!

TimurKady commented 2 years ago

Hi @fabiocaccamo! Below is the more correct code. Also, it doesn't slow down the code working and use less no more 3 requests

    def __get_node_order_str(self):

        return '.'.join(
            '{0:04d}'.format(item)
            for item in self.get_breadcrumbs(attr='tn_priority')
        )

    def save(self, force_insert=False, force_update=False, *args, **kwargs):

        def sort_key(obj):
            return obj.tn_priority

        def order_list(obj_list):
            index = 0
            for obj in obj_list:
                obj.tn_priority = index
                index += 1
            self._meta.model.objects.bulk_update(obj_list, ['tn_priority'])

        try:
            parent = self.tn_parent
        except:
            parent = None

        if parent:
            new_siblings = sorted(self.tn_parent.get_children(), key=sort_key)
        else:
            roots = self._meta.model.get_roots()
            new_siblings = sorted(roots, key=sort_key)

        if self.pk and not force_insert:
            try:
                old = self._meta.model.objects.get(pk=self.pk)
                old_siblings = sorted(old.get_siblings(), key=sort_key)
            except:
                old = None

            if self in new_siblings:
                new_siblings.remove(self)
            new_siblings.insert(self.tn_priority, self)

            if old and (old.tn_parent != self.tn_parent):
                order_list(old_siblings)
                order_list(new_siblings)

            if old and (old.tn_priority != self.tn_priority):
                order_list(new_siblings)

        else:
            new_siblings.insert(self.tn_priority, self)
            super().save(*args, **kwargs)
            if len(new_siblings):
                order_list(new_siblings)

        self._meta.model.update_tree()
fabiocaccamo commented 2 years ago

Hello @TimurKady, thank you for the update, the code seems pretty clear, I will test it as soon as possible.

TimurKady commented 2 years ago

Tnx! If interested, I can also offer update for TreeNodeForm. It is a widget for the "tn_parent" field (signal2 tree based).

fabiocaccamo commented 2 years ago

@TimurKady there already an issue for that, check #37

TimurKady commented 2 years ago

I've adjusted the code a bit. Colleagues made a small bug. Sorry.

fabiocaccamo commented 2 years ago

@TimurKady I tried to test your code, but trying to run the test suite (python runtests.py) it gets stuck indefinitely (more than a minute, then I killed the process since the same it run in ~12 seconds on my machine).

TimurKady commented 2 years ago

To be honest, in any case, the main drawback of the library is the high cost of creating one record. We looked at the reason. You completely rebuild the tree when you insert each element with post_save. I already thought about making changes to the code. To do this, will have to change two methods (.save() anl __get_node_data()) and abandon the signals. But the costs are very high. Meybe leter...

fabiocaccamo commented 2 years ago

The main drawback of the library is the high cost of creating one record

This is what I needed to know, thanks!

TimurKady commented 2 years ago

But I really like your idea, to store the answers to the most likely requests in the database. Maybe later, when I have time, I will try to offer you an adjusted faster solution based on your idea. I made one mistake. I entrusted the revision of your library to a not very experienced young man. Unfortunately, I will have to go back to finalizing anyway, and apparently I will have to do this myself outside of working on our project.

fabiocaccamo commented 2 years ago

@TimurKady When I initially developed this library my need was to manage quickly a 4 levels menu (less than 200 entries) with a nice admin integration, so I come up to this solution and I used it in different projects always for solving the same problem.

Now looking at feedbacks, many devs are using it with very big trees with thousands of records, so performances are a top-priority issue to solve.

Updating the whole tree is not a scalable solution, now something more smart that continue to satisfy the tests is needed, not an easy task at all.

Just for curiosity, have you tried to do the same with django-treebeard?

TimurKady commented 2 years ago

I have an idea, or rather a suggestion. There is one neat way to drastically improve the performance of your library. The essence of the method is the simultaneous use of the main Adjacency List and the Closure Table. Then you will not have to incur large costs when creating a new element, and the bulk of the requests will still be performed in one database request.

If you don't mind, I'll have a free weekend so I can write most of the code. You will only need to add some methods that will ensure compatibility with earlier versions of the library and run tests. If you are in favor of this approach, then find a way to tell me your email address so that we can communicate not here, but directly.

TimurKady commented 2 years ago

Просто из любопытства, вы пытались сделать то же самое с django-treebeard?

Yes, of course! I have tried all tree packages. But yours appealed to me the most. All other libraries have less advantages than disadvantages.

fabiocaccamo commented 2 years ago

@TimurKady It's an ambitious change, I didn't know about closure tables, it seems a very good approach, thank you for suggesting it.

You can find my email on my GH profile.

TimurKady commented 2 years ago

@fabiocaccamo Unfortunately, there is nothing ambitious in this. The code will not be too big. The most ambitious is the support of the previous version.

I am well acquainted with graphs and algorithms for working with it. I'll close this part of the code. You know Django and cache working very well. Close this part!

I'll send you my code and we will discuss everything.

fabiocaccamo commented 2 years ago

@TimurKady I have just discovered these packages:

TimurKady commented 2 years ago

I'll look it later

TimurKady commented 2 years ago

Combination of Adjacency List and Closure Table (django-treenode based) Done!