stuaxo opened 5 years ago
Afaict the code that fails in treebeard does so because generating a node id and creating the node are separate steps: when there are two processes, both can get the same id, and when it comes to writing, only one succeeds.
Hi @BertrandBordage , any word on this?
Django-tree should be as multiprocessing safe as its PostgreSQL triggers are. This means that a concurrent write may have to wait for the trigger to finish, leading to triggers queueing up. This can make a big import much slower than it needs to be. So it is multiprocessing safe, but slower than it would be without the tree structure.
That being said, what my team and I usually do when importing large amounts of data (concurrently or not) is disable the trigger using the context manager, then force-rebuild all paths. Here is a recent example, which brought the script down from 2 hours to 15 minutes: https://github.com/dezede/dezede/blob/master/dezede/management/commands/import_melodies.py#L745-L763
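To illustrate the pattern being described (not the actual django-tree API), here is a minimal, self-contained sketch: per-insert path maintenance stands in for the PostgreSQL trigger, a context manager disables it during a bulk import, and a single rebuild pass restores all paths afterwards. Every name here (`PathTree`, `disabled_trigger`, `rebuild_paths`) is hypothetical and for illustration only.

```python
from contextlib import contextmanager

class PathTree:
    """Toy materialized-path tree. Normally every insert maintains the
    node's path (playing the role of the PostgreSQL trigger); during a
    bulk import that maintenance is skipped and paths are rebuilt once
    at the end. Names are illustrative, not django-tree's real API."""

    def __init__(self):
        self.parent = {}           # node -> parent node (None for roots)
        self.path = {}             # node -> materialized path, e.g. "a/b/c"
        self._trigger_enabled = True

    def insert(self, node, parent=None):
        self.parent[node] = parent
        if self._trigger_enabled:
            # Per-row maintenance: the expensive part under heavy imports.
            self.path[node] = self._compute_path(node)

    def _compute_path(self, node):
        parts = []
        while node is not None:
            parts.append(node)
            node = self.parent[node]
        return "/".join(reversed(parts))

    @contextmanager
    def disabled_trigger(self):
        # Skip per-insert path maintenance while the block runs.
        self._trigger_enabled = False
        try:
            yield
        finally:
            self._trigger_enabled = True

    def rebuild_paths(self):
        # One pass over all nodes instead of per-insert work.
        for node in self.parent:
            self.path[node] = self._compute_path(node)

tree = PathTree()
with tree.disabled_trigger():
    tree.insert("a")
    tree.insert("b", parent="a")
    tree.insert("c", parent="b")
tree.rebuild_paths()
print(tree.path["c"])  # -> a/b/c
```

The trade-off is the same as in the linked script: inserts inside the `with` block are cheap because no per-row path work happens, at the cost of paths being stale until `rebuild_paths()` runs.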
I am keeping this issue open since it is true that I did not write unit tests ensuring it works as intended. In any case, after years of using django-tree in multiprocessing production environments, I have not seen any data issues yet and have never had to rebuild the tree structures, apart from the use case listed above.
Using django-treebeard with celery has been a bit of a nightmare: it's multiprocessing unsafe, and this caused quite a headache.
I'm guessing django-tree is MP safe, but the docs don't mention this. If it is, I would mention it, as it's a massive selling point over treebeard.
Digression / rant: my own project uses treebeard (via djangocms), and only being able to update one thing at a time (in anything that uses TB underneath) is a severe limitation that really hamstrings my update code, but I'm stuck with it. If django-tree is MP safe, you might even attract more devs by mentioning it.