GothenburgBitFactory / tasklib

A Python library for interacting with taskwarrior databases.
http://tasklib.readthedocs.org/en/latest/
BSD 3-Clause "New" or "Revised" License
147 stars 28 forks source link

Performance issue with task["depends"] #70

Closed thomasrebele closed 4 years ago

thomasrebele commented 4 years ago

Looking up the dependencies of some tasks lakes much longer than loading the tasks. This causes some noticeable lag in the new version of vit, a taskmanager front end. See https://github.com/scottkosty/vit/issues/216 for the related issue.

I've wrote a script to illustrate the problem: tasklib.zip It creates one base task and 40 tasks that depend on it. In the first loop it just iterates over the tasks and converts them to their string representation. The second loop retrieves the id of the depends.

The result looks like this on my machine:

loop1 time: 0.015522241592407227  tasks: 41
loop2 time: 0.33356642723083496  tasks: 41

Retrieving the id of the depends takes about 20x as long as loading the tasks. I suppose this is because of LazyUUIDTaskSet which looks up the base task by its UUID 40 times. Is there an easy way to improve this?

robgolding commented 4 years ago

Hey @thomasrebele, thanks for the detailed issue report. Having read through the referenced ticket as well, the only optimization I could think of making here would be some kind of global cache for tasks on a per-UUID basis, so we can avoid recreating Task (or LazyTaskUUID) objects so frequently. The quickly hacked-together diff below does indeed speed up your benchmark script:

--- a/tasklib/lazy.py
+++ b/tasklib/lazy.py
@@ -2,6 +2,7 @@
 Provides lazy implementations for Task and TaskQuerySet.
 """

+_TASK_CACHE = {}

 class LazyUUIDTask(object):
     """
@@ -72,7 +73,10 @@ class LazyUUIDTask(object):
         stored UUID.
         """

-        replacement = self._tw.tasks.get(uuid=self._uuid)
+        replacement = _TASK_CACHE.get(self._uuid)
+        if replacement is None:
+            replacement = self._tw.tasks.get(uuid=self._uuid)
+            _TASK_CACHE[self._uuid] = replacement
         self.__class__ = replacement.__class__
         self.__dict__ = replacement.__dict__

(note: this would need a lot more work to make it into the codebase!)

However, I worry that this might be "overfitting" the exact case in your benchmark script, as it only helps if there are many tasks that all depend on the same task. If this is a situation that comes up often for you, then it may indeed be helpful.

thomasrebele commented 4 years ago

Hi @robgolding, (and sorry for the late reply) indeed, something like this would be nice. It only comes up with this use case: https://github.com/scottkosty/vit/issues/216

I don't know whether it is worth optimizing this, but it's quite annoying to have a noticeable lag in a local terminal application that handles 50 tasks.

@thehunmonkgroup, what's your opinion?

thehunmonkgroup commented 4 years ago

My preference would be to solve the problem at a lower level for sure.

I do have an idea of how to approach a fix in VIT if needed, so it's not a dealbreaker, but it seems reasonable for tasklib to account for this use case.

robgolding commented 4 years ago

@thomasrebele @thehunmonkgroup I've released version 2.1.1 which includes a performance enhancement that should fix this issue. Please give it a go!

thomasrebele commented 4 years ago

Yes, the LRU cache fixes the performance problem. Thank you for adding it!