ahyatt / ekg

The emacs knowledge graph, app for notes and structured data.
GNU General Public License v3.0

Improve performance on note id curating #129

Closed qingshuizheng closed 9 months ago

qingshuizheng commented 9 months ago

Background:

ekg-document-titles and ekg-active-note-ids walk through every id one by one and call ekg-active-id-p on each to check whether it is active. That is very expensive and takes a long time to finish on a large database.

This makes functions that rely on the check painful to use, for example ekg--transclude-titled-note-completion, which calls ekg-document-titles to refresh the candidate list on every input.

--

Comparison of two functions filtering active note-ids:

Old func:

(defun old-collect-active-ids ()
  (seq-filter
   #'ekg-active-id-p
   (triples-subjects-of-type ekg-db 'titled)))

New func:

(defun ekg-inactive-note-ids ()
  "Return a list of the IDs of all inactive notes.
Inactive in this context means a trashed or draft note."
  (delete-dups
   (flatten-list
    (mapcar
     ;; Collect the ids of the notes carrying each inactive tag.
     (lambda (tag)
       (plist-get (triples-get-type ekg-db tag 'tag) :tagged))
     ;; Keep only tags under "trash/" and the exact tag "draft".
     (seq-filter
      (lambda (tag)
        (string-match-p
         (rx (seq bol (or "trash/" (seq "draft" eol)))) tag))
      (triples-subjects-of-type ekg-db 'tag))))))

(defun new-collect-active-ids ()
  (seq-difference
   (triples-subjects-of-type ekg-db 'titled)
   (ekg-inactive-note-ids)))
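
The tag filter in ekg-inactive-note-ids depends on a single regular expression, so it may help to see which tags it accepts. The following is a minimal sketch that runs without ekg; the names my/inactive-tag-regexp and my/inactive-tag-p and the sample tags are hypothetical, and only the rx pattern is taken from the code above.

```elisp
(require 'rx)
(require 'seq)

;; Same pattern as in `ekg-inactive-note-ids' above.
;; The variable and function names are hypothetical, for illustration only.
(defconst my/inactive-tag-regexp
  (rx (seq bol (or "trash/" (seq "draft" eol))))
  "Regexp matching tags that mark a note as trashed or draft.")

(defun my/inactive-tag-p (tag)
  "Return t if TAG starts with \"trash/\" or is exactly \"draft\"."
  (and (string-match-p my/inactive-tag-regexp tag) t))

;; Made-up sample tags; only the first two are considered inactive.
(seq-filter #'my/inactive-tag-p
            '("trash/emacs" "draft" "drafts" "emacs/draft" "trash" "emacs"))
;; => ("trash/emacs" "draft")
```

Note that a tag such as "drafts" or "emacs/draft" is not treated as inactive, because the pattern is anchored at both ends for "draft" and at the start for "trash/".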

-- Test: check that the two functions above return the same elements:

(let ((old (old-collect-active-ids))
      (new (new-collect-active-ids)))
  ;; Compare with `equal' so that string ids also match.
  (and (cl-subsetp old new :test #'equal)
       (cl-subsetp new old :test #'equal))) ; result => t => the elements are the same

-- Benchmark (rough) on my personal triples.db

record number: 32702
title number: 2401
tag number: 2566
note containing tags: 1728
note containing text: 2566

| fn                   | old    | new    |  ratio |
|----------------------+--------+--------+--------|
| *-collect-active-ids | 15.13s | 0.038s | 398.31 |
| ekg-document-titles  | 15.89s | 1.50s  |  10.60 |
| ekg-active-note-ids  | 15.49s | 0.039s | 397.18 |

As the table shows, the new method is roughly 400 times faster for collecting active ids and about 10 times faster for ekg-document-titles. This commit therefore adapts the related code to the new method.
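
For anyone who wants to repeat a rough comparison like this on their own database, here is one possible sketch. It is not necessarily how the numbers above were measured; my/compare-timings is a hypothetical helper, and it only relies on benchmark-elapse from the built-in benchmark library.

```elisp
(require 'benchmark)

;; Hypothetical helper, not part of ekg.
(defun my/compare-timings (old-fn new-fn)
  "Call OLD-FN and NEW-FN once each and time them.
Return a list (OLD-SECONDS NEW-SECONDS RATIO), where RATIO is
OLD-SECONDS divided by NEW-SECONDS."
  (let* ((old (benchmark-elapse (funcall old-fn)))
         (new (benchmark-elapse (funcall new-fn))))
    (list old new (/ old new))))

;; With ekg loaded and connected, the two functions from this thread
;; could be compared like this:
;; (my/compare-timings #'old-collect-active-ids #'new-collect-active-ids)
```

A single run is noisy, so repeating the call a few times and comparing the results gives a better picture.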

Note: The benchmark could be biased, since my db has no more than 10 draft/trashed notes.

Update: formatting

qingshuizheng commented 9 months ago

An even faster solution for ekg--transclude-titled-note-completion would be to cache ekg-document-titles. That takes some work, though, and I don't need it at the moment: after this change, 1.50 seconds per ekg-document-titles call (on my old MacBook Pro) is good enough for me.
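
A rough idea of what such a cache could look like, purely as a sketch and not something this PR implements. All names starting with my/ are hypothetical, and when to invalidate the cache (for example from whatever save or delete hooks ekg offers) is an assumption left to the caller.

```elisp
;; Hypothetical cache; the symbol `stale' marks "not computed yet",
;; so an empty result list can still be cached.
(defvar my/titles-cache 'stale
  "Last computed title list, or the symbol `stale' when it must be recomputed.")

(defun my/cached-titles (compute-fn)
  "Return the cached titles, calling COMPUTE-FN only when the cache is stale."
  (when (eq my/titles-cache 'stale)
    (setq my/titles-cache (funcall compute-fn)))
  my/titles-cache)

(defun my/invalidate-titles-cache (&rest _)
  "Mark the cache stale so the next lookup recomputes the titles.
Accepts and ignores any arguments so it can be used as a hook function."
  (setq my/titles-cache 'stale))

;; Possible use in a completion function, assuming ekg is loaded:
;; (my/cached-titles #'ekg-document-titles)
```

The trade-off is the usual one: lookups become nearly free, but the list can go out of date unless my/invalidate-titles-cache runs whenever a titled note is saved, trashed, or deleted.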

qingshuizheng commented 9 months ago

Just realized we could do the same for ekg-show-notes-in-trash. Please wait a moment; I will push another commit later.

qingshuizheng commented 9 months ago

I force-pushed a commit including:

  1. delete-dups -> seq-uniq
  2. change how ekg-show-notes-in-trash collects the note ids of trashed notes.

ahyatt commented 9 months ago

Thank you for these fixes, they make a lot of sense. I think you must have the largest repository, so you seem to be running up against these perf problems first. Let me know what else you find!

qingshuizheng commented 9 months ago

> Let me know what else you find!

Sure! Thanks for the review.