Use filtered-candidate-transformer in helm-org-ql

yantar92 commented 3 years ago

This issue is a followup from earlier reddit discussion: https://old.reddit.com/r/orgmode/comments/jysnrf/why_does_the_recent_zettelkasten_craze_use_one/gdal58p/

Background

I have huge org files containing thousands of todo items. For example, below is a rough search for books/articles I plan to read some day:

#+begin_src emacs-lisp
(length (org-ql-select #'org-agenda-files '(outline-path "read")))
#+end_src

#+RESULTS[325b6b2c617360ce2b5ec38daf5c14cd015d00d3]:
: 2089

The total number of items in someday lists is even higher:

#+begin_src emacs-lisp
(length (org-ql-select #'org-agenda-files '(outline-path "no deadline")))
#+end_src

#+RESULTS[2a6e2a5100b074338f9def2ec844b1739735d61b]:
: 4273

The way I usually search for headlines is by utilising the outline path. If I want to find a book I captured in the past, I roughly follow the outline structure. A typical search string is "dead read fant #good":

* No deadline
** Read
*** #good Pratchett [fantlab] Thief of Time

In the past, I used helm-org for searching. It is fairly fast except when helm-org rebuilds its headline cache (every time I change anything in the buffer), which usually takes 5-10 seconds and a lot of memory.

org-ql is generally quite a bit faster. At least it does not have noticeable delay when I modify my files. So, I tried to use it (and failed).

Issue description

Similar to my usual search, I tried to enter "olp:dead,read,fant #good" into helm-org-ql-agenda-files. However, all I got was an empty helm buffer. Sometimes, the buffer was populated with matches to "olp" (when I was typing slowly), but the matches were never updated for the new search string.

At first, I thought that it was some kind of bug in helm. I have seen similar issues after running a long Emacs session: Emacs gets sluggish and many helm commands show problems similar to the above. I even reported an issue to helm: https://github.com/emacs-helm/helm/issues/2359. Unfortunately, the behaviour is not reproducible with native helm commands, though I can often reproduce it with helm-org-ql on my org files. So, that report was not very helpful in the end. Some helm problem seems to be involved, but it is probably combined with something in the helm-org-ql code and my setup. Helm debugging showed that the helm-org-ql source yields a (nil) match all the time.

Another thing I found recently is what happens when I search in helm-org-ql without predicates: "dead read fant". I rarely do this, since my outline path serves as search keywords (searching without the olp: predicate yields far fewer results), but the helm buffer actually gets updated when I do this.

I tried to benchmark the underlying org-ql calls that are made from inside helm-org-ql for these two different kinds of match strings:

#+begin_src emacs-lisp
(use-package epdh :straight (epdh :host github :repo "alphapapa/emacs-package-dev-handbook"))
(bench-multi
  :forms (("no action, 799 matching elements"
           (org-ql-select #'org-agenda-files '(outline-path "read")))
          ("format heading, 799 matching elements"
           (org-ql-select #'org-agenda-files '(outline-path "read") :action `(helm-org-ql--heading 100)))
          ("no action, 2088 matching elements"
           (org-ql-select #'org-agenda-files '(outline-path "dead")))
          ("format heading, 2088 matching elements"
           (org-ql-select #'org-agenda-files '(outline-path "dead") :action `(helm-org-ql--heading 100)))))
#+end_src

#+RESULTS[ee544c4d8c632dae8d930b1f511b1438fc8476d7]:
| Form                                   | x faster than next | Total runtime | # of GCs | Total GC runtime |
|----------------------------------------+--------------------+---------------+----------+------------------|
| no action, 2088 matching elements      | 7.23               | 0.433466      | 0        | 0                |
| no action, 799 matching elements       | 2.66               | 3.133674      | 0        | 0                |
| format heading, 799 matching elements  | 2.39               | 8.323075      | 0        | 0                |
| format heading, 2088 matching elements | slowest            | 19.891339     | 0        | 0                |

Possible cause and solution

I believe that the main problem is the helm-org-ql--heading function, which calculates the outline path for every match (all 2000!), and that takes too much time. Helm then seems to fail on functions that take this long to populate the matches, even if the search string would narrow the search later.

I tried to modify the helm-org-ql--heading code removing org-get-outline-path:

    (defun helm-org-ql--heading (window-width)
      "Return string for Helm for heading at point.
    WINDOW-WIDTH should be the width of the Helm window."
      (font-lock-ensure (point-at-bol) (point-at-eol))
      ;; The call to `org-get-outline-path' is removed here; only the
      ;; category prefix and the heading itself are formatted.
      (let* ((prefix (concat (org-entry-get (point) "CATEGORY") ":"))
             (heading (org-get-heading t)))
        (cons (concat prefix heading) (point-marker))))

Now, the situation is much better. The buffer gets updated after a few seconds. It still takes significant time though! Moreover, most of that calculation does not even make much sense. Helm will only show 100 candidates -- there is no need to format anything beyond the first helm-candidate-number-limit matches.

Probably, org-ql-select (or the helm-org-ql source) can be modified to return only the first helm-candidate-number-limit matches instead of trying to process every possible candidate.
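
A minimal sketch of that idea, assuming the matches are already collected in their final order and that formatting is the expensive step. All `my/` names are hypothetical and are not part of org-ql or Helm; `my/expensive-format` is only a stand-in for the real heading formatting.

```emacs-lisp
(require 'seq)

(defvar my/format-calls 0
  "Counter showing how many candidates were actually formatted.")

(defun my/expensive-format (candidate)
  "Hypothetical stand-in for costly per-heading formatting of CANDIDATE."
  (setq my/format-calls (1+ my/format-calls))
  (format "formatted: %s" candidate))

(defun my/format-first (candidates limit)
  "Format only the first LIMIT elements of CANDIDATES.
The remaining elements are never passed to `my/expensive-format'."
  (mapcar #'my/expensive-format (seq-take candidates limit)))
```

With 2000 raw matches and a limit of 100, `my/expensive-format` would run 100 times instead of 2000. This only helps if the cut is made after sorting, not before.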

alphapapa commented 3 years ago

Hi,

There may be a few different issues at play here. However, please test this commit I just made which does a simple optimization to olp and olps queries: https://github.com/alphapapa/org-ql/commit/2d07cb082263061420c5a28a23bbd1d7f9008a25 In my testing they are much faster now.

Also, the next time this happens (or perhaps before testing that commit), please check to see if any of the buffers you're searching are not actually in org-mode. Sometimes that happens to me (e.g. when a file has local variables which require confirmation, the prompt for which is obscured by the Helm search), and that seems to cause a complete lack of results.

> Moreover, most of that calculation does not even make much sense. Helm will only show 100 candidates -- there is no need to format anything beyond the first helm-candidate-number-limit matches.
>
> Probably, org-ql-select (or the helm-org-ql source) can be modified to return only the first helm-candidate-number-limit matches instead of trying to process every possible candidate.

Probably the thing to do would be to move formatting to the Helm candidate transformer, or something like that. That would be a simple change.

Thanks.

yantar92 commented 3 years ago

> There may be a few different issues at play here. However, please test this commit I just made which does a simple optimization to olp and olps queries: https://github.com/alphapapa/org-ql/commit/2d07cb082263061420c5a28a23bbd1d7f9008a25 In my testing they are much faster now.

I believe that it helps. I can see the results of a query like "olp:dead" (a few thousand matches) almost instantly. That is in a fresh Emacs session, though. Let me see what happens after some time.

> Also, the next time this happens (or perhaps before testing that commit), please check to see if any of the buffers you're searching are not actually in org-mode. Sometimes that happens to me (e.g. when a file has local variables which require confirmation, the prompt for which is obscured by the Helm search), and that seems to cause a complete lack of results.

Actually, I did test it (as you suggested in your earlier reddit comment). I tried calling helm-org-ql inside the "nodeadline.org" buffer instead of helm-org-ql-agenda-files. It made no difference.

> Probably the thing to do would be to move formatting to the Helm candidate transformer, or something like that. That would be a simple change.

That makes sense. I tried to limit the number of candidates returned by org-ql-select, but it messed up the sorting. The matches that should normally go to the bottom were on top.

alphapapa commented 3 years ago

> I believe that it helps. I can see the results of a query like "olp:dead" (a few thousand matches) almost instantly. That is in a fresh Emacs session, though. Let me see what happens after some time.

A single-argument olp or olps query is now converted directly to a heading query, which is optimized to a whole-buffer regexp search, which is as fast as one can get in Emacs. Try it with multiple arguments, in which case the final one is converted to the heading predicate, and the others are then only tested when necessary. It should be many times faster now. In hindsight, it's an obvious optimization, but one which was easiest to implement with the new org-ql-defpred macro.

> > Also, the next time this happens (or perhaps before testing that commit), please check to see if any of the buffers you're searching are not actually in org-mode. Sometimes that happens to me (e.g. when a file has local variables which require confirmation, the prompt for which is obscured by the Helm search), and that seems to cause a complete lack of results.
>
> Actually, I did test it (as you suggested in your earlier reddit comment). I tried calling helm-org-ql inside the "nodeadline.org" buffer instead of helm-org-ql-agenda-files. It made no difference.

Sorry, I wasn't clear. What I mean is, the next time you seem to get no results when you expect some, exit the search and check the major mode of all of the buffers you were searching. If any of them are not actually in org-mode, that could point to part of this problem.

> > Probably the thing to do would be to move formatting to the Helm candidate transformer, or something like that. That would be a simple change.
>
> That makes sense. I tried to limit the number of candidates returned by org-ql-select, but it messed up the sorting. The matches that should normally go to the bottom were on top.

Yes, that would affect the sorting. I plan to enhance sorting with the ability to reverse the order of predicates, which should help with that.

alphapapa commented 3 years ago

For now, I'll use this issue to track using the candidate-transformer, but please continue to share what you find.

yantar92 commented 3 years ago

> Probably the thing to do would be to move formatting to the Helm candidate transformer, or something like that. That would be a simple change.

Note: :candidate-transformer will still act on all the matches (even those not displayed). :filtered-candidate-transformer is more suitable.
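
A rough sketch of what deferring the formatting to `:filtered-candidate-transformer` could look like. It assumes the candidates arrive as (HEADING . REAL) pairs and that Helm hands the transformer only the candidates it is about to display. The `my/` names and the sample candidates are hypothetical, and this is not the actual helm-org-ql code.

```emacs-lisp
(defun my/decorate-heading (heading)
  "Hypothetical stand-in for the expensive formatting of HEADING."
  (concat "nodeadline:" heading))

(defun my/format-displayed-candidates (candidates _source)
  "Build display strings for CANDIDATES, a list of (HEADING . REAL) pairs.
Meant to be used as a `:filtered-candidate-transformer'; the
second argument, the Helm source, is ignored."
  (mapcar (lambda (cand)
            (cons (my/decorate-heading (car cand)) (cdr cand)))
          candidates))

;; Build an example source only when Helm is installed.
(when (require 'helm nil t)
  (defvar my/sketch-source
    (helm-make-source "Deferred formatting (sketch)" 'helm-source-sync
      :candidates '(("Thief of Time" . 1) ("Plasticity" . 2))
      :filtered-candidate-transformer #'my/format-displayed-candidates)))
```

The transformer is an ordinary function of two arguments, so it can be tried outside of Helm, e.g. `(my/format-displayed-candidates '(("Thief of Time" . 1)) nil)`.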

yantar92 commented 3 years ago

> A single-argument olp or olps query is now converted directly to a heading query, which is optimized to a whole-buffer regexp search, which is as fast as one can get in Emacs.

That explains the speed)

> Try it with multiple arguments, in which case the final one is converted to the heading predicate, and the others are then only tested when necessary. It should be many times faster now. In hindsight, it's an obvious optimization, but one which was easiest to implement with the new org-ql-defpred macro.

I am not sure if it is the correct behaviour. Consider the following example:

* No deadline
** Learn
*** Research
**** Plasticity
***** TODO Schwaiger [MRS-Fall] (2017) Characterizing the mechanical properties of individual phases in nanostructured composites

I may try to match the last heading with a search string like "olp:dead phase". It will not match.
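
To spell out why, here is a simplified model of the two predicates. It is not org-ql's implementation: it represents an entry as a plain list of heading strings ending with the entry's own heading, and assumes an outline-path match may hit any element while a heading match looks only at the last one. The `my/` names are hypothetical.

```emacs-lisp
(require 'seq)

(defun my/olp-match-p (regexp outline-path)
  "Return non-nil if REGEXP matches any heading in OUTLINE-PATH.
OUTLINE-PATH is a list of heading strings, ancestors first."
  (seq-some (lambda (heading) (string-match-p regexp heading))
            outline-path))

(defun my/heading-match-p (regexp outline-path)
  "Return non-nil if REGEXP matches the last heading in OUTLINE-PATH.
This models a heading-only test that ignores the ancestors."
  (string-match-p regexp (car (last outline-path))))

(defvar my/example-path
  '("No deadline" "Learn" "Research" "Plasticity"
    "TODO Schwaiger [MRS-Fall] (2017) Characterizing the mechanical properties of individual phases in nanostructured composites")
  "Outline path of the example entry above.")
```

In this model, "dead" matches the path through the ancestor "No deadline", but it does not match the entry's own heading, so rewriting a single-argument olp into a heading test drops the entry.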

> > Actually, I did test it (as you suggested in your earlier reddit comment). I tried calling helm-org-ql inside the "nodeadline.org" buffer instead of helm-org-ql-agenda-files. It made no difference.
>
> Sorry, I wasn't clear. What I mean is, the next time you seem to get no results when you expect some, exit the search and check the major mode of all of the buffers you were searching. If any of them are not actually in org-mode, that could point to part of this problem.

I am not sure if I understand. What I described was:

1. I tried to run helm-org-ql-agenda-files, but got no results
2. Switched to "nodeadline.org" where the matches should be located
3. Tried to run helm-org-ql with the same search string, but still got no results

So, the issue should not be with buffers not in org mode.

Actually, I should not have any buffers in agenda files not in org mode. I always run agenda when I load Emacs, which turns on org-mode in all agenda buffers.

alphapapa commented 3 years ago

Thanks for catching that. I reverted that optimization for now.

> Actually, I should not have any buffers in agenda files not in org mode. I always run agenda when I load Emacs, which turns on org-mode in all agenda buffers.

Okay, but it would be helpful if, the next time you encounter this bug, you would check list-buffers and verify that all of the buffers being searched are actually in Org mode, because it would allow me to eliminate that issue as a cause of your problem.
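
As an alternative to scanning list-buffers by eye, a small helper along these lines could do the check. This is only a sketch: the `my/` names are hypothetical, and it assumes that the buffers of interest are the ones currently visiting the files returned by `org-agenda-files`.

```emacs-lisp
(require 'seq)

(defun my/buffers-not-in-mode (buffers mode)
  "Return the members of BUFFERS whose major mode is not derived from MODE."
  (seq-remove (lambda (buf)
                (with-current-buffer buf
                  (derived-mode-p mode)))
              buffers))

(defun my/non-org-agenda-buffers ()
  "Return open agenda-file buffers that are not in Org mode.
Agenda files that are not visited by any buffer are skipped."
  (require 'org)
  (my/buffers-not-in-mode
   (delq nil (mapcar #'find-buffer-visiting (org-agenda-files)))
   'org-mode))
```

Evaluating `(my/non-org-agenda-buffers)` right after a search that shows no results would return nil if every open agenda buffer is in Org mode.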

yantar92 commented 3 years ago

> Okay, but it would be helpful if, the next time you encounter this bug, you would check list-buffers and verify that all of the buffers being searched are actually in Org mode, because it would allow me to eliminate that issue as a cause of your problem.

Ok. I guess it is better to double-check.

I have been trying to reproduce the issue again with the latest org-ql in an Emacs session that has been running for half a day. There is one new observation: when I input a search string yielding a large number of matches (a few thousand), the helm buffer remains empty for something like 20 seconds, but gets populated eventually.

However, this is on my new laptop with an SSD, a better CPU, and the Emacs nativecomp branch. Emacs responsiveness is much better than on my old laptop, where I experienced the issue almost all the time. I suspect that in my earlier attempts I simply did not wait long enough for the helm window to be populated. Probably, the org-ql query + headline rendering took minutes.

alphapapa commented 3 years ago

Retargeting this for 0.7. 0.6 has been delayed for too long.

alphapapa / org-ql

Use filtered-candidate-transformer in helm-org-ql #160
