Open Kungsgeten opened 4 years ago
Hello, I am not the author of this package, but it looks like this issue is about extending org-ql--pre-process-query
. It is not really specific to org-brain, so I am now interested in this thread.
Alternatively, you might be able to improve the performance using memoization.
@akirak Thanks! I hadn't read about memoize before. The following changes made it run much faster:
(defun org-brain-entry-from-name (entry-name)
"Return entry with ENTRY-NAME."
(if-let ((entry (assoc entry-name
(mapcan #'org-brain--file-targets (org-brain-files)))))
(or (org-brain-entry-from-id (cdr entry))
(cdr entry))
(error "No entry found with name %s" entry-name)))
(memoize 'org-brain-entry-from-name 10)
(org-ql--defpred ob-child-of (entry-name)
"Return non-nil if current heading is an org-brain child of ENTRY-NAME."
(member (org-brain-entry-at-pt)
(org-brain-children (org-brain-entry-from-name entry-name))))
Hi Erik,
Aha, welcome to the (or light dark)
side! :) Haha, seriously though, I'm glad that org-ql
is useful in org-brain
.
Regarding the idea of binding data around a query: that's an idea I've thought about, although I can't seem to find any notes I've made about it. Generally, I think it would be helpful for data that is specific to each buffer searched in a query, like per-buffer to-do keywords. Caching data like that would avoid having to retrieve it every time the predicate is called.
For your specific use case, I think it's not necessary. I looked at your code and I think you can easily optimize it with a few changes. Here are some thoughts:
You shouldn't need to define a new predicate--and in fact, doing so is suboptimal, because the defpred
macro doesn't define a "preamble" regexp, which prevents the biggest optimization that org-ql
can do. (Ideally the macro could also be used to define a "preamble", but I think that wouldn't be feasible, because it would require refactoring how the --query-preamble
function works, which would raise other issues. Let's say that that's a long-term idea that might be possible someday.)
Instead, your code should define queries that use existing predicates as much as possible, which org-ql
can optimize better.
For example, instead of defining this predicate:
(org-ql--defpred org-brain ()
"Return non-nil if entry could be a part of `org-brain'.
Entry should have an ID and `org-brain-entry-at-point-excludedp' should return nil."
(and (org-entry-get (point) "ID")
(not (org-brain-entry-at-point-excludedp))))
Which you then use in org-brain-headline-entries
like:
(defun org-brain-headline-entries ()
"Get all org-brain headline entries."
(org-ql-select (org-brain-files) '(org-brain)
:action #'org-brain--headline-entry-at-point))
Use existing predicates in a query, like:
(defun org-brain-headline-entries ()
"Get all org-brain headline entries."
(org-ql-select (org-brain-files)
'(and (property "ID")
(not (org-brain-entry-at-point-excludedp)))
:action #'org-brain--headline-entry-at-point))
That should be faster, because the property
predicate can be optimized to a whole-buffer regexp search. Of course, it depends on how many entries in a file have an ID
property: if only some of them do, it will skip checking ones that don't; but if all of them do, it will have to check them all anyway. If it happened that org-brain IDs had a certain structure or prefix, you could write a query to jump directly to potential org-brain IDs, e.g. something like:
(and (regexp "^:ID: org-brain-")
(property "ID") ; Ensure the match is actually a property in a drawer
(not (org-brain-entry-at-point-excludedp)))
Also, when possible, prefer to do as much work as possible in the :action
function, rather than doing, e.g. mapcar
across a list of results like you do here: https://github.com/Kungsgeten/org-brain/commit/31603865b4164b9f218cf2cb091cdb32e0d20aa2#diff-60847823822d6fb59f29ef038133d89fR701 In that example, you could do the work of the mapcar
's lambda in the action function, which avoids consing another list.
org-ql--value-at
when appropriate, which provides per-node caching. For example, you might benefit from calling org-brain--headline-entry-at-point
through it, and likely org-brain-entry-at-point-excludedp
as well (assuming they tend to be called repeatedly for a headline before the buffer changes).Do you understand what I mean? Let me know if I should explain more or more clearly. :)
BTW, regarding memoize
: I can't seem to find the relevant discussions right now, but Chris has said that he generally favors using purpose-built caching functions over memoize
nowadays, because they can be much faster. That's why I wrote org-ql--value-at
instead of using memoize
. That's not to say that you shouldn't use memoize
, but it's something to be aware of.
Also, @akirak: Yes, ideally --pre-process-query
could be used to, e.g. replace (org-brain)
with one of the query examples I gave. However, that function isn't designed to be extensible like that. I've thought about it, but it would be difficult, especially since the order of the clauses matters. I could imagine e.g. a defvar
with a long list of lambdas which are applied in order, but that would also make debugging more difficult because the code would be spread out among lambdas stored in a variable, defined in macro calls spread out in the file, instead of being together in a single function. So there are tradeoffs to both designs. In the long term, I'd like to have a more flexible design, but it needs to be carefully done.
@alphapapa Thank you for the comment. I will keep it in mind.
Also, I liked memoization because memoize.el is so easy to use and convenient, but I didn't know of that discussion. Thank you for the info.
Ah, here's the note I was thinking of: https://github.com/alphapapa/org-ql/blob/d28d54c58842f0d31e40e8d21337360c9c238ba0/helm-org-ql.el#L212
I've been sitting for some hours now trying different ways to make org-brain-headline-entries
faster, using org-ql
. The first time the function is run, it takes quite a lot of time, and then it is very fast (thanks to the caching, I guess).
What I realized though was that it didn't matter much what I put into the query or action, it took about the same time. The most stripped down version I've tried is this:
(defun org-brain-headline-entries ()
"Get all org-brain headline entries."
(org-ql-select (org-brain-files)
'(property "ID")
:action '(identity 1)))
This takes about 20 seconds the first time it is run. It scans 25 files and finds 224 matches. Is that the expected speed? I have only tried it on my Windows config. During the first run I get lots of messages in the mini-buffer:
=> org-brain-headline-entries
Org mode fontification error in #<buffer clojure.org> at 122
Setting up indent for shell type sh
Indentation variables are now local.
Indentation setup for shell type sh
Org mode fontification error in #<buffer emacs.org> at 171
Org mode fontification error in #<buffer emacs.org> at 248
Org mode fontification error in #<buffer funcs.org> at 4
Org mode fontification error in #<buffer programmering.org> at 227
Org mode fontification error in #<buffer programmering.org> at 247
Org mode fontification error in #<buffer programmering.org> at 287
Showing all blocks ... done
Org mode fontification error in #<buffer programmering.org> at 305
Showing all blocks ... done
Org mode fontification error in #<buffer programmering.org> at 318
Showing all blocks ... done
Org mode fontification error in #<buffer programmering.org> at 324
Showing all blocks ... done
Org mode fontification error in #<buffer programmering.org> at 336
Showing all blocks ... done
Org mode fontification error in #<buffer programmering.org> at 363
Org mode fontification error in #<buffer python.org> at 55
Can’t guess python-indent-offset, using defaults: 4
Org mode fontification error in #<buffer python.org> at 92
org-ql: No headings in buffer: clojure.org.org
org-ql: No headings in buffer: ekonomi.org.org
org-ql: No headings in buffer: funcs.org
org-ql: No headings in buffer: kanske.org
org-ql: No headings in buffer: nytt.org
org-ql: No headings in buffer: r3.org
No, 25 files and 224 matches should be collected much faster than that. That fontification error appears to come from org-fontify-meta-lines-and-blocks
, and I'm guessing Emacs is pausing for each error message, which is artificially inflating the runtime (I see a similar behavior when I'm running an Org Agenda command and an invalid diary sexp causes errors/warnings, which Emacs pauses to display repeatedly while I'm waiting for the agenda to appear...grrr).
It's hard to say what is causing the fontification errors. Once in a while I see those errors in my own config when doing random Org-related things. You might have some Org files with invalid syntax, so you might try using org-lint
on them. If that doesn't help, try to reproduce them with emacs -q
, to ensure the problem isn't something in your config. (Or you could use https://github.com/alphapapa/emacs-sandbox.sh instead of emacs -q
, so you wouldn't have to set up the package system yourself.)
I deactivated a package named org-variable-pitch
, which got rid of the fontification errors. That got me down to 13 seconds. I also tried to remove the files which org-ql
complained about (almost empty files). That got me down to 10 seconds. I then tried the following change to org-brain-headline-entries
:
(defun org-brain-headline-entries ()
"Get all org-brain headline entries."
(org-ql-select (mapcar (lambda (x)
(with-current-buffer (create-file-buffer x)
(insert-file-contents x)
(delay-mode-hooks (org-mode))
(current-buffer)))
(org-brain-files))
'(property "ID")
:action '(identity 1)))
This gets me down to 0.13 seconds! I've had similar problems during the development of org-brain
, that using find-file-noselect
is slow. In order for caching to work, and to not open endless buffers, I also tried this, which seems to work well:
(defun org-brain-headline-entries ()
"Get all org-brain headline entries."
(org-ql-select (mapcar (lambda (x)
(or (find-buffer-visiting x)
(with-current-buffer (create-file-buffer x)
(insert-file-contents x)
(delay-mode-hooks (org-mode))
(setq buffer-file-name x)
(not-modified)
(current-buffer))))
(org-brain-files))
'(property "ID")
:action '(identity 1)))
The downside here is ofcourse that the mode-hooks aren't run. If I enable them I get about 1.9 seconds on the first call.
Yeah, it sounds like your config has Org doing a lot of initialization in the mode hooks. Even some built-in Org modules can be slow to initialize in large files, like org-indent-mode
. So if those files aren't already open in buffers, org-ql
will open them to search them, and that will delay the search. So what you're measuring here is not just org-ql
's performance but how fast your Org configuration initializes an Org buffer, which also depends on the filesize, number of headings, etc.
So I'd recommend testing by writing a script to do something like this:
(mapc #'find-file-noselect (org-brain-files))
. I think find-file-noselect
won't return until the mode hooks are done, but you should verify that.(org-ql-select ...
.Also, I recommend using this macro for your benchmarking: https://github.com/alphapapa/emacs-package-dev-handbook#bench-multi-lexical It lets you compare forms, and it uses lexical-binding, which can make a big difference (and you always want your "production" code to use lexical-binding anyway).
There's also a lot of stuff in find-file-hook
etc. Thank you for the tip regarding your development guide. A small question regarding measuring performance: How do you remove the org-ql
cache? At the moment I'm restarting Emacs each time I want to check the speed.
The 3 caches are initialized here: https://github.com/alphapapa/org-ql/blob/fb830f8577bb6287d899948f696e33c96e877a40/org-ql.el#L85 You can reset them by re-evaluating those defvar
forms (make sure to do all 3), or the equivalent.
Happy new year! I've experimented with adding
org-ql
toorg-brain
, and it has been a huge performance boost. See commit here (if you're interested): https://github.com/Kungsgeten/org-brain/commit/31603865b4164b9f218cf2cb091cdb32e0d20aa2I had an idea to provide
org-ql
predicates for some oforg-brain
specific things, in case users want to query theirorg-brain
content. Here's an example:This works, but is slow since the
let
-clause has to be run for every headline in the query. In reality thelet
-clause is only needed to be run once, before the search starts, and then the results could be used in each try.It would be nice to have a way to bind data before the search starts, and have that data be available in the
defpred
. Perhaps something like this:It could be though that this won't be needed in any other use case, and thus doesn't fit into
org-ql
. Another posibility could be for theBUFFERS-OR-FILES
argument inorg-ql-select
to also allow a list of markers to headlines.