d12frosted / vulpea

A collection of functions for note taking based on `org` and `org-roam`.
GNU General Public License v3.0
237 stars 12 forks source link

(feat) use view table for query operations #116

Closed d12frosted closed 2 years ago

d12frosted commented 2 years ago

Release plan:

  1. [x] Finish benchmarks.
  2. [x] Document this change.
  3. [x] Use it privately for some time.
  4. [x] If everything is fine, merge to master.

After some time (e.g. maybe one week or more), I release new version of vulpea.


I am still not happy with performance query functions on a set of 9k+ notes. Partially, the culprit here is that horrific SQL. The more tables I want to multiply, the worse performance becomes.

So idea here is simple. Instead of doing multiplication of matrices during query operation, use a view table where every row is a full vulpea-note (in reality multiple notes because of aliases, but that's irrelevant here). This new view table is used in the following functions:

This transitively affects all specialized queries.

The way view table is being built is not optimal as it parses the buffer for the second time in addition to org-roam routine. Ideally this feature should be proposed to org-roam thus improving build time.

Benchamarks

Count of notes: 9554.

General benchmark with regular org-roam tables

test result size generic specialized
tags-some 30 notes 4.6693460650999995 0.0605194177
tags-every 3168 notes 4.7333844436999996 1.0618131127
links-some 1657 notes 4.8095771283 1.362214091
links-every 92 notes 4.5517473337999995 0.1707312557

General benchmark with view table

test result size generic specialized
tags-some 30 notes 1.0112478712 0.0066033426
tags-every= 3168 notes 1.0059819176 0.5709392964999999
links-some 1657 notes 1.0462236128999999 0.4248580532
links-every 92 notes 1.0204833089 0.0545313596

Comparison of vulpea-db-query

test result size regular view table ratio
tags-some 30 notes 4.6693460650999995 1.0112478712 4.6174100
tags-every 3168 notes 4.7333844436999996 1.0059819176 4.7052381
links-some 1657 notes 4.8095771283 1.0462236128999999 4.5970833
links-every 92 notes 4.5517473337999995 1.0204833089 4.4603839

Since vulpea-db-query loads everything into memory, no surprise that improvement is pretty much the same across test scenarios. The good part here is that on average new implementation performs 4.595 times faster. And since it stays around 1 second for 9554 notes it makes it quite practical.

Comparison of specialized queries

test result size regular view table ratio
tags-some 30 notes 0.0605194177 0.0066033426 9.1649671
tags-every 3168 notes 1.0618131127 0.5709392964999999 1.8597653
links-some 1657 notes 1.362214091 0.4248580532 3.2062805
links-every 92 notes 0.1707312557 0.0545313596 3.1308821

Since specialized query performance depends on result size (e.g. how many notes it needs to load into memory) the improvement is not stable. But even for big slices it outperforms previous implementation. On average, new implementation performs 4.34 times faster. Not really useful metric in this case, but still.

Comparison of db sync

Now the real question is how synchronisation is affected. We have the following test scenarios.

  1. Sync one small sized note.
  2. Sync one medium sized note.
  3. Sync one big sized note.
  4. Sync vulpea-test-notes (e.g. 9554 relatively small sized notes).
test regular view table diff ratio
vulpea-test-notes 172.79389154999998 337.61603822 164.82215 1.9538656
small 0.000354079889 0.000416262194 6.2182305e-5 1.1756166
medium 0.000492199416 0.000539389997 4.7190581e-5 1.0958770
huge 0.1732851848 0.2240508243 0.050765640 1.2929601

As you can see, it affects dramatically full rebuild, but when it comes to singular updates, the difference is pretty small.

Conclusion

View table improves query performance dramatically. Since most of the time I am using query functions and reading notes, this small difference on syncing small to medium files is not critical for release of view tables. Though full rebuild penalty is quite big and should improved in the future iterations.

P. S. I am leaving the full sync test case disabled because I don't want to add 8.5 minutes to the tests. If needed, just replace xit with it.

d12frosted commented 2 years ago

update: moved release plan to the first message

d12frosted commented 2 years ago

I am already using this branch and must say... it feels so snappy! Absolutely love it!

d12frosted commented 2 years ago

I will merge it today closer to evening.

publicimageltd commented 2 years ago

Wow, these performance differences are so amazing! That must be added to org-roam, please! (And if I have an extra wish, make it accessible with an API so that Delve does not have to replicate the queries...)

Do I understand correctly that this view table also contains the links?

d12frosted commented 2 years ago

That must be added to org-roam, please!

I would love to work on that actually. First I wanted to investigate this idea for some time to see if it works. And then ask @jethrokuan if he is willing to accept something like this. But since this conversation started to happen, then... @jethrokuan, what do you think? :)

Do I understand correctly that this view table also contains the links?

Yes. Basically it started from #106. Plus I wanted to provide similar functionality (e.g. to search for links with N depth) for https://github.com/d12frosted/vino library. For now you can use vulpea-db-query-by-links-some and vulpea-db-query-by-links-every - it allows to search for depth = 1. I want to spend some time to optimize more deep search.

d12frosted commented 2 years ago

BTW, inclusion to org-roam would easily solve write performance.