d12frosted commented 2 years ago

Release plan:

[x] Finish benchmarks.
[x] Document this change.
[x] Use it privately for some time.
[x] If everything is fine, merge to master.

After some time (e.g. maybe one week or more), I release new version of vulpea.

I am still not happy with performance query functions on a set of 9k+ notes. Partially, the culprit here is that horrific SQL. The more tables I want to multiply, the worse performance becomes.

So idea here is simple. Instead of doing multiplication of matrices during query operation, use a view table where every row is a full vulpea-note (in reality multiple notes because of aliases, but that's irrelevant here). This new view table is used in the following functions:

vulpea-db-query
vulpea-db-query-by-ids
vulpea-db-get-by-id

This transitively affects all specialized queries.

The way view table is being built is not optimal as it parses the buffer for the second time in addition to org-roam routine. Ideally this feature should be proposed to org-roam thus improving build time.

Benchamarks

Count of notes: 9554.

General benchmark with regular `org-roam` tables

test	result size	generic	specialized
`tags-some`	30 notes	4.6693460650999995	0.0605194177
`tags-every`	3168 notes	4.7333844436999996	1.0618131127
`links-some`	1657 notes	4.8095771283	1.362214091
`links-every`	92 notes	4.5517473337999995	0.1707312557

General benchmark with view table

test	result size	generic	specialized
`tags-some`	30 notes	1.0112478712	0.0066033426
`tags-every=`	3168 notes	1.0059819176	0.5709392964999999
`links-some`	1657 notes	1.0462236128999999	0.4248580532
`links-every`	92 notes	1.0204833089	0.0545313596

Comparison of `vulpea-db-query`

test	result size	regular	view table	ratio
`tags-some`	30 notes	4.6693460650999995	1.0112478712	4.6174100
`tags-every`	3168 notes	4.7333844436999996	1.0059819176	4.7052381
`links-some`	1657 notes	4.8095771283	1.0462236128999999	4.5970833
`links-every`	92 notes	4.5517473337999995	1.0204833089	4.4603839

Since vulpea-db-query loads everything into memory, no surprise that improvement is pretty much the same across test scenarios. The good part here is that on average new implementation performs 4.595 times faster. And since it stays around 1 second for 9554 notes it makes it quite practical.

Comparison of specialized queries

test	result size	regular	view table	ratio
`tags-some`	30 notes	0.0605194177	0.0066033426	9.1649671
`tags-every`	3168 notes	1.0618131127	0.5709392964999999	1.8597653
`links-some`	1657 notes	1.362214091	0.4248580532	3.2062805
`links-every`	92 notes	0.1707312557	0.0545313596	3.1308821

Since specialized query performance depends on result size (e.g. how many notes it needs to load into memory) the improvement is not stable. But even for big slices it outperforms previous implementation. On average, new implementation performs 4.34 times faster. Not really useful metric in this case, but still.

Comparison of db sync

Now the real question is how synchronisation is affected. We have the following test scenarios.

Sync one small sized note.
Sync one medium sized note.
Sync one big sized note.
Sync vulpea-test-notes (e.g. 9554 relatively small sized notes).

test	regular	view table	diff	ratio
`vulpea-test-notes`	172.79389154999998	337.61603822	164.82215	1.9538656
`small`	0.000354079889	0.000416262194	6.2182305e-5	1.1756166
`medium`	0.000492199416	0.000539389997	4.7190581e-5	1.0958770
`huge`	0.1732851848	0.2240508243	0.050765640	1.2929601

As you can see, it affects dramatically full rebuild, but when it comes to singular updates, the difference is pretty small.

Conclusion

View table improves query performance dramatically. Since most of the time I am using query functions and reading notes, this small difference on syncing small to medium files is not critical for release of view tables. Though full rebuild penalty is quite big and should improved in the future iterations.

P. S. I am leaving the full sync test case disabled because I don't want to add 8.5 minutes to the tests. If needed, just replace xit with it.

d12frosted commented 2 years ago

update: moved release plan to the first message

d12frosted commented 2 years ago

I am already using this branch and must say... it feels so snappy! Absolutely love it!

d12frosted commented 2 years ago

I will merge it today closer to evening.

publicimageltd commented 2 years ago

Wow, these performance differences are so amazing! That must be added to org-roam, please! (And if I have an extra wish, make it accessible with an API so that Delve does not have to replicate the queries...)

Do I understand correctly that this view table also contains the links?

d12frosted commented 2 years ago

That must be added to org-roam, please!

I would love to work on that actually. First I wanted to investigate this idea for some time to see if it works. And then ask @jethrokuan if he is willing to accept something like this. But since this conversation started to happen, then... @jethrokuan, what do you think? :)

Do I understand correctly that this view table also contains the links?

Yes. Basically it started from #106. Plus I wanted to provide similar functionality (e.g. to search for links with N depth) for https://github.com/d12frosted/vino library. For now you can use vulpea-db-query-by-links-some and vulpea-db-query-by-links-every - it allows to search for depth = 1. I want to spend some time to optimize more deep search.

d12frosted commented 2 years ago

BTW, inclusion to org-roam would easily solve write performance.

d12frosted / vulpea

(feat) use view table for query operations #116

Benchamarks

General benchmark with regular `org-roam` tables

General benchmark with view table

Comparison of `vulpea-db-query`

Comparison of specialized queries

Comparison of db sync

Conclusion

d12frosted / vulpea

(feat) use view table for query operations #116

Benchamarks

General benchmark with regular org-roam tables

General benchmark with view table

Comparison of vulpea-db-query

Comparison of specialized queries

Comparison of db sync

Conclusion

General benchmark with regular `org-roam` tables

Comparison of `vulpea-db-query`