Closed jlegind closed 6 months ago
The fix suggested by @jlegind did not lead to noticeable improvement in performance.
I will attempt to install a module called FTS5 into the sqlitefile that has better prospects.
Apparently, FTS5 is already part of sqlite out of the box.
I followed the instructions and created a virtual table referring to the original taxonname table and the performance is drastically improved:
SELECT * FROM taxonname WHERE fullname LIKE '%Draba%'; -- Result: 4435 rows returned in 592ms
SELECT * FROM taxonname_fts WHERE fullname MATCH 'Draba'; -- Result: 4294 rows returned in 10ms
SELECT * FROM taxonname WHERE id IN (
SELECT id FROM taxonname_fts WHERE fullname MATCH 'Draba'); -- Result: 4294 rows returned in 38ms
There are differences in the number of rows returned, because FTS tokenizes individual words instead of partial entries. This means for the above case it will skip entries like "Xerodraba". However, it is possible to use wild cards with no significant decrease in performance:
SELECT * FROM taxonname WHERE id IN (
SELECT id FROM taxonname_fts WHERE fullname MATCH 'Draba*'); -- Result: 4304 rows returned in 49ms
I can try and tweak the wild cards some more...
The FTS improvement looks fantastic. BTW, Do we have an index on the taxon fullname column?
@jlegind Yes, I already tried with index on that column, but with little result. The FTS option really does the trick, though results aren't identical.
I found that performance is better with the number of keystrokes triggering the query being 4 instead of the original 3.
However, I'd need @bhsi-snm or @PipBrewer to sanction this change if I am to proceed with it. Otherwise I'll revert to 3 again.
Never mind: I found that adding a limit of 200 to the query dramatically improves performance and 3 characters is fine.
Describe the bug I think the auto suggest for taxonomy is borderline slow. Though for nearly 2.5 million rows it is still pretty decent.
To Reproduce Input a taxon name having several versions with different authors, like: "Abacopteris lineata"
Additional context A quick fix could be to only allow wildcard search from the end of the inputted keystrokes:
I realize that the auto-suggest from anywhere in the string became a requirement, I still submit this as a fix. It is a quick roll back if there is dissatisfaction from the digitizer staff. We should set index for the taxonname table to speed up queries. This web page has some good suggestions: https://developer.android.com/topic/performance/sqlite-performance-best-practices