Closed redrun45 closed 3 months ago
I'll throw in a "works on my machine" for the advanced search (quick search is broken, but that's expected as it hasn't been updated with the window functions yet). It seems to be correct in that it still returns the same results with the same pagination.
As far as performance goes, on my machine, in a very simple test, it does seem to work faster. ab
is what I had handy, so I used that against my local install:
ab -n 100 --verbose 'https://librivox.org/advanced_search?title=wells&author=&reader=&keywords=&genre_id=0&status=all&project_type=either&recorded_language=&sort_order=catalog_date&search_page=1&search_form=advanced&q='
New query:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 5 0.8 5 12
Processing: 73 76 2.1 75 86
Waiting: 73 76 2.1 75 86
Total: 78 81 2.5 80 98
Percentage of the requests served within a certain time (ms)
50% 80
66% 81
75% 81
80% 82
90% 83
95% 86
98% 89
99% 98
100% 98 (longest request)
Old query:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 5 5 0.4 5 8
Processing: 136 146 13.9 141 208
Waiting: 136 146 13.9 141 208
Total: 141 152 14.0 146 214
Percentage of the requests served within a certain time (ms)
50% 146
66% 151
75% 155
80% 157
90% 163
95% 171
98% 214
99% 214
100% 214 (longest request)
I tried a few other searches, and the results were pretty much the same - the old one doing two searches took about twice as long as the new one doing one search.
I have no idea how to check the cost of a query on MariaDB à la EXPLAIN
on Postgres, googling around it doesn't seem like a doable thing? Either way, that we're getting faster results implies we're probably tying up fewer DB resources.
I had a play around adding different indexes to the search_table
table, but I only ended up making performance worse, and getting a p99 of 98ms is already going to be pretty hard to beat I think.
I'm sure there are many other things wrong with search that we can improve and fix and change, but as small incremental improvements go, this one is pretty good. @redrun45 you wrote "[Sanity check]", are you opposed to merging this?
@notartom if the technical direction is sound, then no, I'm not opposed. I just have to patch the simplesearch part too. I'll get a second commit ready soon!
Not urgent, but if I'm on the right track, we can improve search performance by a fair bit.
It looks like controllers/catalog/Search.php is running each search query twice: once for the items on the current page, and then again to count the total number of matches for the search. There doesn't seem to be a clean way to count the total results, but here's one way.
We can't use a regular
COUNT(*)
, because that will stop at ourLIMIT
of 25. So: theOVER()
call (1, 2) lets usCOUNT
before we discard the other results. But: doing this on each of ourUNION
sets will give different results, which we'd need to add together selectively. Collecting them all in a sub-query (WITH results AS (
) gives us a combined set to count.I don't know how much of a hack this is, and how much overhead it adds, but it at least seems much faster than running the entire search query twice.