clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Analysis of bottlenecks and performance tweaks #287

Open twagoo opened 4 years ago

twagoo commented 4 years ago

Identify the (major) bottlenecks in the processes involved in serving the VLO to the end user. This includes both the web app (Wicket application running in Tomcat) and the Solr back end. Certain actions seem to be slower than they need to be, in particular in relation to filtering based on facet values.

Potentially, optimisations could be made with respect to:

  1. the (number of) requests made by the web app to Solr
  2. handling of queries by the Solr server (e.g. choice of response handler type)
  3. the structuring of data in the index in relation to (the most frequent) incoming requests (i.e. schema definition)
  4. general configuration of the Solr server and index
  ...?

twagoo commented 4 years ago

One thing that we might want to look into again is #229, which caused a substantial increase of the average query response time since the deployment of VLO 4.6.0 (28 February 2019)

(screenshot: average query response time, showing the increase since the VLO 4.6.0 deployment)

twagoo commented 4 years ago

There might be a lot of potential in improving Solr caching settings. In production, we could quite easily allocate 5-15GB of RAM for caching if deemed useful.

Solr config settings that we could look at:

Documentation: Solr 8.3: Query Settings in SolrConfig

A post with some useful hints: https://teaspoon-consulting.com/articles/solr-cache-tuning.html
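
For orientation, these settings live in the query section of solrconfig.xml. A minimal sketch with illustrative values (the stock defaults, not our production configuration; the cache class could equally be solr.FastLRUCache or, since Solr 8.3, solr.CaffeineCache):

```xml
<query>
  <!-- caches the document sets matching filter queries (fq), e.g. facet selections -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- caches the ordered lists of document ids returned for queries -->
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

  <!-- caches the stored fields of individual documents -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
</query>
```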

twagoo commented 4 years ago

In case this is informative, here is a snapshot of the various cache metrics, taken from the Solr dashboard on 2020-03-17 at 11:33 CET

solr-cache-stats.txt

For interpretation, see the Performance Statistics Reference and this blog post. The hit ratio seems to be a good performance indicator.
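
The same numbers can also be pulled programmatically via the Metrics API instead of the dashboard; a possible call, assuming the default host and port:

```
curl 'http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher'
```

This returns lookups, hits, hit ratio, inserts and evictions per cache, which makes periodic snapshots straightforward.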

teckart commented 4 years ago

I tried to replicate the user behaviour based on a week of Solr requests on the production machine, using an Apache JMeter test plan with 2 threads and varying Solr configurations (cache implementation, cache sizes, eviction strategy). Focusing on the hit ratios and the number of cache inserts/evictions, it was no problem to replicate the good results of the query result cache and the filter cache. However, I couldn't replicate the very low hit ratio of the production's document cache (where 96% of lookups are misses), even though our Solr instance just uses an LRU-based eviction approach.
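
(A replay of this kind can be run with JMeter in non-GUI mode along the following lines; the test plan file name and the threads property are placeholders for whatever the actual plan defines:)

```
jmeter -n -t vlo-solr-replay.jmx -Jthreads=2 -l results.jtl
```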

The frequency distribution of queried documents shows the expected long tail of documents that are fetched only rarely (35% of all documents were fetched at most 2 times during this week), but not an extreme form of a power-law distribution. Assuming the document cache holds the most popular documents, its current size would only account for 35% of all document queries; a cache size of 4096 would increase this value to 56%. Given the rather heterogeneous document queries on the VLO, there is probably not a lot to do beyond this (and these queries seem to be fast anyway).

The moderate hit ratio of the query result cache is a bit unclear. As the vast majority of VLO page views are of the main page (with or without a user selection), the four (by far) most frequent Solr queries are:

  1. number of records (w/ duplicates)
  2. summary of search facets (w/o duplicates)
  3. first 10 results (w/o duplicates)
  4. number of records (w/o duplicates) (RH: /fast)

It might be possible to avoid the last query (it is redundant considering the third); however, their high frequencies should make them always be part of the 512 stored query results in our standard configuration. An increase of this cache's size (also to 4096?) might still have positive results and did reduce the number of cache evictions in the tests.
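
In solrconfig.xml, the suggested increases would amount to something like the following (assuming the currently used LRU cache implementation; autowarming settings not shown):

```xml
<queryResultCache class="solr.LRUCache" size="4096" initialSize="4096"/>
<documentCache class="solr.LRUCache" size="4096" initialSize="4096"/>
```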

A closer look at the query times shows that around 90% of all distinct Solr queries (for the most relevant /select request handler) return after 200 ms or less on average. As might be expected, there is a clear separation between fast and slow queries based on the use of faceting (average qtime with/without faceting: 1545 ms vs. 51 ms). In fact, 86% of the 1000 slowest distinct queries involve a restriction on the languageCode facet. As there are hardly any faceted queries with a qtime of less than 1000 ms (~3%), it seems that most of these queries are cache misses. Some tests confirmed this: reloads of the main page (without further restrictions or with only a full-text search) trigger the aforementioned 4 queries; for 3 of them the query times are reduced significantly after the first reload (i.e. the results are cache hits), whereas query no. 2's qtime does not improve. When selecting an additional facet-based restriction (like "languageCode:(code:deu)"), only query no. 4's performance improves.

However, when evaluating Solr's cache statistics for every page view, it becomes clear that all 4 queries are counted as cache hits on the query result cache (except for the first page view on a cold cache). Therefore, it seems that the sub-optimal qtimes are already cache-based values. tbc.

twagoo commented 4 years ago

Thanks @teckart for this thorough report. Some thoughts/comments that came up while reading:

I couldn't replicate the very low hitratio of the production's document cache (where 96% of lookups are misses), even though our Solr instance just uses an LRU based eviction approach.

Could this low hit ratio be an artefact of the import process? Import will make individual queries for all documents. Or have you included this in your simulations as well?

As might be expected, there is a clear separation between fast and slow queries based on the use of faceting (average qtime with/without faceting: 1545 ms vs. 51 ms).

By this do you mean with/without production of the facet result, or with/without a facet selection? The former makes sense but the latter would surprise me. Either way, I have to say that this difference is very large!

reloads of the main page (without further restrictions or with only a full-text search) trigger the aforementioned 4 queries; for 3 of them the query times are reduced significantly after the first reload (i.e. the results are cache hits), whereas query no. 2's qtime does not improve.

Strange, does this mean that facets are not cached?

In any case, if facets are such a big factor when it comes to performance, I would like to see the effect of setting the facet.limit parameter more intelligently. At the moment, I believe that we always set it to -1, i.e. return all of the many thousands of facet values. Possibly just too much to cache?
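
For illustration, the limit can be set globally or overridden per field; the field name and values below are only examples:

```
select?q=*:*&rows=10&facet=true&facet.field=languageCode&facet.limit=100
select?q=*:*&rows=10&facet=true&facet.field=languageCode&f.languageCode.facet.limit=100
```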

teckart commented 4 years ago

Could this low hit ratio be an artefact of the import process? Import will make individual queries for all documents. Or have you included this in your simulations as well?

You are right about that: after several import rounds I ended up with similar hit ratios for the document cache.

Strange, does this mean that facets are not cached?

All queries that I talked about are counted as hits on the queryResultCache, even those with absurdly high query times.

After a myriad of test runs and configuration changes, the following minimal example using the requestHandler /select shows the effect. The default collapsing on _signature is disabled here; the actual Solr output is only the (correct) number of documents (all/collapsed); QTimes are average numbers for 3 calls after one "warm-up" call.

All:
select?q=*:*&rows=0&facet=false --> QTime: 0
select?q=*:*&rows=0&facet=true --> QTime: 0

Collapsed:
select?q=*:*&rows=0&facet=false&fq={!collapse+field%3D_signature} --> QTime: 0
select?q=*:*&rows=0&facet=true&fq={!collapse+field%3D_signature} --> QTime: 430

This fits with my previous comment insofar as the main page without a selection only uses the combination of facet=true and collapsing for a single query (the facet summary on the left panel), but for three queries when a facet-based restriction is made. It is still unclear to me why the combination of both is counted as a cache hit without any obvious improvement in query time.

twagoo commented 4 years ago

Hmm, so the bottleneck really is in the interaction between faceting and collapsing. Have we looked at the notes on performance in the following pages and/or @teckart do you know if they can somehow be applied to our solution?

To be honest, I'm not sure if we are already (implicitly) using the CollapsingQParserPlugin

teckart commented 4 years ago

@teckart do you know if they can somehow be applied to our solution?

The only remark about performance for the collapsing query parser concerns the hint parameter for using the field cache. This seemed to be a bit of an improvement, but the averaged effect was really minimal and I can't rule out that it was just a measurement error. I will look at the second link, but IIRC I already took this into account when the collapse feature was introduced.
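
For reference, the hint is passed as a local parameter of the collapse filter query, e.g.:

```
select?q=*:*&rows=0&facet=true&fq={!collapse field=_signature hint=top_fc}
```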

To be honest, I'm not sure if we are already (implicitly) using the CollapsingQParserPlugin

I also had some doubts at first and there is not a lot of documentation about it. As far as I understand, it's a plugin that is used automatically when you use queries with the collapse keyword (in contrast to grouping via group=true). I couldn't find any reference to schema or configuration changes in discussions about this filter either; only details concerning the query syntax.
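
To make the contrast concrete: the collapsing query parser is triggered purely by the filter query syntax, while result grouping uses separate request parameters; neither requires schema or solrconfig changes:

```
# CollapsingQParserPlugin (used automatically for this syntax):
fq={!collapse field=_signature}

# Result grouping (a different feature, not used here):
group=true&group.field=_signature
```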

teckart commented 4 years ago

https://blog.trifork.com/2009/10/20/result-grouping-field-collapsing-with-solr/

For completeness: this post refers to an unofficial grouping implementation from before Solr got the feature in version 3.x (group=true). The hints regarding performance seem to be strongly tied to that particular implementation and its configuration options (like collapse.type).