datastax-labs / sperf

Datastax Enterprise and Apache Cassandra diagnostic tooling from DataStax
https://datastax-toolkit.github.io/sperf/
Apache License 2.0
17 stars 14 forks source link

sperf search filtercache does not work on byte related evictions #68

Closed tlasica closed 3 years ago

tlasica commented 3 years ago

Problem example:

⌁0% [tomek:~/dsp/21925] [sperf] $ ~/repos/sperf/scripts/sperf search filtercache
sperf filtercache: 0.6.13
from directory '/home/tomek/dsp/21925':
..
NOTE: as of version 0.3.13 all evictions with duration of 0 ms are deducted from the item eviction count and are not part of the eviction freq or duration calculations
start log: '2021-07-13 15:01:32,402000'
end log:   '2021-07-13 20:10:42,282000'
recommendations
---------------
No recommendations
node          Avg time between evictions  Avg eviction duration  Times (byte/item) limit reached  Most recent limit (byte/item)  Log duration
----          --------------------------  ---------------------  -------------------------------  -----------------------------  ------------
1n-100m-fc1m  0.00 ms                     0.00 ms                0/0                              0 byte/0                       5.15 hours
⌁0% [tomek:~/dsp/21925] [sperf] $ grep "eviction" nodes/1n-100m-fc1m/system.log | head -n 10
INFO  [LocalMessageServer query worker - 22] 2021-07-13 18:35:35,482  SolrFilterCache.java:363 - ...eviction completed in 127 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@315025cf usage is now 1072715764 bytes across 4634 entries.
INFO  [LocalMessageServer query worker - 23] 2021-07-13 18:36:10,696  SolrFilterCache.java:363 - ...eviction completed in 107 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@3a8a232 usage is now 1071117728 bytes across 4559 entries.
INFO  [LocalMessageServer query worker - 10] 2021-07-13 18:36:54,181  SolrFilterCache.java:363 - ...eviction completed in 100 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@34f0253 usage is now 1070254936 bytes across 4885 entries.
WARN  [pool-27-thread-1] 2021-07-13 18:37:01,894  SolrFilterCache.java:433 - There are too many filter cache evictions for solr core gatling.dsp4800. Change your queries to avoid too frequent filter cache evictions. e.g. pivot facet query or NOW in filter query boundary
INFO  [LocalMessageServer query worker - 24] 2021-07-13 18:37:30,975  SolrFilterCache.java:363 - ...eviction completed in 98 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@20ec7c15 usage is now 1064317272 bytes across 4164 entries.
INFO  [LocalMessageServer query worker - 15] 2021-07-13 18:38:11,569  SolrFilterCache.java:363 - ...eviction completed in 91 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@5933c22b usage is now 1066756504 bytes across 5042 entries.
INFO  [LocalMessageServer query worker - 25] 2021-07-13 18:38:57,270  SolrFilterCache.java:363 - ...eviction completed in 99 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@e68e45b usage is now 1065185160 bytes across 5257 entries.
INFO  [LocalMessageServer query worker - 14] 2021-07-13 18:39:41,067  SolrFilterCache.java:363 - ...eviction completed in 95 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@509f0b60 usage is now 1063079900 bytes across 4418 entries.
INFO  [LocalMessageServer query worker - 27] 2021-07-13 18:39:43,750  SolrFilterCache.java:363 - ...eviction completed in 403 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@4b55e071 usage is now 1067755932 bytes across 5698 entries.
INFO  [LocalMessageServer query worker - 31] 2021-07-13 18:39:43,751  SolrFilterCache.java:363 - ...eviction completed in 0 milliseconds. Filter cache org.apache.solr.search.SolrFilterCache$2@35e51296 usage is now 1066406136 bytes across 5718 entries.

Root cause analysis

The problem is caused by incorrect recognition of the events: eviction_items_duration and eviction_bytes_duration. First problem is that regexp for the eviction_bytes_duration is incorrect as it lacks usage_unit. After fixing it we have two identical rules and order of declaration takes precendence. As a result we have bytes_eviction events but items_duration events. In effect we have no information about duration and whole analysis return no value.

tlasica commented 3 years ago

I propose to fix it by:

rsds143 commented 3 years ago

nice catch and great work, not sure how this got broken as we have filter cache reports with byte evictions. Will take a look at this today.

rsds143 commented 3 years ago

fixed in #69