Description

There exists a "command" task type which executes a URL command. However, we have a use case of checking core status on all data nodes (instead of a randomly picked node) periodically (instead of only once). Therefore we want such a task to be:
Repeatable
Configurable for concurrency/rate/duration, with retry/interruption mechanisms like other rate-controlled tasks
Able to issue GET requests to multiple nodes, with optional node-type control (data/os/qa etc.), instead of to one randomly picked node as in the existing "command" task type
Able to collect stats based on the HTTP response code and report them via the conventional XML result file
Solution

We will keep the existing "command" task type as is for backward compatibility, while introducing a new task type org.apache.solr.benchmarks.task.UrlCommandTask that extends org.apache.solr.benchmarks.task.AbstractTask. For our core status use case, the task type configuration looks roughly like the sketch below.
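This is a hypothetical sketch only: the key names (url-command, path, method, node-type, threads, rpm, duration-secs) are illustrative assumptions rather than the actual solr-bench schema. It combines the repeatable core-status command with the usual concurrency/rate/duration knobs listed above.

```json
{
  "core-status-check": {
    "url-command": {
      "path": "/admin/cores?action=STATUS",
      "method": "GET",
      "node-type": "data"
    },
    "threads": 1,
    "rpm": 6,
    "duration-secs": 3600
  }
}
```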
It retains the same URL-building logic as the existing "command" type (re-using that code with some refactoring), but instead of resolving a single URL on a randomly picked node, it resolves a list of URLs based on the cluster state and the optional node-type param, as sketched below.
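A minimal sketch of that resolution step, under assumed inputs: the live-node base URLs and the node-type lookup are hypothetical stand-ins for whatever the refactored cluster-state code actually exposes.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper: expand one command path into a URL per matching node,
// rather than a single URL on a random node.
final class UrlResolutionSketch {
  static List<String> resolveUrls(List<String> liveNodeBaseUrls,         // e.g. http://host1:8983/solr
                                  Map<String, String> nodeTypeByBaseUrl, // e.g. "data", "os", "qa"
                                  Optional<String> nodeTypeFilter,       // the optional node-type param
                                  String commandPath) {                  // e.g. /admin/cores?action=STATUS
    return liveNodeBaseUrls.stream()
        .filter(base -> nodeTypeFilter
            .map(type -> type.equals(nodeTypeByBaseUrl.get(base)))
            .orElse(true))                                               // no filter: keep every node
        .map(base -> base + commandPath)
        .collect(Collectors.toList());
  }
}
```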
The new class also uses PrometheusExportManager.registerHistogram to report to Grafana, with command_url and http_status_code labels and the metric name solr_bench_url_command_duration_bucket.
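For illustration only, the equivalent shape with the plain Prometheus Java client is shown below; the real task goes through PrometheusExportManager.registerHistogram, whose exact signature is not reproduced here and may differ.

```java
import io.prometheus.client.Histogram;

// Illustrative sketch with the plain Prometheus Java client, not the actual
// PrometheusExportManager call. A histogram registered as
// solr_bench_url_command_duration exposes its bucket series as
// solr_bench_url_command_duration_bucket, which is what the Grafana query
// below rates and aggregates.
final class UrlCommandMetricsSketch {
  static final Histogram URL_COMMAND_DURATION = Histogram.build()
      .name("solr_bench_url_command_duration")
      .help("Duration of UrlCommandTask GET requests, in seconds")
      .labelNames("command_url", "http_status_code")
      .register();

  // Called once per GET request, after the response arrives.
  static void record(String commandUrl, int httpStatusCode, double durationSeconds) {
    URL_COMMAND_DURATION.labels(commandUrl, String.valueOf(httpStatusCode))
        .observe(durationSeconds);
  }
}
```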
For example, on Grafana the following query produces a real-time graph of the p50 request duration, broken down by cluster, command_url, and http_status_code:

histogram_quantile(0.5, sum by (cluster, le, command_url, http_status_code) (rate(solr_bench_url_command_duration_bucket{}[$window])))