fullstorydev / solr-bench

Solr benchmarking and load testing harness
Apache License 2.0
17 stars 10 forks source link

Enhanced option for TaskType.command #102

Closed patsonluk closed 3 months ago

patsonluk commented 4 months ago

Description

There exists a "command" task type which executes a URL command.

However, we have a use case of checking core status on all data nodes (instead of a randomly picked node) periodically (instead of once only)

Therefore we will want such task be:

  1. Repeatable
  2. With concurrency/rate/duration configurations and retry/interruption mechanism like other rate controlled tasks
  3. Able to issue GET requests to multiple nodes with optional node type control (data/os/qa etc), instead of 1 randomly picked node in the existing command task type
  4. Collect stats based on http response code, report via the conventional xml result file
  5. Grafana support, report metrics dynamically with finer grain grouping (status code, endpoint url)

Solution

We will keep the existing "command" task type as is for backward compatibility while introducing a new task type org.apache.solr.benchmarks.task.UrlCommandTask that extends the org.apache.solr.benchmarks.task.AbstractTask. For our core status use case, the task type configuration looks like:

    "core-status": {
      "task-by-class": {
        "task-class": "org.apache.solr.benchmarks.task.UrlCommandTask",
        "name": "Periodically call core status",
        "params" : {
          "command" : "${SOLRURL}admin/cores?action=STATUS",
          "node-role": "DATA"
        },
        "min-threads": 1,
        "max-threads": 1,
        "rpm": 12,
        "duration-secs" : 3600
      }
    },

It retains the same url building logic as the existing "command" type (re-use the code with refactoring), but instead of resolving for a single url (of a random node), it resolves to a list of URL based on the cluster state and optional node-type param.

The new class also uses PrometheusExportManager.registerHistogram to report to grafana with command_url and http_status_code with metrics name solr_bench_url_command_duration_bucket

For example, on Grafana with this config:

histogram_quantile(.5, sum by (cluster, le, command_url, http_status_code) (rate(solr_bench_url_command_duration_bucket{}[$window] ) ) ) 

Produces real time graph : image

patsonluk commented 3 months ago

LGTM! Had a few questions I put in the review, but no major blockers for this to go in.

Thank you for the detailed review as always! 😊 Going to merge 🚀