apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Export benchmark information as line protocol #6107

Open alamb opened 1 year ago

alamb commented 1 year ago

Is your feature request related to a problem or challenge?

We want to have information about DataFusion's performance over time -- https://github.com/apache/arrow-datafusion/issues/5504. This is becoming more important as we work on more performance items / optimizations such as https://github.com/apache/arrow-datafusion/pull/5904

Currently the DataFusion benchmarks in https://github.com/apache/arrow-datafusion/tree/main/benchmarks#datafusion-benchmarks can output run results as a JSON file.

I would like to use existing visualization systems (like timeseries databases).

Describe the solution you'd like

I would like to output the benchmark data optionally as LineProtocol https://docs.influxdata.com/influxdb/cloud-iox/reference/syntax/line-protocol/ so that it can be visualized by grafana / other systems that can handle line protocol

See https://grafana.com/docs/grafana-cloud/data-configuration/metrics/metrics-influxdb/push-from-telegraf/

Proposed Design

Write a Python script, modeled after compare.py, that takes a performance JSON file and produces line protocol as output.

Desired output:

- measurement: benchmark
- tags: details from the run
- fields: query, iteration, row_count, elapsed_ms
- timestamp: ns since epoch (start_time is in seconds, so multiply by 1,000,000,000)

Example output: a line like this for each element in the queries array:

benchmark,name=sort,scale_factor=1.0,datafusion_version=31.0.0,num_cpus=8 query="sort utf8",iteration=1,row_count=10838832 1694704746000000000

Example input:

{
  "context": {
    "arguments": [
      "sort",
      "--path",
      "/home/alamb/arrow-datafusion/benchmarks/data",
      "--scale-factor",
      "1.0",
      "--iterations",
      "5",
      "-o",
      "/home/alamb/arrow-datafusion/benchmarks/results/main_base/sort.json"
    ],
    "benchmark_version": "31.0.0",
    "datafusion_version": "31.0.0",
    "num_cpus": 8,
    "start_time": 1694704746
  },
  "queries": [
    {
      "iterations": [
        {
          "elapsed": 86441.988369,
          "row_count": 10838832
        },
        {
          "elapsed": 73182.81637,
          "row_count": 10838832
        },
        {
          "elapsed": 69536.53120900001,
          "row_count": 10838832
        },
        {
          "elapsed": 72179.459332,
          "row_count": 10838832
        },
        {
          "elapsed": 71660.65385500001,
          "row_count": 10838832
        }
      ],
      "query": "sort utf8",
      "start_time": 1694704746
    },
    {
      "iterations": [
        {
          "elapsed": 89047.348867,
          "row_count": 10838832
        },
        {
          "elapsed": 89168.79565399999,
          "row_count": 10838832
        },
        {
          "elapsed": 88951.52251499999,
          "row_count": 10838832
        },
        {
          "elapsed": 98504.891076,
          "row_count": 10838832
        },
        {
          "elapsed": 89457.13566700001,
          "row_count": 10838832
        }
      ],
      "query": "sort int",
      "start_time": 1694705119
    },
    {
      "iterations": [
        {
          "elapsed": 71307.72546599999,
          "row_count": 10838832
        },
        {
          "elapsed": 71463.172695,
          "row_count": 10838832
        },
        {
          "elapsed": 77577.714498,
          "row_count": 10838832
        },
        {
          "elapsed": 71730.90387400001,
          "row_count": 10838832
        },
        {
          "elapsed": 72624.773934,
          "row_count": 10838832
        }
      ],
      "query": "sort decimal",
      "start_time": 1694705575
    },
    {
      "iterations": [
        {
          "elapsed": 96741.53251,
          "row_count": 10838832
        },
        {
          "elapsed": 97752.85497999999,
          "row_count": 10838832
        },
        {
          "elapsed": 95654.327294,
          "row_count": 10838832
        },
        {
          "elapsed": 96713.50062400001,
          "row_count": 10838832
        },
        {
          "elapsed": 94291.325883,
          "row_count": 10838832
        }
      ],
      "query": "sort integer tuple",
      "start_time": 1694705940
    },
    {
      "iterations": [
        {
          "elapsed": 72497.7272,
          "row_count": 10838832
        },
        {
          "elapsed": 72443.536695,
          "row_count": 10838832
        },
        {
          "elapsed": 73023.115685,
          "row_count": 10838832
        },
        {
          "elapsed": 73800.62915899999,
          "row_count": 10838832
        },
        {
          "elapsed": 71583.947462,
          "row_count": 10838832
        }
      ],
      "query": "sort utf8 tuple",
      "start_time": 1694706421
    },
    {
      "iterations": [
        {
          "elapsed": 81407.140528,
          "row_count": 10838832
        },
        {
          "elapsed": 85593.791929,
          "row_count": 10838832
        },
        {
          "elapsed": 81712.19639,
          "row_count": 10838832
        },
        {
          "elapsed": 80993.492422,
          "row_count": 10838832
        },
        {
          "elapsed": 83290.99224600001,
          "row_count": 10838832
        }
      ],
      "query": "sort mixed tuple",
      "start_time": 1694706785
    }
  ]
}

Here is a zip file with a bunch of example benchmark json files: results.zip
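A converter along the lines of the proposed design might look roughly like this. This is a minimal sketch assuming the JSON layout shown above; the function name `to_line_protocol` and the exact tag/field choices are illustrative, not final:

```python
import json


def to_line_protocol(path):
    """Convert one benchmark JSON file (format shown above) into
    InfluxDB line protocol, one line per iteration."""
    with open(path) as f:
        data = json.load(f)

    ctx = data["context"]

    # Line protocol requires escaping commas, spaces, and '=' in tag values.
    def escape(value):
        return (str(value)
                .replace(",", r"\,")
                .replace(" ", r"\ ")
                .replace("=", r"\="))

    tags = ",".join(
        f"{key}={escape(ctx[key])}"
        for key in ("benchmark_version", "datafusion_version", "num_cpus")
    )

    lines = []
    for query in data["queries"]:
        # start_time is in seconds; line protocol timestamps default to ns.
        timestamp_ns = query["start_time"] * 1_000_000_000
        for i, it in enumerate(query["iterations"]):
            fields = (
                f'query="{query["query"]}",iteration={i}i,'
                f'row_count={it["row_count"]}i,elapsed_ms={it["elapsed"]}'
            )
            lines.append(f"benchmark,{tags} {fields} {timestamp_ns}")
    return lines
```

The `i` suffix marks iteration and row_count as integer fields; elapsed stays a float.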

Describe alternatives you've considered

No response

Additional context

Related to https://github.com/apache/arrow-datafusion/issues/5504 tracking data over time

alamb commented 1 year ago

I am currently focused on getting to the point where we can run the benchmarks repeatedly -- once I have it so I can easily run the benchmarks I will start working on running and collecting data over time.

I view the line protocol conversion as part of the tracking-performance-over-time story.

alamb commented 11 months ago

I added details of a proposed design in the Proposed Design section of this ticket's description.

comphead commented 11 months ago

@alamb would you mind clarifying a bit?

Are you planning to keep a collection kinda <DataFusionVersion, Vec<Benchmark>>?

If so, it's also possible to backfill the history with archived benchmarks/versions.

comphead commented 11 months ago

Apache Arrow uses Conbench for a similar purpose: https://github.com/conbench/conbench

alamb commented 11 months ago

@comphead

Are you planning to keep a collection kinda <DataFusionVersion, Vec<Benchmark>>?

Yes -- that is basically what I have in mind. In my mind we would store it as line protocol, check it into a repo somewhere, and then visualize it with existing tools (e.g. Grafana and InfluxDB, which is what I know, but I am happy to use some other open source stack).

If so, it's also possible to backfill the history with archived benchmarks/versions.

Agree

Apache Arrow uses Conbench for a similar purpose: https://github.com/conbench/conbench

Yes, I looked briefly into Conbench (in fact there is some vestigial code in DataFusion -- see https://github.com/apache/arrow-datafusion/tree/main/conbench and https://github.com/apache/arrow-datafusion/issues/5504 for details).

TL;DR: I could not get it to work, and it seems the dev team went dormant(ish), so I didn't pursue it further. If someone else can get it to work, that would be great.

Smurphy000 commented 10 months ago

Currently working on this. I have extended the existing compare.py, since it already reads the existing JSON format well, and I am producing rows of data in line protocol like this:

benchmark,benchmark_version=32.0.0,datafusion_version=32.0.0,num_cpus=4 query="Query 1",row_count="4",elapsed="840.606454" 1697423932000

At the moment I am trying to set up ingestion into InfluxDB using Docker, but cannot quite seem to make the data available to visualize.
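For getting the rows into InfluxDB 2.x, one option is to POST them to the /api/v2/write endpoint. This is a sketch assuming a local Docker instance on the default port 8086; the `org`, `bucket`, and token values are placeholders, and `build_write_request` is a hypothetical helper, not part of compare.py:

```python
import urllib.parse
import urllib.request


def build_write_request(lines, token, org="my-org", bucket="benchmarks",
                        url="http://localhost:8086"):
    """Build an HTTP request for InfluxDB 2.x's /api/v2/write endpoint.
    `lines` is a list of line protocol rows with ns-precision timestamps."""
    query = urllib.parse.urlencode(
        {"org": org, "bucket": bucket, "precision": "ns"})
    return urllib.request.Request(
        f"{url}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={"Authorization": f"Token {token}",
                 "Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


# Sending it (the server replies 204 No Content on success):
# with urllib.request.urlopen(build_write_request(rows, token)) as resp:
#     assert resp.status == 204
```

Once the data lands in a bucket, Grafana can read it with the stock InfluxDB data source.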

alamb commented 10 months ago

@Smurphy000 could you potentially push up what you have as a draft PR? Maybe I can help with the "how to get this ingested / visualized" part, as I have more experience with that.