amazon-ion / ion-c

A C implementation of Amazon Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0

Add Benchmark CLI for format comparison and perf tracking #329

Closed nirosys closed 1 year ago

nirosys commented 1 year ago

Issue #, if available:

Description of changes:

Changes to the Build

Google Test

This PR changes the googletest usage slightly. This was primarily to support google-benchmark's googletest dependency, but it also resulted in further adjustments that give users more control over how googletest is built.

Build Type: Profiling

This PR's primary purpose is to add a new CLI that allows benchmarking for performance baselining, analysis, and improvement quantification. To produce builds that are optimized but still carry enough debug information to generate flame graphs and use profilers effectively, a new build type was added to do just that. Rather than building with -DCMAKE_BUILD_TYPE=Release, a user can now use -DCMAKE_BUILD_TYPE=Profiling, and CMake will produce a build that has both debug information and optimization passes.
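A custom build type like this is usually wired up by defining per-configuration compiler flags. The fragment below is a generic sketch of that CMake pattern; the flag values are illustrative and not necessarily what this PR uses.

```cmake
# Sketch of the usual pattern for a custom CMake build type.
# Flag values are illustrative, not necessarily ion-c's actual choices.
set(CMAKE_C_FLAGS_PROFILING   "-O2 -g" CACHE STRING "C flags for Profiling builds")
set(CMAKE_CXX_FLAGS_PROFILING "-O2 -g" CACHE STRING "C++ flags for Profiling builds")
# Selected at configure time with: cmake -DCMAKE_BUILD_TYPE=Profiling ..
```

The key idea is that -O2 keeps the optimization passes of a Release build while -g retains the debug symbols profilers and flame-graph tooling need.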

Addition of IonCBench

The primary purpose of this PR is to add the ion-c benchmark CLI. To build this tool, set the CMake variable IONC_BENCHMARKING_ENABLED to ON; this is done by default when using the Profiling build type. The tool is intended to provide a feature set similar to the benchmark CLIs for ion-java and ion-python.

Currently, the tool allows the user to provide their own data and perform both full deserialization and full serialization of that data using ion-c (in text or binary mode), MsgPack (MsgPack-C), JSON (yyjson and json-c), and CBOR (libcbor).

This PR is the first iteration of the ion-bench tool, and provides functionality for measuring CPU time as well as other CPU metrics such as instruction counts (as long as it is built with libpfm). The tool currently supports a single benchmark run per invocation; each run requires an input dataset, a specified supported implementation, and the benchmark to run. Currently the tool supports two benchmarks, covering full deserialization and full serialization of the input data.

In both benchmarks, timing does not include the IO required to load the input dataset into memory.

The expectation is that more benchmarks will be added, along with more format implementations, in order to compare runtime, memory usage, and data size, between ion and other data formats.

Usage

The tool has a --help flag that lists all of the arguments that can be used:

# tools/ion-bench/src/IonCBench --help
Usage: tools/ion-bench/src/IonCBench
  --help                    Display this help and exit.
  -L, --list-libs           List available libraries to benchmark.
  -B, --list-bench          List available benchmarks.
  -n, --name=<string>       Name to use for the run in reporting
  -b, --benchmark=<string>  Benchmark to run. (read or write)
  -d, --dataset=FILE        Add a dataset to run benchmark with.
  -l, --library=<string>    Library to use (use -L to see a list of supported libraries)
  --no-stats                Do not generate benchmarks stats. (Used primarily for profiling)
  -p, --pretty-print        Pretty print text output

By default, output will be presented in a tabular format.

# tools/ion-bench/src/IonCBench -b deserialize_all -d ../../tools/ion-bench/data/service_log_legacy/service_log_legacy.10n -l ion-c-binary -n "Ion Binary"
Benchmark: deserialize_all
2023-10-12T21:44:24+00:00
Running tools/ion-bench/src/IonCBench
Run on (6 X 2592.01 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 0.11, 0.35, 0.26
---------------------------------------------------------------------------------
Benchmark                       Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------
service_log_legacy.10n 1071590601 ns   1071241019 ns            1 Bps=20.7825M/s bools=0 nulls=0 nums=2.73713M objs=880.411k strs=69.818k

Since the tool leans very heavily on google-benchmark, all google-benchmark arguments are supported as well:

# tools/ion-bench/src/IonCBench -b deserialize_all -d ../../tools/ion-bench/data/service_log_legacy/service_log_legacy.10n -l ion-c-binary -n "Ion Binary" --benchmark_format=json
Benchmark: deserialize_all
{
  "context": {
    "date": "2023-10-12T21:51:55+00:00",
    "host_name": "663ce7a9e233",
    "executable": "tools/ion-bench/src/IonCBench",
    "num_cpus": 6,
    "mhz_per_cpu": 2592,
    "cpu_scaling_enabled": false,
    "caches": [
      {
        "type": "Data",
        "level": 1,
        "size": 32768,
        "num_sharing": 1
      },
      {
        "type": "Instruction",
        "level": 1,
        "size": 32768,
        "num_sharing": 1
      },
      {
        "type": "Unified",
        "level": 2,
        "size": 262144,
        "num_sharing": 1
      },
      {
        "type": "Unified",
        "level": 3,
        "size": 12582912,
        "num_sharing": 6
      }
    ],
    "load_avg": [0.198242,0.136719,0.174316],
    "library_build_type": "release"
  },
  "benchmarks": [
    {
      "name": "service_log_legacy.10n",
      "family_index": 0,
      "per_family_instance_index": 0,
      "run_name": "service_log_legacy.10n",
      "run_type": "iteration",
      "repetitions": 1,
      "repetition_index": 0,
      "threads": 1,
      "iterations": 1,
      "real_time": 1.0773163559999831e+09,
      "cpu_time": 1.0769988449999998e+09,
      "time_unit": "ns",
      "Bps": 2.0671366643851880e+07,
      "bools": 0.0000000000000000e+00,
      "nulls": 0.0000000000000000e+00,
      "nums": 2.7371320000000000e+06,
      "objs": 8.8041100000000000e+05,
      "strs": 6.9818000000000000e+04
    }
  ]
}

Changes Since Original Post

nirosys commented 1 year ago

Build checks failed due to a couple of issues. One is that gtest isn't linking, which doesn't reproduce on my system. Another is CMake acting as if it doesn't know what C++17 is, even though it has supported it since 3.8. Continuing to dig.
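For context, requesting C++17 through CMake's standard machinery (available since CMake 3.8) usually looks like the fragment below. This is a generic sketch, not necessarily the fix applied in this PR.

```cmake
# Generic C++17 request; CMake has understood CXX_STANDARD 17 since 3.8.
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)  # fail at configure time if unsupported
set(CMAKE_CXX_EXTENSIONS OFF)        # -std=c++17 rather than -std=gnu++17
```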

nirosys commented 1 year ago

The unit test crash occurring with the amazonlinux:1 gcc72 build isn't reliably reproducible locally. The most recent commit should address the memory leak identified by the ubuntu clang build, but I'm not sure whether it is related. I'll continue digging if it isn't.

nirosys commented 1 year ago

Ok. I've reproduced the crash that GHA is seeing, and it's pretty awesome. /s

All of my attempts to reproduce the issue failed. I combed through the package versions and made sure the docker digest SHA reported by GitHub matched the image I was using locally. Everything lined up, but for some reason the issue would not reproduce. That changed when I piped the output of act into tee so I could grep through the output to make sure there weren't any pointers in the haystack of warnings ion-c produces. As soon as the job ran with this new pipe, it crashed with the same error GHA produces.

I was also able to reproduce the issue by simply redirecting the output of the unit tests into a file, which made it possible to run it under gdb easily. I knew from the output of the GHA crash that the issue was triggered in the test_ion_decimal.cpp tests, specifically WriteAllValues, but I had no reason to suspect an issue there. Debugging without the redirect showed the values within the function to be sane, and no error occurred during the free.

With the redirect, I'm guessing some things have shifted on the stack, and the uninitialized ion_decimal defined within the WriteAllValues test ends up lining up with data that makes the ION_DECIMAL's type field contain the value for ION_DECIMAL_TYPE_NUMBER. This triggers the only code path that tries to free the decimal's value.num_value buffer, which is also filled with random stack data and therefore cannot be freed.

nirosys commented 1 year ago

Putting this into review. There may be an issue with ion-test-drivers, and I'm still looking into that, but this PR should be good to start moving forward.

nirosys commented 1 year ago

Thank you! I'm going to push up a new commit that fixes the commented code and the typos, and adds the output. Then I'll get this merged, PR the regression workflow, and follow up with the code changes discussed above. Unless anyone has an argument against that.