chore: Add basic benchmark suite to C library

paleolimbot commented 7 months ago

This PR adds an initial set of benchmarks covering some realistic usage patterns. The general approach is to use doxygen comments to document the benchmarks, which will run against the released version and the previous version. I'm not sure exactly what the output format will be, but I'd like the benchmarks to be written in such a way that there's a path to programmatically generating a report (maybe using conbench, maybe just a Quarto document).

Work in progress!
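
To make the "path to a report" idea a bit more concrete, here is a minimal sketch of comparing two runs. It assumes the benchmarks are run with Google Benchmark's `--benchmark_format=json` option and that each run's JSON is available as text. The function names (`load_cpu_times`, `compare`, `format_report`) are hypothetical and not part of this PR. The sample JSON is a hand-built stand-in that reuses the schema benchmark timings from the CI output in this thread for both sides, so every ratio is 1.00.

```python
import json

# Conversion factors for the "time_unit" field in Google Benchmark JSON.
UNIT_TO_NS = {"ns": 1.0, "us": 1e3, "ms": 1e6, "s": 1e9}


def load_cpu_times(json_text):
    """Map benchmark name -> CPU time in nanoseconds."""
    report = json.loads(json_text)
    return {
        b["name"]: b["cpu_time"] * UNIT_TO_NS[b["time_unit"]]
        for b in report["benchmarks"]
    }


def compare(baseline_json, contender_json):
    """Return (name, baseline_ns, contender_ns, ratio) for shared benchmarks."""
    baseline = load_cpu_times(baseline_json)
    contender = load_cpu_times(contender_json)
    rows = []
    for name in sorted(baseline.keys() & contender.keys()):
        ratio = contender[name] / baseline[name]
        rows.append((name, baseline[name], contender[name], ratio))
    return rows


def format_report(rows):
    """Render the comparison as plain-text lines (ratio > 1 means slower)."""
    lines = [f"{'benchmark':<32}{'baseline ns':>14}{'contender ns':>14}{'ratio':>8}"]
    for name, base_ns, cont_ns, ratio in rows:
        lines.append(f"{name:<32}{base_ns:>14.0f}{cont_ns:>14.0f}{ratio:>8.2f}")
    return "\n".join(lines)


# Hand-built stand-in for a JSON result file (assumption: not real PR output).
SAMPLE = json.dumps({
    "benchmarks": [
        {"name": "BM_SchemaInitWideStruct", "iterations": 911,
         "real_time": 768760.0, "cpu_time": 768689.0, "time_unit": "ns"},
        {"name": "BM_SchemaViewInitWideStruct", "iterations": 4202,
         "real_time": 175154.0, "cpu_time": 175138.0, "time_unit": "ns"},
    ]
})

if __name__ == "__main__":
    print(format_report(compare(SAMPLE, SAMPLE)))
```

In real use the two arguments to `compare` would come from separate result files, one per library version. Aggregate rows produced by repeated runs are not handled here.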

pitrou commented 7 months ago

Can you show what the results look like?

paleolimbot commented 7 months ago

This is from the CI run (so timings are maybe meaningless), but the output looks like:

  2024-02-28T17:43:48+00:00
  ::group::array_benchmark

  Run on (4 X 3139.35 MHz CPU s)
  CPU Caches:
    L1 Data 32 KiB (x2)
    L1 Instruction 32 KiB (x2)
    L2 Unified 512 KiB (x2)
    L3 Unified 32768 KiB (x1)
  Load Average: 0.89, 0.34, 0.12
  -------------------------------------------------------------------------------------------------
  Benchmark                                       Time             CPU   Iterations UserCounters...
  -------------------------------------------------------------------------------------------------
  BM_ArrayViewGetIntUnsafeInt8              1576584 ns      1576545 ns          449 items_per_second=634.298M/s
  BM_ArrayViewGetIntUnsafeInt16              936609 ns       936540 ns          749 items_per_second=1.06776G/s
  BM_ArrayViewGetIntUnsafeInt32             1244619 ns      1244574 ns          562 items_per_second=803.488M/s
  BM_ArrayViewGetIntUnsafeInt64              945470 ns       945435 ns          745 items_per_second=1.05771G/s
  BM_ArrayViewGetIntUnsafeInt64CheckNull    1751277 ns      1751243 ns          396 items_per_second=571.023M/s

  ::group::schema_benchmark
  2024-02-28T17:43:52+00:00
  Running ./schema_benchmark
  Run on (4 X 3241.55 MHz CPU s)
  CPU Caches:
    L1 Data 32 KiB (x2)
    L1 Instruction 32 KiB (x2)
    L2 Unified 512 KiB (x2)
    L3 Unified 32768 KiB (x1)
  Load Average: 0.90, 0.35, 0.13
  --------------------------------------------------------------------------------------
  Benchmark                            Time             CPU   Iterations UserCounters...
  --------------------------------------------------------------------------------------
  BM_SchemaInitWideStruct         768760 ns       768689 ns          911 items_per_second=13.0092M/s
  BM_SchemaViewInitWideStruct     175154 ns       175138 ns         4202 items_per_second=57.0978M/s
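
As a sanity check on the `items_per_second` column, the rows above can be parsed and the rate multiplied back by the CPU time. This is a rough sketch: the regular expression only handles rows shaped like the ones in this output, `parse_row` is a hypothetical helper, and it assumes the rate is derived from CPU time rather than wall time. Under that assumption the array benchmarks work out to about 1,000,000 items per iteration and the schema benchmarks to about 10,000.

```python
import re

# Matches rows like:
#   BM_Name    1576584 ns    1576545 ns    449 items_per_second=634.298M/s
ROW = re.compile(
    r"^\s*(?P<name>BM_\w+)\s+(?P<real>\d+) ns\s+(?P<cpu>\d+) ns\s+(?P<iters>\d+)"
    r"\s+items_per_second=(?P<rate>[\d.]+)(?P<suffix>[kKMG]?)/s\s*$"
)

# SI suffixes used in the human-readable rate.
SCALE = {"": 1.0, "k": 1e3, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_row(line):
    """Parse one result row; return None for headers and separators."""
    m = ROW.match(line)
    if m is None:
        return None
    rate = float(m["rate"]) * SCALE[m["suffix"]]
    cpu_ns = int(m["cpu"])
    return {
        "name": m["name"],
        "real_ns": int(m["real"]),
        "cpu_ns": cpu_ns,
        "iterations": int(m["iters"]),
        "items_per_second": rate,
        # items/s * s/iteration = items/iteration
        "items_per_iteration": rate * cpu_ns * 1e-9,
    }


# Two rows copied from the CI output in this thread.
LINES = [
    "  BM_ArrayViewGetIntUnsafeInt8              1576584 ns      1576545 ns          449 items_per_second=634.298M/s",
    "  BM_SchemaInitWideStruct         768760 ns       768689 ns          911 items_per_second=13.0092M/s",
]

if __name__ == "__main__":
    for line in LINES:
        row = parse_row(line)
        print(row["name"], round(row["items_per_iteration"]))
```

Rows parsed this way could be written out as CSV or fed into a Quarto document, although consuming the JSON output directly is probably less fragile than scraping the console table.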

pitrou commented 7 months ago

> This is from the CI run (so timings are maybe meaningless), but the output looks like:

Thank you! This looks fine to me.

apache / arrow-nanoarrow

chore: Add basic benchmark suite to C library #393