brexhq / substation

Substation is a toolkit for routing, normalizing, and enriching security event and audit logs.
https://substation.readme.io
MIT License

perf(transform): Improve AggregateToArray Throughput #150

Closed: jshlbrd closed this pull request 6 months ago

jshlbrd commented 6 months ago

Description

Motivation and Context

I noticed this transform was contributing to slowdowns in some production data pipelines. The benchmark results show that this change increases throughput dramatically, from about 13.7k to about 441k events per second (roughly 32x):

main:

```
% ./cmd/development/benchmark/substation/substation -file ./examples/config/transform/aggregate/summarize/data.jsonl -config ./examples/cmd/development/benchmark/aggregate/config.json -count 10000
2024-03-20T20:08:59.253422-07:00: Configuring Substation
2024-03-20T20:08:59.255327-07:00: Loading data into memory
2024-03-20T20:08:59.263569-07:00: Starting benchmark
2024-03-20T20:09:13.166251-07:00: Ending benchmark

Benchmark results:
- 190000 events in 13.902677334s
- 13666.43 events per second
- 14 MB in 13.902677334s
- 1.01 MB per second
```

This PR:

```
% ./cmd/development/benchmark/substation/substation -file ./examples/config/transform/aggregate/summarize/data.jsonl -config ./examples/cmd/development/benchmark/aggregate/config.json -count 10000
2024-03-20T20:11:12.093623-07:00: Configuring Substation
2024-03-20T20:11:12.095701-07:00: Loading data into memory
2024-03-20T20:11:12.10391-07:00: Starting benchmark
2024-03-20T20:11:12.534654-07:00: Ending benchmark

Benchmark results:
- 190000 events in 430.74925ms
- 441091.89 events per second
- 14 MB in 430.74925ms
- 32.59 MB per second
```

The benchmarked config calls only this transform:

```jsonnet
local sub = import '../../../../../build/config/substation.libsonnet';

{
  transforms: [
    sub.tf.agg.to.array(),
  ]
}
```
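Conceptually, the transform in this config buffers incoming events and emits them together as a single JSON array. The Go sketch below illustrates that idea under a simple count-based batch limit; the `aggregator` type and its `maxCount`, `add`, and `flush` names are hypothetical. It is not Substation's implementation and does not describe the change made in this PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// aggregator is a hypothetical, count-limited buffer of raw JSON events.
type aggregator struct {
	maxCount int
	events   [][]byte
}

// add buffers one event. When the buffer reaches maxCount, it returns
// the batch as a single JSON array and resets the buffer.
func (a *aggregator) add(event []byte) ([]byte, bool) {
	a.events = append(a.events, event)
	if len(a.events) < a.maxCount {
		return nil, false
	}
	return a.flush(), true
}

// flush joins the buffered events into one JSON array by copying their
// raw bytes (no decoding or re-encoding), then empties the buffer.
func (a *aggregator) flush() []byte {
	var buf bytes.Buffer
	buf.WriteByte('[')
	buf.Write(bytes.Join(a.events, []byte(",")))
	buf.WriteByte(']')
	a.events = a.events[:0]
	return buf.Bytes()
}

func main() {
	agg := &aggregator{maxCount: 3}
	inputs := []string{`{"a":1}`, `{"a":2}`, `{"a":3}`, `{"a":4}`}

	for _, in := range inputs {
		if batch, ok := agg.add([]byte(in)); ok {
			fmt.Println(string(batch))
		}
	}

	// Emit any partial batch left at the end of the stream.
	if len(agg.events) > 0 {
		fmt.Println(string(agg.flush()))
	}
}
```

With `maxCount: 3`, the four inputs produce one full batch of three events followed by a partial batch containing the last event.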

How Has This Been Tested?

Using the benchmark app (see above).
