- Improves the performance (throughput) of the AggregateToArray transform
- Simplifies the JSON Lines pattern in the config library
Motivation and Context
I noticed this transform was contributing to slowdowns in some production data pipelines. Based on the benchmark results below, this change increases throughput by roughly 32x (from about 13.7k to about 441k events per second):
main:
% ./cmd/development/benchmark/substation/substation -file ./examples/config/transform/aggregate/summarize/data.jsonl -config ./examples/cmd/development/benchmark/aggregate/config.json -count 10000
2024-03-20T20:08:59.253422-07:00: Configuring Substation
2024-03-20T20:08:59.255327-07:00: Loading data into memory
2024-03-20T20:08:59.263569-07:00: Starting benchmark
2024-03-20T20:09:13.166251-07:00: Ending benchmark
Benchmark results:
- 190000 events in 13.902677334s
- 13666.43 events per second
- 14 MB in 13.902677334s
- 1.01 MB per second
This PR:
% ./cmd/development/benchmark/substation/substation -file ./examples/config/transform/aggregate/summarize/data.jsonl -config ./examples/cmd/development/benchmark/aggregate/config.json -count 10000
2024-03-20T20:11:12.093623-07:00: Configuring Substation
2024-03-20T20:11:12.095701-07:00: Loading data into memory
2024-03-20T20:11:12.10391-07:00: Starting benchmark
2024-03-20T20:11:12.534654-07:00: Ending benchmark
Benchmark results:
- 190000 events in 430.74925ms
- 441091.89 events per second
- 14 MB in 430.74925ms
- 32.59 MB per second
The benchmarked config calls only this transform:
local sub = import '../../../../../build/config/substation.libsonnet';

{
  transforms: [
    sub.tf.agg.to.array(),
  ],
}
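Assuming the transform's core job is to combine a batch of buffered JSON messages (for example, the records in data.jsonl) into a single JSON array, that operation can be done in one linear pass over the input. The sketch below is a hypothetical illustration of that idea in Go; the name `aggregateToArray` is invented here, and this is not the code changed in this PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// aggregateToArray joins already-encoded JSON messages into one JSON array
// in a single pass. Hypothetical helper for illustration only.
func aggregateToArray(msgs [][]byte) ([]byte, error) {
	// Pre-size the buffer: two brackets plus each message and a comma.
	size := 2
	for _, m := range msgs {
		size += len(m) + 1
	}

	var buf bytes.Buffer
	buf.Grow(size)
	buf.WriteByte('[')
	for i, m := range msgs {
		// Reject anything that is not a complete JSON value so the
		// output is always a valid JSON array.
		if !json.Valid(m) {
			return nil, fmt.Errorf("message %d is not valid JSON", i)
		}
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.Write(m)
	}
	buf.WriteByte(']')
	return buf.Bytes(), nil
}

func main() {
	// Two JSON Lines records, as they might appear in a .jsonl file.
	msgs := [][]byte{[]byte(`{"a":1}`), []byte(`{"b":2}`)}
	out, err := aggregateToArray(msgs)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

For the two sample records this prints `[{"a":1},{"b":2}]`. Writing each message once into a pre-sized buffer keeps the cost proportional to the total input size, rather than re-copying the growing array on every append.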
How Has This Been Tested?
Using the benchmark app (see above).
Types of changes
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.