@endorama you were right
tweaking the channel size seems to show different performance behaviour and some improvement, but I'd say it's not reliable, since it probably depends heavily on the host machine
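for reference, this is roughly the kind of tweak I mean (a minimal sketch, names and sizing are illustrative rather than the actual generator code): the buffer of the events channel is derived from `runtime.GOMAXPROCS(0)`, so the "right" value ends up depending on the host.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// illustrative only: size the channel buffer relative to the number of Ps,
	// e.g. half of GOMAXPROCS(0), as in one of the variants benchmarked below
	bufSize := runtime.GOMAXPROCS(0) / 2
	if bufSize < 1 {
		bufSize = 1
	}

	events := make(chan []byte, bufSize)

	// producer goroutine pushing rendered events into the channel
	go func() {
		defer close(events)
		for i := 0; i < 10; i++ {
			events <- []byte(fmt.Sprintf("event-%d\n", i))
		}
	}()

	// consumer draining the channel (stand-in for the writer)
	for e := range events {
		_ = e
	}
}
```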
the below badges are clickable and redirect to their specific view in the CI or DOCS
#### Build stats

* Start Time: 2023-02-27T08:13:37.515+0000
* Duration: 3 min 46 sec

#### Test stats :test_tube:

| Test | Results |
| ------- | :---: |
| Failed | 0 |
| Passed | 65 |
| Skipped | 0 |
| Total | 65 |
To re-run your PR in the CI, just comment with:
- `/test` : Re-trigger the build.
@endorama I've built different binaries across the refactoring at various commits in the branch; here are some outcomes.

the first run compares the custom template with goroutines (`./gen-with-custom_template-goroutines`), the custom template on main (`./gen-with-custom_template-main`) and the legacy generator (`./gen-with-legacy`). each binary was built with the default behaviour of writing to file, and with variants writing to `/dev/null` or using an `io.Discard` (I wanted to assess the overhead coming from disk access). main is still faster:

```
./gen-with-custom_template-goroutines generate aws dynamodb 1.28.3 -t 20G
85.81user 24.40system 1:48.29elapsed 101%CPU (0avgtext+0avgdata 34880maxresident)k
0inputs+39434472outputs (0major+15962minor)pagefaults 0swaps
./gen-with-custom_template-goroutines-dev.null generate aws dynamodb 1.28.3 -t 20G
79.84user 7.85system 1:26.60elapsed 101%CPU (0avgtext+0avgdata 35412maxresident)k
0inputs+0outputs (0major+9166minor)pagefaults 0swaps
./gen-with-custom_template-goroutines-io.discard generate aws dynamodb 1.28.3 -t 20G
74.79user 5.07system 1:18.87elapsed 101%CPU (0avgtext+0avgdata 34884maxresident)k
0inputs+0outputs (0major+10965minor)pagefaults 0swaps
./gen-with-custom_template-main generate aws dynamodb 1.28.3 -t 20G
81.36user 24.34system 1:43.85elapsed 101%CPU (0avgtext+0avgdata 33904maxresident)k
0inputs+39062504outputs (0major+20773minor)pagefaults 0swaps
./gen-with-custom_template-main-dev.null generate aws dynamodb 1.28.3 -t 20G
76.97user 7.62system 1:23.50elapsed 101%CPU (0avgtext+0avgdata 35452maxresident)k
0inputs+0outputs (0major+12925minor)pagefaults 0swaps
./gen-with-custom_template-main-io.discard generate aws dynamodb 1.28.3 -t 20G
72.74user 4.72system 1:16.48elapsed 101%CPU (0avgtext+0avgdata 36680maxresident)k
0inputs+0outputs (0major+17456minor)pagefaults 0swaps
./gen-with-legacy generate aws dynamodb 1.28.3 -t 20G
102.62user 26.73system 2:06.55elapsed 102%CPU (0avgtext+0avgdata 41588maxresident)k
128inputs+39062504outputs (1major+20325minor)pagefaults 0swaps
./gen-with-legacy-dev.null generate aws dynamodb 1.28.3 -t 20G
94.32user 8.19system 1:40.53elapsed 101%CPU (0avgtext+0avgdata 34824maxresident)k
328inputs+0outputs (4major+19482minor)pagefaults 0swaps
./gen-with-legacy-io.discard generate aws dynamodb 1.28.3 -t 20G
90.58user 5.30system 1:33.94elapsed 102%CPU (0avgtext+0avgdata 40156maxresident)k
328inputs+0outputs (4major+23288minor)pagefaults 0swaps
```
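to make the three build variants concrete: they only differ in where the rendered output goes. roughly (a hypothetical `newOutput` helper, not the actual code in the tool):

```go
package main

import (
	"io"
	"log"
	"os"
)

// newOutput mirrors the three build variants: the default writes to a regular
// file, the "-dev.null" builds write to /dev/null, and the "-io.discard"
// builds drop the output in-process.
func newOutput(variant string) (io.WriteCloser, error) {
	switch variant {
	case "dev.null":
		return os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	case "io.discard":
		// io.Discard has no Close, so wrap it with a no-op closer
		return nopCloser{io.Discard}, nil
	default:
		return os.Create("corpus.out")
	}
}

type nopCloser struct{ io.Writer }

func (nopCloser) Close() error { return nil }

func main() {
	out, err := newOutput("io.discard")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := out.Write([]byte("generated event\n")); err != nil {
		log.Fatal(err)
	}
}
```

writing to `/dev/null` still pays the write syscalls while `io.Discard` drops the data in userspace, which is why the two variants together bracket the disk overhead.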
the next set compares the text template with goroutines using an unbuffered channel (`./gen-with-text_template-goroutines-unbufferedchan`), with a channel buffered to half of `runtime.GOMAXPROCS(0)` (`./gen-with-text_template-goroutines-chansize`), and without a channel but with a state local to every field (`./gen-with-text_template-goroutines-nochan`). as before, each binary was built with the default behaviour of writing to file and with variants writing to `/dev/null` or using an `io.Discard`. main is still faster:

```
./gen-with-text_template-goroutines-chansize generate aws dynamodb 1.28.3 -t 20G
1866.58user 297.94system 17:14.26elapsed 209%CPU (0avgtext+0avgdata 41056maxresident)k
160inputs+40217872outputs (3major+2666517minor)pagefaults 0swaps
./gen-with-text_template-goroutines-chansize-dev.null generate aws dynamodb 1.28.3 -t 20G
1800.80user 240.11system 16:12.13elapsed 209%CPU (0avgtext+0avgdata 42172maxresident)k
160inputs+0outputs (3major+2898400minor)pagefaults 0swaps
./gen-with-text_template-goroutines-chansize-io.discard generate aws dynamodb 1.28.3 -t 20G
1796.17user 236.99system 16:03.14elapsed 211%CPU (0avgtext+0avgdata 45364maxresident)k
160inputs+0outputs (3major+3265729minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan generate aws dynamodb 1.28.3 -t 20G
910.16user 74.99system 14:27.17elapsed 113%CPU (0avgtext+0avgdata 37504maxresident)k
368inputs+39185808outputs (4major+538156minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan-dev.null generate aws dynamodb 1.28.3 -t 20G
896.23user 51.32system 13:50.42elapsed 114%CPU (0avgtext+0avgdata 46144maxresident)k
376inputs+0outputs (4major+536475minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan-io.discard generate aws dynamodb 1.28.3 -t 20G
892.24user 43.21system 13:39.13elapsed 114%CPU (0avgtext+0avgdata 36460maxresident)k
368inputs+0outputs (4major+491776minor)pagefaults 0swaps
./gen-with-text_template-goroutines-unbufferedchan generate aws dynamodb 1.28.3 -t 20G
1801.91user 292.29system 16:47.15elapsed 207%CPU (0avgtext+0avgdata 44076maxresident)k
160inputs+39289992outputs (3major+2758282minor)pagefaults 0swaps
./gen-with-text_template-goroutines-unbufferedchan-dev.null generate aws dynamodb 1.28.3 -t 20G
1835.62user 242.94system 16:36.15elapsed 208%CPU (0avgtext+0avgdata 36960maxresident)k
160inputs+0outputs (3major+2822023minor)pagefaults 0swaps
./gen-with-text_template-goroutines-unbufferedchan-io.discard generate aws dynamodb 1.28.3 -t 20G
1873.59user 240.32system 16:49.39elapsed 209%CPU (0avgtext+0avgdata 42716maxresident)k
160inputs+0outputs (3major+2730760minor)pagefaults 0swaps
./gen-with-text_template-main generate aws dynamodb 1.28.3 -t 20G
767.99user 52.43system 12:50.66elapsed 106%CPU (0avgtext+0avgdata 38352maxresident)k
328inputs+39062728outputs (4major+371663minor)pagefaults 0swaps
./gen-with-text_template-main-dev.null generate aws dynamodb 1.28.3 -t 20G
747.52user 25.22system 12:04.20elapsed 106%CPU (0avgtext+0avgdata 42136maxresident)k
448inputs+0outputs (5major+357297minor)pagefaults 0swaps
./gen-with-text_template-main-io.discard generate aws dynamodb 1.28.3 -t 20G
736.85user 22.40system 11:51.84elapsed 106%CPU (0avgtext+0avgdata 40688maxresident)k
320inputs+0outputs (4major+374720minor)pagefaults 0swaps
```
all the above used no specific generator configuration
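for clarity, this is roughly what I mean by "without channel but with a state local to every field" (an illustrative sketch, not the actual generator internals): each field generator owns its own state and is emitted in-line, with no goroutines or channel hand-off.

```go
package main

import (
	"bytes"
	"fmt"
)

// fieldGen is an illustrative per-field generator: it owns its local state
// (here just a counter) instead of sharing state through a channel.
type fieldGen struct {
	name    string
	counter int
}

func (f *fieldGen) emit(buf *bytes.Buffer) {
	f.counter++
	fmt.Fprintf(buf, "%s=%d ", f.name, f.counter)
}

func main() {
	fields := []*fieldGen{{name: "aws.dynamodb.latency"}, {name: "aws.dynamodb.requests"}}

	var buf bytes.Buffer
	for i := 0; i < 3; i++ {
		buf.Reset()
		// no goroutines/channels: every field is emitted in-line,
		// each advancing only its own local state
		for _, f := range fields {
			f.emit(&buf)
		}
		fmt.Println(buf.String())
	}
}
```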
I've run another test using the ec2_metrics template (which has a high cardinality on several fields).

it compares the text template on current main (`./gen-with-text_template-main`) with goroutines and channels buffered to half of `runtime.GOMAXPROCS(0)` (`./gen-with-text_template-goroutines-chansize`), without goroutines/channels but with a state local to every field (`./gen-with-text_template-goroutines-nochan`), and without goroutines/channels but with a global state slightly refactored from its version in main (`./gen-with-text_template-goroutines-globalstate`). the last binary is up to date with the latest commit in the branch; it also pre-calculates the number of events to generate based on the requested output size and the size of an initial template rendering.

```
./gen-with-text_template-goroutines-chansize generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
3469.86user 361.14system 33:09.41elapsed 192%CPU (0avgtext+0avgdata 16740maxresident)k
17504inputs+51098160outputs (110major+3464263minor)pagefaults 0swaps
./gen-with-text_template-goroutines-globalstate generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
2827.63user 636.22system 30:03.28elapsed 192%CPU (0avgtext+0avgdata 16676maxresident)k
352inputs+51874752outputs (5major+5466338minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
2018.64user 172.34system 31:27.58elapsed 116%CPU (0avgtext+0avgdata 15556maxresident)k
104inputs+52320520outputs (6major+2643605minor)pagefaults 0swaps
./gen-with-text_template-main generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
3043.93user 633.86system 33:16.69elapsed 184%CPU (0avgtext+0avgdata 16052maxresident)k
5968inputs+58594376outputs (43major+6518558minor)pagefaults 0swaps
```
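the pre-calculation is essentially: render the template once, measure the size of that sample, and derive how many events are needed to reach the requested total size. a minimal sketch of the idea (illustrative numbers and helper name, not the actual implementation):

```go
package main

import "fmt"

// estimateEvents derives how many events are needed to reach the requested
// total size, given the size of one rendered sample event, rounding up.
func estimateEvents(totalSize, sampleEventSize int64) int64 {
	if sampleEventSize <= 0 {
		return 0
	}
	return (totalSize + sampleEventSize - 1) / sampleEventSize
}

func main() {
	const requested = 30 << 30 // -t 30G
	sample := int64(1536)      // size of an initial template rendering (example value)
	fmt.Println(estimateEvents(requested, sample))
}
```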
in this case the performance of `gen-with-text_template-main` and `gen-with-text_template-goroutines-globalstate` seems very similar: the extra ~3 minutes taken by `gen-with-text_template-main` are related to it generating more events.
while the final goal of improving the performance was not reached, I would keep a few elements of the refactoring:
- the `error` return from `EmitF`
- the loop on `emit()` rather than on the bind functions
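roughly the shape I mean (signatures and types here are illustrative, not the actual ones in the repo): the emitter returns an `error` to the caller, and the per-event loop lives in `emit()` instead of in each bind function.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// emitF is an illustrative emitter with an error return, instead of
// swallowing failures inside the bind functions.
type emitF func(state map[string]any, buf *bytes.Buffer) error

// emit loops over the bound emitters for a single event: the loop lives
// here, in one place, rather than in each bind function.
func emit(emitters []emitF, state map[string]any, buf *bytes.Buffer) error {
	for _, e := range emitters {
		if err := e(state, buf); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	emitters := []emitF{
		func(_ map[string]any, buf *bytes.Buffer) error {
			buf.WriteString(`{"event":"ok"}`)
			return nil
		},
		func(_ map[string]any, _ *bytes.Buffer) error {
			return errors.New("cardinality exhausted") // example failure surfaced to the caller
		},
	}

	var buf bytes.Buffer
	if err := emit(emitters, map[string]any{}, &buf); err != nil {
		fmt.Println("emit failed:", err)
	}
}
```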
main branch:

this branch: