@endorama you were right
tweaking the channel size seems to show different performance behaviour and some improvement, but I'd say it's not reliable, since it probably depends heavily on the host machine
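for reference, this is roughly the kind of tweak I mean (a minimal sketch, names and sizing are illustrative rather than the actual generator code): the buffer of the events channel is derived from `runtime.GOMAXPROCS(0)`, so the "right" value ends up depending on the host.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// illustrative only: size the channel buffer relative to the number of Ps,
	// e.g. half of GOMAXPROCS(0), as in one of the variants benchmarked below
	bufSize := runtime.GOMAXPROCS(0) / 2
	if bufSize < 1 {
		bufSize = 1
	}

	events := make(chan []byte, bufSize)

	// producer goroutine pushing rendered events into the channel
	go func() {
		defer close(events)
		for i := 0; i < 10; i++ {
			events <- []byte(fmt.Sprintf("event-%d\n", i))
		}
	}()

	// consumer draining the channel (stand-in for the writer)
	for e := range events {
		_ = e
	}
}
```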
the below badges are clickable and redirect to their specific view in the CI or DOCS
#### Build stats

* Start Time: 2023-02-27T08:13:37.515+0000
* Duration: 3 min 46 sec

#### Test stats :test_tube:

| Test | Results |
| ------- | :---: |
| Failed | 0 |
| Passed | 65 |
| Skipped | 0 |
| Total | 65 |
To re-run your PR in the CI, just comment with:
- `/test` : Re-trigger the build.
@endorama I've built different binaries across the refactoring at various commits in the branch; here are some outcomes.

the first run compares the custom template with goroutines (`./gen-with-custom_template-goroutines`), the custom template on main (`./gen-with-custom_template-main`) and the legacy generator (`./gen-with-legacy`). each binary was built with the default behaviour of writing to file, and with variants writing to `/dev/null` or using an `io.Discard` (I wanted to assess the overhead coming from disk access). main is still faster:

```
./gen-with-custom_template-goroutines generate aws dynamodb 1.28.3 -t 20G
85.81user 24.40system 1:48.29elapsed 101%CPU (0avgtext+0avgdata 34880maxresident)k
0inputs+39434472outputs (0major+15962minor)pagefaults 0swaps
./gen-with-custom_template-goroutines-dev.null generate aws dynamodb 1.28.3 -t 20G
79.84user 7.85system 1:26.60elapsed 101%CPU (0avgtext+0avgdata 35412maxresident)k
0inputs+0outputs (0major+9166minor)pagefaults 0swaps
./gen-with-custom_template-goroutines-io.discard generate aws dynamodb 1.28.3 -t 20G
74.79user 5.07system 1:18.87elapsed 101%CPU (0avgtext+0avgdata 34884maxresident)k
0inputs+0outputs (0major+10965minor)pagefaults 0swaps
./gen-with-custom_template-main generate aws dynamodb 1.28.3 -t 20G
81.36user 24.34system 1:43.85elapsed 101%CPU (0avgtext+0avgdata 33904maxresident)k
0inputs+39062504outputs (0major+20773minor)pagefaults 0swaps
./gen-with-custom_template-main-dev.null generate aws dynamodb 1.28.3 -t 20G
76.97user 7.62system 1:23.50elapsed 101%CPU (0avgtext+0avgdata 35452maxresident)k
0inputs+0outputs (0major+12925minor)pagefaults 0swaps
./gen-with-custom_template-main-io.discard generate aws dynamodb 1.28.3 -t 20G
72.74user 4.72system 1:16.48elapsed 101%CPU (0avgtext+0avgdata 36680maxresident)k
0inputs+0outputs (0major+17456minor)pagefaults 0swaps
./gen-with-legacy generate aws dynamodb 1.28.3 -t 20G
102.62user 26.73system 2:06.55elapsed 102%CPU (0avgtext+0avgdata 41588maxresident)k
128inputs+39062504outputs (1major+20325minor)pagefaults 0swaps
./gen-with-legacy-dev.null generate aws dynamodb 1.28.3 -t 20G
94.32user 8.19system 1:40.53elapsed 101%CPU (0avgtext+0avgdata 34824maxresident)k
328inputs+0outputs (4major+19482minor)pagefaults 0swaps
./gen-with-legacy-io.discard generate aws dynamodb 1.28.3 -t 20G
90.58user 5.30system 1:33.94elapsed 102%CPU (0avgtext+0avgdata 40156maxresident)k
328inputs+0outputs (4major+23288minor)pagefaults 0swaps
```
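to make the three build variants concrete: they only differ in where the rendered output goes. roughly (a hypothetical `newOutput` helper, not the actual code in the tool):

```go
package main

import (
	"io"
	"log"
	"os"
)

// newOutput mirrors the three build variants: the default writes to a regular
// file, the "-dev.null" builds write to /dev/null, and the "-io.discard"
// builds drop the output in-process.
func newOutput(variant string) (io.WriteCloser, error) {
	switch variant {
	case "dev.null":
		return os.OpenFile(os.DevNull, os.O_WRONLY, 0)
	case "io.discard":
		// io.Discard has no Close, so wrap it with a no-op closer
		return nopCloser{io.Discard}, nil
	default:
		return os.Create("corpus.out")
	}
}

type nopCloser struct{ io.Writer }

func (nopCloser) Close() error { return nil }

func main() {
	out, err := newOutput("io.discard")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := out.Write([]byte("generated event\n")); err != nil {
		log.Fatal(err)
	}
}
```

writing to `/dev/null` still pays the write syscalls while `io.Discard` drops the data in userspace, which is why the two variants together bracket the disk overhead.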
the next set compares the text template with goroutines using an unbuffered channel (`./gen-with-text_template-goroutines-unbufferedchan`), with a channel buffered to half of `runtime.GOMAXPROCS(0)` (`./gen-with-text_template-goroutines-chansize`), and without a channel but with a state local to every field (`./gen-with-text_template-goroutines-nochan`). as before, each binary was built with the default behaviour of writing to file and with variants writing to `/dev/null` or using an `io.Discard`. main is still faster:

```
./gen-with-text_template-goroutines-chansize generate aws dynamodb 1.28.3 -t 20G
1866.58user 297.94system 17:14.26elapsed 209%CPU (0avgtext+0avgdata 41056maxresident)k
160inputs+40217872outputs (3major+2666517minor)pagefaults 0swaps
./gen-with-text_template-goroutines-chansize-dev.null generate aws dynamodb 1.28.3 -t 20G
1800.80user 240.11system 16:12.13elapsed 209%CPU (0avgtext+0avgdata 42172maxresident)k
160inputs+0outputs (3major+2898400minor)pagefaults 0swaps
./gen-with-text_template-goroutines-chansize-io.discard generate aws dynamodb 1.28.3 -t 20G
1796.17user 236.99system 16:03.14elapsed 211%CPU (0avgtext+0avgdata 45364maxresident)k
160inputs+0outputs (3major+3265729minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan generate aws dynamodb 1.28.3 -t 20G
910.16user 74.99system 14:27.17elapsed 113%CPU (0avgtext+0avgdata 37504maxresident)k
368inputs+39185808outputs (4major+538156minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan-dev.null generate aws dynamodb 1.28.3 -t 20G
896.23user 51.32system 13:50.42elapsed 114%CPU (0avgtext+0avgdata 46144maxresident)k
376inputs+0outputs (4major+536475minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan-io.discard generate aws dynamodb 1.28.3 -t 20G
892.24user 43.21system 13:39.13elapsed 114%CPU (0avgtext+0avgdata 36460maxresident)k
368inputs+0outputs (4major+491776minor)pagefaults 0swaps
./gen-with-text_template-goroutines-unbufferedchan generate aws dynamodb 1.28.3 -t 20G
1801.91user 292.29system 16:47.15elapsed 207%CPU (0avgtext+0avgdata 44076maxresident)k
160inputs+39289992outputs (3major+2758282minor)pagefaults 0swaps
./gen-with-text_template-goroutines-unbufferedchan-dev.null generate aws dynamodb 1.28.3 -t 20G
1835.62user 242.94system 16:36.15elapsed 208%CPU (0avgtext+0avgdata 36960maxresident)k
160inputs+0outputs (3major+2822023minor)pagefaults 0swaps
./gen-with-text_template-goroutines-unbufferedchan-io.discard generate aws dynamodb 1.28.3 -t 20G
1873.59user 240.32system 16:49.39elapsed 209%CPU (0avgtext+0avgdata 42716maxresident)k
160inputs+0outputs (3major+2730760minor)pagefaults 0swaps
./gen-with-text_template-main generate aws dynamodb 1.28.3 -t 20G
767.99user 52.43system 12:50.66elapsed 106%CPU (0avgtext+0avgdata 38352maxresident)k
328inputs+39062728outputs (4major+371663minor)pagefaults 0swaps
./gen-with-text_template-main-dev.null generate aws dynamodb 1.28.3 -t 20G
747.52user 25.22system 12:04.20elapsed 106%CPU (0avgtext+0avgdata 42136maxresident)k
448inputs+0outputs (5major+357297minor)pagefaults 0swaps
./gen-with-text_template-main-io.discard generate aws dynamodb 1.28.3 -t 20G
736.85user 22.40system 11:51.84elapsed 106%CPU (0avgtext+0avgdata 40688maxresident)k
320inputs+0outputs (4major+374720minor)pagefaults 0swaps
```
all the above used no specific generator configuration
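for clarity, this is roughly what I mean by "without channel but with a state local to every field" (an illustrative sketch, not the actual generator internals): each field generator owns its own state and is emitted in-line, with no goroutines or channel hand-off.

```go
package main

import (
	"bytes"
	"fmt"
)

// fieldGen is an illustrative per-field generator: it owns its local state
// (here just a counter) instead of sharing state through a channel.
type fieldGen struct {
	name    string
	counter int
}

func (f *fieldGen) emit(buf *bytes.Buffer) {
	f.counter++
	fmt.Fprintf(buf, "%s=%d ", f.name, f.counter)
}

func main() {
	fields := []*fieldGen{{name: "aws.dynamodb.latency"}, {name: "aws.dynamodb.requests"}}

	var buf bytes.Buffer
	for i := 0; i < 3; i++ {
		buf.Reset()
		// no goroutines/channels: every field is emitted in-line,
		// each advancing only its own local state
		for _, f := range fields {
			f.emit(&buf)
		}
		fmt.Println(buf.String())
	}
}
```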
I've run another test using the ec2_metrics template (which has a high cardinality on several fields).

it compares the text template on current main (`./gen-with-text_template-main`) with goroutines and channels buffered to half of `runtime.GOMAXPROCS(0)` (`./gen-with-text_template-goroutines-chansize`), without goroutines/channels but with a state local to every field (`./gen-with-text_template-goroutines-nochan`), and without goroutines/channels but with a global state slightly refactored from its version in main (`./gen-with-text_template-goroutines-globalstate`). the last binary is up to date with the latest commit in the branch; it also pre-calculates the number of events to generate based on the requested output size and the size of an initial template rendering.

```
./gen-with-text_template-goroutines-chansize generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
3469.86user 361.14system 33:09.41elapsed 192%CPU (0avgtext+0avgdata 16740maxresident)k
17504inputs+51098160outputs (110major+3464263minor)pagefaults 0swaps
./gen-with-text_template-goroutines-globalstate generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
2827.63user 636.22system 30:03.28elapsed 192%CPU (0avgtext+0avgdata 16676maxresident)k
352inputs+51874752outputs (5major+5466338minor)pagefaults 0swaps
./gen-with-text_template-goroutines-nochan generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
2018.64user 172.34system 31:27.58elapsed 116%CPU (0avgtext+0avgdata 15556maxresident)k
104inputs+52320520outputs (6major+2643605minor)pagefaults 0swaps
./gen-with-text_template-main generate-with-template gotext.tpl fields.yml -c configs.yml -y gotext -t 30G
3043.93user 633.86system 33:16.69elapsed 184%CPU (0avgtext+0avgdata 16052maxresident)k
5968inputs+58594376outputs (43major+6518558minor)pagefaults 0swaps
```
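the pre-calculation is essentially: render the template once, measure the size of that sample, and derive how many events are needed to reach the requested total size. a minimal sketch of the idea (illustrative numbers and helper name, not the actual implementation):

```go
package main

import "fmt"

// estimateEvents derives how many events are needed to reach the requested
// total size, given the size of one rendered sample event, rounding up.
func estimateEvents(totalSize, sampleEventSize int64) int64 {
	if sampleEventSize <= 0 {
		return 0
	}
	return (totalSize + sampleEventSize - 1) / sampleEventSize
}

func main() {
	const requested = 30 << 30 // -t 30G
	sample := int64(1536)      // size of an initial template rendering (example value)
	fmt.Println(estimateEvents(requested, sample))
}
```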
in this case the performance of `gen-with-text_template-main` and `gen-with-text_template-goroutines-globalstate` seems very similar: the extra ~3 minutes taken by `gen-with-text_template-main` are related to it generating more events.
while the final goal of improving the performance was not reached, I would keep a few elements of the refactoring:
- the `error` return from `EmitF`
- the loop on `emit()` rather than on the bind functions
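roughly the shape I mean (signatures and types here are illustrative, not the actual ones in the repo): the emitter returns an `error` to the caller, and the per-event loop lives in `emit()` instead of in each bind function.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// emitF is an illustrative emitter with an error return, instead of
// swallowing failures inside the bind functions.
type emitF func(state map[string]any, buf *bytes.Buffer) error

// emit loops over the bound emitters for a single event: the loop lives
// here, in one place, rather than in each bind function.
func emit(emitters []emitF, state map[string]any, buf *bytes.Buffer) error {
	for _, e := range emitters {
		if err := e(state, buf); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	emitters := []emitF{
		func(_ map[string]any, buf *bytes.Buffer) error {
			buf.WriteString(`{"event":"ok"}`)
			return nil
		},
		func(_ map[string]any, _ *bytes.Buffer) error {
			return errors.New("cardinality exhausted") // example failure surfaced to the caller
		},
	}

	var buf bytes.Buffer
	if err := emit(emitters, map[string]any{}, &buf); err != nil {
		fmt.Println("emit failed:", err)
	}
}
```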
main branch:

this branch: