elastic / elastic-integration-corpus-generator-tool

Command line tool used for generating events corpus dynamically given a specific integration

generator with `text/template` package #37

Closed endorama closed 1 year ago

endorama commented 1 year ago

This is a test implementation that uses `text/template` as the template engine for data generation.

elasticmachine commented 1 year ago

:green_heart: Build Succeeded


#### Build stats

* Start Time: 2022-12-13T11:13:28.461+0000
* Duration: 3 min 42 sec

#### Test stats :test_tube:

| Test | Results |
| ------------ | :-----------------------------: |
| Failed | 0 |
| Passed | 103 |
| Skipped | 0 |
| Total | 103 |


endorama commented 1 year ago

My latest benchmark is:

```
❯ benchstat mer\ \ 7\ dic\ 2022,\ 10.22.46,\ CET.txt
name                          time/op
_Gen2-12                      64.8µs ± 9%
_Gen3-12                      15.3µs ± 7%
_Generator-12                 39.9µs ± 6%
_GeneratorWithTemplate-12     39.3µs ± 8%
_GeneratorWithTemplateAWS-12  8.74µs ± 7%

name                          alloc/op
_Gen2-12                      10.8kB ± 0%
_Gen3-12                      2.94kB ± 0%
_Generator-12                   482B ± 0%
_GeneratorWithTemplate-12       482B ± 0%
_GeneratorWithTemplateAWS-12    160B ± 0%

name                          allocs/op
_Gen2-12                         219 ± 0%
_Gen3-12                        97.0 ± 0%
_Generator-12                   15.0 ± 0%
_GeneratorWithTemplate-12       15.2 ± 5%
_GeneratorWithTemplateAWS-12    5.00 ± 0%
```

Gen2 does not perform acceptably: its CPU time is tolerable (64.8µs per op), but its memory usage is not (10.8kB and 219 allocations per op).
Gen3 performs well enough: it is faster than Generator (15.3µs vs 39.9µs per op) but uses more memory (2.94kB vs 482B per op). It offers the flexibility of `text/template`, with the drawback of always passing empty dupes.
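For context, the three columns above (time/op, alloc/op, allocs/op) are what Go benchmarks report when allocation tracking is enabled. A minimal sketch of such a measurement follows; the `line` template, the `record` type and `benchRender` are hypothetical and are not the benchmarks from this PR. In a real project this would normally live in a `_test.go` file and be run with `go test -bench . -benchmem`, with the output fed to `benchstat`.

```go
package main

import (
	"fmt"
	"io"
	"testing"
	"text/template"
)

// line is a hypothetical template used only for this measurement sketch.
var line = template.Must(template.New("line").Parse("{{.ID}} {{.Action}}\n"))

// record is the data passed to the template on every iteration.
type record struct {
	ID     int
	Action string
}

// benchRender executes the template b.N times and discards the output,
// so only the rendering cost is measured.
func benchRender(b *testing.B) {
	b.ReportAllocs() // enables the alloc/op and allocs/op figures
	rec := record{ID: 42, Action: "ACCEPT"}
	for i := 0; i < b.N; i++ {
		if err := line.Execute(io.Discard, rec); err != nil {
			b.Fatal(err)
		}
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(benchRender)
	fmt.Printf("time/op:   %d ns\n", res.NsPerOp())
	fmt.Printf("alloc/op:  %d B\n", res.AllocedBytesPerOp())
	fmt.Printf("allocs/op: %d\n", res.AllocsPerOp())
}
```

The absolute numbers depend on the machine, so they are only meaningful when compared between implementations on the same host.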

We can evaluate other template engines. There are "compile to Go" ones that claim large performance improvements. The trade-off would be the need to implement templates beforehand, but for known cases it's worth a try.

ruflin commented 1 year ago

I just ran this and was slightly surprised by the destination:

```
./elastic-integration-corpus-generator-tool generate-with-template assets/templates/aws.vpcflow/vpcflow.log assets/templates/aws.vpcflow/vpcflow.fields.yml --tot-size=2000
File generated: /Users/ruflin/Library/Application Support/elastic-integration-corpus-generator-tool/corpora/1670945614-vpcflow.log
```

The binary itself is inside ruflin/Dev/.... Why does it push to the Application Support directory?

endorama commented 1 year ago

@ruflin I'm not sure about the folder; this PR should not change that behaviour. I was surprised too when I ran it, but I didn't dig into why it behaves like that.

aspacca commented 1 year ago

> The binary itself is inside ruflin/Dev/.... Why does it push to the Application Support directory?

I borrowed the xdg usage from @endorama's changelog tool (which I used to scaffold this project) :)

The default path for the corpora location is defined here: https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/main/internal/settings/settings.go#L32-L34

It can be customised via an environment variable, as shown here: https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/main/internal/settings/xdg_test.go#L99

We can change the behaviour if needed.