elastic / elastic-integration-corpus-generator-tool

Command line tool used for generating events corpus dynamically given a specific integration
Other
21 stars 12 forks source link

extend enum support #144

Closed gpop63 closed 3 months ago

gpop63 commented 4 months ago

Extended enum support for:

How I tested it

Using aws.billing schema-b from assets/templates.

Added a random new field in fields.yml:

- name: aws.billing.group_by.TEST
  type: double

Added only test in aws.billing.group_definition.key in configs.yml.

  - name: aws.billing.group_definition.key
    # NOTE: repeated values are needed to produce 10% cases with "" value
    enum: ["TEST"]

And some enum values for TEST:

  - name: aws.billing.group_by.TEST
    enum: [50.32, 32.25]

Added an extra else if in gotext.tpl.

{{- else if eq $groupBy "TEST"}}
              "TEST": {{generate "aws.billing.group_by.TEST"}}

Command to run:

go run main.go generate-with-template ./assets/templates/aws.billing/schema-b/gotext.tpl ./assets/templates/aws.billing/schema-b/fields.yml --config-file ./assets/templates/aws.billing/schema-b/configs.yml --tot-events 1000

Closes #142

gpop63 commented 3 months ago

@shmsr the tests sometimes fail because the time generator is generating a timestamp that is slightly before the current time. Maybe we should look into this as well to fix it. It can be reproduced locally. Rerunning them a few times will make them pass.

?       [github.com/elastic/elastic-integration-corpus-generator-tool](http://github.com/elastic/elastic-integration-corpus-generator-tool)    [no test files]
?       [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields)  [no test files]
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/cmd](http://github.com/elastic/elastic-integration-corpus-generator-tool/cmd)        0.004s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus)    0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings)  0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/version](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/version)   0.002s
2024/05/05 23:10:54 rand seed generator initialised with value `4029003210993349982`
2024/05/05 23:10:54 time now generator initialised with value `2024-05-05T20:10:54.918326267Z`
--- FAIL: Test_FieldDateWithCustomTemplate (0.00s)
    generator_with_custom_template_test.go:442: with template: {"alpha":"{{.alpha}}"}
    generator_with_custom_template_test.go:455: Data generated before now, diff: -267ns
FAIL
FAIL    [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib) 1.753s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config)  0.005s
FAIL
shmsr commented 3 months ago

@shmsr the tests sometimes fail because the time generator is generating a timestamp that is slightly before the current time. Maybe we should look into this as well to fix it. It can be reproduced locally. Rerunning them a few times will make them pass.

?       [github.com/elastic/elastic-integration-corpus-generator-tool](http://github.com/elastic/elastic-integration-corpus-generator-tool)    [no test files]
?       [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/fields)  [no test files]
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/cmd](http://github.com/elastic/elastic-integration-corpus-generator-tool/cmd)        0.004s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/corpus)    0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/settings)  0.003s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/internal/version](http://github.com/elastic/elastic-integration-corpus-generator-tool/internal/version)   0.002s
2024/05/05 23:10:54 rand seed generator initialised with value `4029003210993349982`
2024/05/05 23:10:54 time now generator initialised with value `2024-05-05T20:10:54.918326267Z`
--- FAIL: Test_FieldDateWithCustomTemplate (0.00s)
    generator_with_custom_template_test.go:442: with template: {"alpha":"{{.alpha}}"}
    generator_with_custom_template_test.go:455: Data generated before now, diff: -267ns
FAIL
FAIL    [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib) 1.753s
ok      [github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config](http://github.com/elastic/elastic-integration-corpus-generator-tool/pkg/genlib/config)  0.005s
FAIL

We can use this: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/salesforce/helper.go

gpop63 commented 3 months ago

We can use this: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/salesforce/helper.go

@shmsr Not sure if it's that straightforward. We would have to mock it in multiple places. Or maybe even mock the generator?

testSingleTWithCustomTemplate is being called in the test https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/1676ff91027a9d5455943547911267ed9700e4bf/pkg/genlib/generator_with_custom_template_test.go#L946

which uses an actual implementation of a generator that calls the emit function to create timestamps

https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/1676ff91027a9d5455943547911267ed9700e4bf/pkg/genlib/generator_with_custom_template.go#L133

the actual func that creates the timestamps is this one https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/1676ff91027a9d5455943547911267ed9700e4bf/pkg/genlib/generator_interface.go#L505

Maybe we can add a bit of tolerance and change the test to something like this

tolerance := time.Millisecond
if diff < -tolerance || diff > FieldTypeDurationSpan*time.Millisecond+tolerance {
    t.Errorf("Data generated before now, diff: %v", diff)
}

or we could truncate previous

previous := timeNowToBind.Truncate(time.Millisecond)
shmsr commented 3 months ago

Changes related to enum look more or less good. Here's a diff patch but only refer enum related changes as there a few more unrelated changes. For long, I think we should go with ParseInt instead of ParseFloat.

diff --git a/pkg/genlib/generator_interface.go b/pkg/genlib/generator_interface.go
index 5acd7db..4a56c2f 100644
--- a/pkg/genlib/generator_interface.go
+++ b/pkg/genlib/generator_interface.go
@@ -99,7 +99,6 @@ func newGenState() *genState {
 }

 func bindField(cfg Config, field Field, fieldMap map[string]any, withReturn bool) error {
-
    // Check for hardcoded field value
    if len(field.Value) > 0 {
        if withReturn {
@@ -164,7 +163,6 @@ func isDupeInterface(va []any, dst any) bool {
 }

 func bindByType(cfg Config, field Field, fieldMap map[string]any) (err error) {
-
    fieldCfg, _ := cfg.GetField(field.Name)

    switch field.Type {
@@ -194,7 +192,6 @@ func bindByType(cfg Config, field Field, fieldMap map[string]any) (err error) {
 }

 func bindByTypeWithReturn(cfg Config, field Field, fieldMap map[string]any) (err error) {
-
    fieldCfg, _ := cfg.GetField(field.Name)

    switch field.Type {
@@ -300,7 +297,6 @@ func bindObject(cfg Config, fieldCfg ConfigField, field Field, fieldMap map[stri
 }

 func bindDynamicObject(cfg Config, field Field, fieldMap map[string]any) error {
-
    // Temporary fieldMap which we pass to the bind function,
    // then extract the generated emitFunction for use in the stub.
    dynMap := make(map[string]any)
@@ -315,7 +311,6 @@ func bindDynamicObject(cfg Config, field Field, fieldMap map[string]any) error {
 }

 func genNounsN(n int, buf *bytes.Buffer) {
-
    for i := 0; i < n-1; i++ {
        buf.WriteString(randomdata.Noun())
        buf.WriteByte(' ')
@@ -414,6 +409,7 @@ func totWordsAndJoiner(fieldExample string) (int, string) {

    return totWords, joiner
 }
+
 func bindJoinRand(field Field, N int, joiner string, fieldMap map[string]any) error {
    var emitFNotReturn emitFNotReturn
    emitFNotReturn = func(state *genState, buf *bytes.Buffer) error {
@@ -518,7 +514,6 @@ func nearTime(fieldCfg ConfigField, state *genState) time.Time {
        } else {
            fieldCfg.Period = timeNowToBind.UTC().Sub(from.UTC())
        }
-
    }

    if errFrom != nil && errTo == nil {
@@ -735,7 +730,6 @@ func bindDouble(fieldCfg ConfigField, field Field, fieldMap map[string]any) erro
 }

 func bindCardinality(cfg Config, field Field, fieldMap map[string]any) error {
-
    fieldCfg, _ := cfg.GetField(field.Name)
    cardinality := fieldCfg.Cardinality

@@ -958,6 +952,7 @@ func bindIPWithReturn(field Field, fieldMap map[string]any) error {
    fieldMap[field.Name] = emitF
    return nil
 }
+
 func randIP() (int, int, int, int) {
    i0 := customRand.Intn(255)
    i1 := customRand.Intn(255)
@@ -966,27 +961,29 @@ func randIP() (int, int, int, int) {

    return i0, i1, i2, i3
 }
+
 func bindLongWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string]any) error {
    if err := fieldCfg.ValidCounter(); err != nil {
        return err
    }

+   // If an enum is provided, use it to generate random values from the enum
    if len(fieldCfg.Enum) > 0 {
        var emitF emitF
        emitF = func(state *genState) any {
            idx := customRand.Intn(len(fieldCfg.Enum))
-           f, err := strconv.ParseFloat(fieldCfg.Enum[idx], 64)
+           f, err := strconv.ParseInt(fieldCfg.Enum[idx], 10, 64)
            if err != nil {
                return err
            }
-           return int64(f)
+           return f
        }

        fieldMap[field.Name] = emitF
-
        return nil
    }

+   // If the field is marked as a counter, generate sequential values
    if fieldCfg.Counter {
        var emitF emitF

@@ -1001,7 +998,6 @@ func bindLongWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string]a

            if fieldCfg.Fuzziness <= 0 {
                dummyFunc = makeIntCounterFunc(previous, field)
-
                dummyInt = dummyFunc()
            } else {
                dummyInt = fuzzyIntCounter(previous, fieldCfg.Fuzziness)
@@ -1012,10 +1008,10 @@ func bindLongWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string]a
        }

        fieldMap[field.Name] = emitF
-
        return nil
    }

+   // If the field is not a counter, generate random values within the specified range
    dummyFunc := makeIntFunc(fieldCfg, field)

    if fieldCfg.Fuzziness <= 0 {
@@ -1090,37 +1086,29 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
    }

    if len(fieldCfg.Enum) > 0 {
-       var emitF emitF
-       emitF = func(state *genState) any {
+       emitF := func(state *genState) any {
            idx := customRand.Intn(len(fieldCfg.Enum))
            f, err := strconv.ParseFloat(fieldCfg.Enum[idx], 64)
            if err != nil {
-               return err
+               // Handle parsing error, e.g., log or return a default value
+               return 0.0
            }
-
            return f
        }
-
        fieldMap[field.Name] = emitF
-
        return nil
    }

    if fieldCfg.Counter {
-       var emitF emitF
-
-       emitF = func(state *genState) any {
+       emitF := func(state *genState) any {
            previous := float64(1)
-           var dummyFloat float64
-           var dummyFunc func() float64
-
            if previousDummyFloat, ok := state.prevCache[field.Name].(float64); ok {
                previous = previousDummyFloat
            }

+           var dummyFloat float64
            if fieldCfg.Fuzziness <= 0 {
-               dummyFunc = makeFloatCounterFunc(previous, field)
-
+               dummyFunc := makeFloatCounterFunc(previous, field)
                dummyFloat = dummyFunc()
            } else {
                dummyFloat = fuzzyFloatCounter(previous, fieldCfg.Fuzziness)
@@ -1129,9 +1117,7 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
            state.prevCache[field.Name] = dummyFloat
            return dummyFloat
        }
-
        fieldMap[field.Name] = emitF
-
        return nil
    }

@@ -1142,17 +1128,23 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
        emitF = func(state *genState) any {
            return dummyFunc()
        }
-
        fieldMap[field.Name] = emitF

        return nil
    }

-   min, _ := fieldCfg.Range.MinAsFloat64()
-   max, _ := fieldCfg.Range.MaxAsFloat64()
+   min, err := fieldCfg.Range.MinAsFloat64()
+   if err != nil {
+       // Handle error
+       return err
+   }
+   max, err := fieldCfg.Range.MaxAsFloat64()
+   if err != nil {
+       // Handle error
+       return err
+   }

-   var emitF emitF
-   emitF = func(state *genState) any {
+   emitF := func(state *genState) any {
        var dummyFloat float64
        if previousDummyFloat, ok := state.prevCache[field.Name].(float64); ok {
            dummyFloat = fuzzyFloat(previousDummyFloat, fieldCfg.Fuzziness, min, max)
@@ -1164,12 +1156,10 @@ func bindDoubleWithReturn(fieldCfg ConfigField, field Field, fieldMap map[string
    }

    fieldMap[field.Name] = emitF
-
    return nil
 }

 func bindCardinalityWithReturn(cfg Config, field Field, fieldMap map[string]any) error {
-
    fieldCfg, _ := cfg.GetField(field.Name)
    cardinality := fieldCfg.Cardinality

@@ -1245,7 +1235,6 @@ func bindObjectWithReturn(cfg Config, fieldCfg ConfigField, field Field, fieldMa
 }

 func bindDynamicObjectWithReturn(cfg Config, field Field, fieldMap map[string]any) error {
-
    // Temporary fieldMap which we pass to the bind function,
    // then extract the generated emitFunction for use in the stub.
    dynMap := make(map[string]any)
shmsr commented 3 months ago

We can use this: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/salesforce/helper.go

@shmsr Not sure if it's that straightforward. We would have to mock it in multiple places. Or maybe even mock the generator?

testSingleTWithCustomTemplate is being called in the test

https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/1676ff91027a9d5455943547911267ed9700e4bf/pkg/genlib/generator_with_custom_template_test.go#L946

which uses an actual implementation of a generator that calls the emit function to create timestamps

https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/1676ff91027a9d5455943547911267ed9700e4bf/pkg/genlib/generator_with_custom_template.go#L133

the actual func that creates the timestamps is this one

https://github.com/elastic/elastic-integration-corpus-generator-tool/blob/1676ff91027a9d5455943547911267ed9700e4bf/pkg/genlib/generator_interface.go#L505

Maybe we can add a bit of tolerance and change the test to something like this

tolerance := time.Millisecond
if diff < -tolerance || diff > FieldTypeDurationSpan*time.Millisecond+tolerance {
    t.Errorf("Data generated before now, diff: %v", diff)
}

or we could truncate previous

previous := timeNowToBind.Truncate(time.Millisecond)

Hey, thanks. I did not take a deeper look. For now, let's add tolerance like you suggested but let's do it properly in upcoming enhancements that we are going to make. Please a NOTE comment if we are going with the tolerance change or the truncation.

@devamanv What do you suggest?

shmsr commented 3 months ago

Also, from what I see there are so many code smells in the repo. We will file a separate PR to address most of the code smells as a single PR reported by staticcheck, golangci-lint, revive, vet, etc. Also need to gofumpt the code. But we will do it separately PR.

gpop63 commented 3 months ago

@shmsr

I just checked and we should move the error check outside of emitF because if there is an error happening we would just return the error string as the value.

"group_by": {
              "TEST": strconv.ParseInt: parsing "44.32": invalid syntax
            }

Not sure if we should return a default value like 0 if parsing fails, if we move it outside of emitF we can just return and stop the process - signaling to the user that the config is wrong.

shmsr commented 3 months ago

@shmsr

I just checked and we should move the error check outside of emitF because if there is an error happening we would just return the error string as the value.

"group_by": {
              "TEST": strconv.ParseInt: parsing "44.32": invalid syntax
            }

Not sure if we should return a default value like 0 if parsing fails, if we move it outside of emitF we can just return and stop the process - signaling to the user that the config is wrong.

Sorry for the late reply. I agree. We have to move the err check outside or do something else.

shmsr commented 3 months ago

@gpop63 Looks like CI failed. Here: https://github.com/elastic/elastic-integration-corpus-generator-tool/actions/runs/9349347197

gpop63 commented 3 months ago

@shmsr Yeah it's the same test we discussed previously https://github.com/elastic/elastic-integration-corpus-generator-tool/pull/144#issuecomment-2121056761 --- FAIL: Test_FieldDateWithCustomTemplate (0.01s). Let's create an issue to address this test.

shmsr commented 3 months ago

Oh, ok. Thanks. I'll create one if you haven't already opened one.