elastic / elastic-integration-corpus-generator-tool

Command line tool used for generating events corpus dynamically given a specific integration

Simplify config #149

Open gpop63 opened 2 months ago

Overview

This change simplifies the configuration by making the fields.yml file optional, as long as the config file itself provides the field definitions. The command checks that this is the case.

A new FieldMapping struct that matches the Field struct has been added in config. If a path to a fields definitions file is provided, the generator uses the field definitions from that file. Otherwise, the field definitions are created directly from the config file.
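The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the members of `Field` and `FieldMapping`, and the `resolveFields`, `fieldsFromConfig` and `loadFile` names, are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Field is a simplified stand-in for the generator's field definition.
// The member names here are assumptions for illustration only.
type Field struct {
	Name string
	Type string
}

// FieldMapping mirrors Field on the config side (members are assumed).
type FieldMapping struct {
	Name string
	Type string
}

// fieldsFromConfig builds field definitions directly from the config entries.
func fieldsFromConfig(mappings []FieldMapping) []Field {
	fields := make([]Field, 0, len(mappings))
	for _, m := range mappings {
		fields = append(fields, Field{Name: m.Name, Type: m.Type})
	}
	return fields
}

// resolveFields prefers the fields definitions file when a path is given and
// otherwise falls back to the config. loadFile is a hypothetical loader that
// is only called when a path is provided.
func resolveFields(path string, mappings []FieldMapping, loadFile func(string) ([]Field, error)) ([]Field, error) {
	if path != "" {
		return loadFile(path)
	}
	if len(mappings) == 0 {
		return nil, errors.New("no fields definitions file and no field definitions in config")
	}
	return fieldsFromConfig(mappings), nil
}

func main() {
	cfg := []FieldMapping{
		{Name: "Region", Type: "keyword"},
		{Name: "EventDuration", Type: "long"},
	}
	// No fields file path: definitions come from the config alone.
	fields, err := resolveFields("", cfg, nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range fields {
		fmt.Println(f.Name, f.Type)
	}
}
```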

The Value field inside the Field struct has been changed from string to any. This eliminates the need for type conversion. Converting would be straightforward for integers and floats, but we can also have slices as values. Making the Value field match the type of the Value field from the Config struct seems like a more suitable approach.
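To illustrate why `any` avoids the conversion step, here is a minimal sketch of consuming such a value with a type switch. The `describeValue` helper is hypothetical, and the set of concrete types a YAML decoder produces depends on the library, so the cases below are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// describeValue inspects a value typed as `any` without converting it to or
// from a string first. Scalars and slices are handled by the same switch.
func describeValue(v any) string {
	switch x := v.(type) {
	case nil:
		return "unset"
	case string:
		return "string " + x
	case int:
		return fmt.Sprintf("int %d", x)
	case float64:
		return fmt.Sprintf("float %g", x)
	case []any:
		parts := make([]string, len(x))
		for i, e := range x {
			parts[i] = fmt.Sprint(e)
		}
		return "list [" + strings.Join(parts, ", ") + "]"
	default:
		return fmt.Sprintf("other %T", x)
	}
}

func main() {
	fmt.Println(describeValue(19))
	fmt.Println(describeValue(0.05))
	fmt.Println(describeValue([]any{"a1.medium", "c3.2xlarge"}))
}
```

With a `string` field, the slice case would first need to be serialized and parsed back, which is the conversion this change removes.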

How I tested

Used aws.ec2_metrics/schema-b.

configs.yml

```yaml
fields:
  - name: dimensionType
    type: keyword
    # no dimension: 2.5%, AutoScalingGroupName: 10%, ImageId: 5%, InstanceType: 2.5%, InstanceId: 80%
    enum: ["", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "AutoScalingGroupName", "ImageId", "ImageId", "InstanceType",
      "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId",
      "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId",
      "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId",
      "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId", "InstanceId"]
    cardinality: 600
    # we want every single different "dimension identifier", regardless of its type, to have always the same generated fixed "metadata" once the cardinality kicks in
    # for this we must take the ordered highest enum length appending one by one the ones that does not have a 0 module between each others.
    # we start from the first two, multiple between their values and exclude from the order list the ones that have a 0 module on the result of the multiplication.
    # we end up with the list of enum lengths whose value, multiplied, define the least common multiple: this is the value we must use for the cardinality of all fields.
    # in this case the remaining enum are two: `dimensionType` (40) and `region` (15), resulting in cardinality `600`
  - name: Region
    type: keyword
    enum: ["ap-south-1", "eu-north-1", "eu-west-3", "eu-west-2", "eu-west-1", "ap-northeast-3", "ap-northeast-2", "ap-northeast-1", "ap-southeast-1", "ap-southeast-2", "eu-central-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2"]
    cardinality: 600
  - name: AutoScalingGroupName
    type: keyword
    cardinality: 600
  - name: ImageId
    type: keyword
    cardinality: 600
  - name: InstanceId
    type: keyword
    cardinality: 600
  - name: instanceTypeIdx
    type: long
    # we generate and index for the instance type enums, so that all the information related to a given type are properly matched
    range:
      min: 0
      max: 19
    cardinality: 600
  - name: InstanceType
    type: keyword
    value: ["a1.medium", "c3.2xlarge", "c4.4xlarge", "c5.9xlarge", "c5a.12xlarge", "c5ad.16xlarge", "c5d.24xlarge", "c6a.32xlarge", "g5.48xlarge", "d2.2xlarge", "d3.xlarge", "t2.medium", "t2.micro", "t2.nano", "t2.small", "t3.large", "t3.medium", "t3.micro", "t3.nano", "t3.small"]
  - name: instanceCoreCount
    type: keyword
    # they map instance types
    value: ["1", "4", "8", "18", "24", "32", "48", "64", "96", "4", "2", "2", "1", "1", "1", "1", "1", "1", "1", "1"]
  - name: instanceThreadPerCore
    type: keyword
    # they map instance types
    value: ["1", "2", "2", " 2", " 2", " 2", " 2", " 2", " 2", "2", "2", "1", "1", "1", "1", "2", "2", "2", "2", "2"]
  - name: instanceImageId
    type: keyword
    cardinality: 600
  - name: instanceMonitoringState
    type: keyword
    # enable: 10%, disabled: 90%
    enum: ["enabled", "disabled", "disabled", "disabled", "disabled", "disabled", "disabled", "disabled", "disabled", "disabled"]
    cardinality: 600
  - name: instancePrivateIP
    type: ip
    cardinality: 600
  - name: instancePrivateDnsEmpty
    type: keyword
    # without private dns entry: 10%, with private dns entry: 90%
    enum: ["empty", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP", "fromPrivateIP"]
    cardinality: 600
  - name: instancePublicIP
    type: ip
    cardinality: 600
  - name: instancePublicDnsEmpty
    type: keyword
    # without public dns entry: 20%, with public dns entry: 80%
    enum: ["empty", "fromPublicIP", "fromPublicIP", "fromPublicIP", "fromPublicIP"]
    cardinality: 600
  - name: instanceStateName
    type: keyword
    # terminated: 10%, running: 90%
    enum: ["terminated", "running", "running", "running", "running", "running", "running", "running", "running", "running"]
    cardinality: 600
  - name: cloudInstanceName
    type: keyword
    cardinality: 600
  - name: StatusCheckFailed_InstanceAvg
    type: double
    range:
      min: 0
      max: 10
    fuzziness: 0.05
  - name: StatusCheckFailed_SystemAvg
    type: double
    range:
      min: 0
      max: 10
    fuzziness: 0.05
  - name: StatusCheckFailedAvg
    type: double
    range:
      min: 0
      max: 10
    fuzziness: 0.05
  - name: CPUUtilizationAvg
    type: double
    range:
      min: 0
      max: 100
    fuzziness: 0.05
  - name: NetworkPacketsInSum
    type: double
    range:
      min: 0
      max: 1500000
    fuzziness: 0.05
  - name: NetworkPacketsOutSum
    type: double
    range:
      min: 0
      max: 1500000
    fuzziness: 0.05
  - name: CPUCreditBalanceAvg
    type: double
    range:
      min: 0
      max: 5000
    fuzziness: 0.05
  - name: CPUSurplusCreditBalanceAvg
    type: double
    range:
      min: 0
      max: 5000
    fuzziness: 0.05
  - name: CPUSurplusCreditsChargedAvg
    type: double
    range:
      min: 0
      max: 5000
    fuzziness: 0.05
  - name: CPUCreditUsageAvg
    type: double
    range:
      min: 0
      max: 10
    fuzziness: 0.05
  - name: DiskReadBytesSum
    type: double
    range:
      min: 0
      max: 1500000
    fuzziness: 0.05
  - name: DiskReadOpsSum
    type: double
    range:
      min: 0
      max: 1000
    fuzziness: 0.05
  - name: DiskWriteBytesSum
    type: double
    range:
      min: 0
      max: 1500000000
    fuzziness: 0.05
  - name: DiskWriteOpsSum
    type: double
    range:
      min: 0
      max: 1000
    fuzziness: 0.05
  - name: EventDuration
    type: long
    range:
      min: 1
      max: 1000
  - name: partOfAutoScalingGroup
    type: long
    # we dived this value by 20 in the template, giving 20% chance to be part of an autoscaling group: in this case we append the related aws.tags
    range:
      min: 1
      max: 100
  - name: EventIngested
    type: date
```
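For reference, the cardinality rule described in the config comments can be sketched like this. It is my reading of those comments, not code from the generator, and the `sharedCardinality` name is made up for the example. The result is a common multiple of every enum length, which is what keeps the enums aligned once the cardinality kicks in; it is not necessarily the smallest such value.

```go
package main

import (
	"fmt"
	"sort"
)

// sharedCardinality walks the enum lengths from largest to smallest and
// multiplies in every length that does not already divide the running
// product. Lengths that divide the product are skipped.
func sharedCardinality(enumLengths []int) int {
	lengths := append([]int(nil), enumLengths...) // copy, keep the input intact
	sort.Sort(sort.Reverse(sort.IntSlice(lengths)))

	product := 1
	for _, n := range lengths {
		if n <= 0 {
			continue // ignore empty or invalid enums
		}
		if product%n != 0 {
			product *= n
		}
	}
	return product
}

func main() {
	// Enum lengths from the config above: dimensionType (40), Region (15),
	// instanceMonitoringState (10), instancePrivateDnsEmpty (10),
	// instancePublicDnsEmpty (5), instanceStateName (10).
	fmt.Println(sharedCardinality([]int{40, 15, 10, 10, 5, 10})) // prints 600
}
```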

go run main.go generate-with-template ./assets/templates/aws.ec2_metrics/schema-b/gotext.tpl --config-file ./assets/templates/aws.ec2_metrics/schema-b/configs.yml --tot-events 10

Closes: #148