elastic / elastic-integration-corpus-generator-tool

Command line tool used for generating events corpus dynamically given a specific integration

Values tending to the minimum value when providing `fuzziness` config #127

Open aliabbas-elastic opened 7 months ago

aliabbas-elastic commented 7 months ago

Attaching the config used:

config.yml

fields:
  - name: 'timestamp'
    period: -24h # one day
  - name: agent.id
    value: "ef5e274d-4b53-45e6-943a-a5bcf1a6f523"
  - name: service.address
    enum: ["elastic-package-service-nginx-1","elastic-package-service-nginx-2","elastic-package-service-nginx-3"]
  - name: event.duration
    range:
      min: 1
      max: 1000
  - name: writing
    range:
      min: 1
      max: 100
    fuzziness: 0.6

Attaching a subset of the corpus generated for 200 events. Look at the `writing` field in the generated corpus and at the values it takes after a certain number of documents have been generated.

{    "@timestamp": "2024-01-10T21:38:23.410Z",    "agent": {        "ephemeral_id": "ivorybite",        "id": "ef5e274d-4b53-45e6-943a-a5bcf1a6f523",        "name": "zincbow",        "type": "metricbeat",        "version": "8.8.0"    },    "data_stream": {        "dataset": "nginx.stubstatus",        "namespace": "ep",        "type": "metrics"    },    "ecs": {        "version": "8.5.1"    },    "elastic_agent": {        "id": "ivorybite",        "snapshot": false,        "version": "8.8.0"    },    "event": {        "agent_id_status": "verified",        "dataset": "nginx.stubstatus",        "duration": 58,        "module": "nginx"    },     "host": {        "architecture": "x86_64",        "containerized": false,        "hostname": "docker-fleet-agent",        "id": "66392b0697b84641af8006d87aeb89f1",        "ip": [            "172.18.0.7"        ],        "mac": [            "02-42-AC-12-00-07"        ],        "name": "docker-fleet-agent",        "os": {            "codename": "focal",            "family": "debian",            "kernel": "5.15.49-linuxkit",            "name": "Ubuntu",            "platform": "ubuntu",            "type": "linux",            "version": "20.04.5 LTS (Focal Fossa)"        }    },    "metricset": {        "name": "stubstatus",        "period": 10000    },    "nginx": {        "stubstatus": {            "writing": 1        }    },    "service": {        "address": "http://elastic-package-service-nginx-3:80/server-status",        "type": "nginx"    }}
{    "@timestamp": "2024-01-10T21:45:35.410Z",    "agent": {        "ephemeral_id": "chestnutpanther",        "id": "ef5e274d-4b53-45e6-943a-a5bcf1a6f523",        "name": "rattleoriole",        "type": "metricbeat",        "version": "8.8.0"    },    "data_stream": {        "dataset": "nginx.stubstatus",        "namespace": "ep",        "type": "metrics"    },    "ecs": {        "version": "8.5.1"    },    "elastic_agent": {        "id": "chestnutpanther",        "snapshot": false,        "version": "8.8.0"    },    "event": {        "agent_id_status": "verified",        "dataset": "nginx.stubstatus",        "duration": 472,        "module": "nginx"    },     "host": {        "architecture": "x86_64",        "containerized": false,        "hostname": "docker-fleet-agent",        "id": "66392b0697b84641af8006d87aeb89f1",        "ip": [            "172.18.0.7"        ],        "mac": [            "02-42-AC-12-00-07"        ],        "name": "docker-fleet-agent",        "os": {            "codename": "focal",            "family": "debian",            "kernel": "5.15.49-linuxkit",            "name": "Ubuntu",            "platform": "ubuntu",            "type": "linux",            "version": "20.04.5 LTS (Focal Fossa)"        }    },    "metricset": {        "name": "stubstatus",        "period": 10000    },    "nginx": {        "stubstatus": {            "writing": 1        }    },    "service": {        "address": "http://elastic-package-service-nginx-2:80/server-status",        "type": "nginx"    }}
aspacca commented 7 months ago

hi @aliabbas-elastic, I can confirm that I've observed the following generic behaviour: fields with a range lean towards the minimum bound of the range.

I think that what you are observing is a corner case of the above, which shows up when integer ranges are rounded with a fuzziness applied. Basically, you reach a point where the "current" integer value reaches the minimum bound of the range, 1 in your case. In the next generation cycle, given fuzziness: 0.6 and range.min: 1, the value that can be generated is 1 < x < 1.6. Since internally the values are generated as floats and then rounded to integers, you end up with 1 again, and so on for every following generation cycle.

the problem can probably be mitigated with a properly chosen mix of range.min and fuzziness:

a = range.min
b = range.min * fuzziness
b > a + 1

I see two options in order to deal with the problem directly in the code:

in general, even setting aside the extreme outcome of ending up generating the same value after a certain number of events, we should investigate the tendency of random numeric values to lean towards 0. I'm not an expert: I'm not sure whether it's a limitation of the specific Go random package or something more generic to randomness on computers, and/or whether it is a known limitation that can be mitigated with a proper approach.