elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

Request for adding data generation for system package #6168

Open maryam-saeidi opened 1 year ago

maryam-saeidi commented 1 year ago

Summary

As an Actionable Observability team member, I am looking for a way to generate system data (matching what metricbeat/elastic agent produce) to test infra-alert rules. At the moment, I am using high_cardinality_indexer to generate it from the fake_host template.

This approach has the following challenges:

  1. If metricbeat/elastic agent changes, this fake data will get out of sync.
  2. At the moment, we need to know all the existing fields to generate fake data with enough information. Having a system package that generates data out of the box would remove the need for the test writer to know all the existing fields.
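To make challenge 2 concrete, here is a minimal sketch that reports which fields a hand-written fake document is missing. It assumes a hypothetical, hand-maintained EXPECTED_FIELDS list; in practice that list would have to track the package's field definitions, which is exactly the burden described above.

```python
# Hypothetical subset of expected fields; NOT read from the system package.
EXPECTED_FIELDS = [
    "@timestamp",
    "host.name",
    "host.os.family",
    "system.cpu.cores",
    "system.cpu.total.norm.pct",
]


def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"host": {"name": "a"}} -> {"host.name": "a"}."""
    flat = {}
    for key, value in doc.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def missing_fields(doc, expected=EXPECTED_FIELDS):
    """Return the expected fields that the document lacks, in list order."""
    present = flatten(doc)
    return [field for field in expected if field not in present]


fake_host_doc = {
    "@timestamp": "2023-05-15T17:35:06Z",
    "host": {"name": "host-1"},
    "system": {"cpu": {"total": {"norm": {"pct": 0.5}}}},
}
print(missing_fields(fake_host_doc))  # → ['host.os.family', 'system.cpu.cores']
```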

My end goal is to use this tool in Kibana for API integration testing of infra-alert rules.

Related topic

ruflin commented 1 year ago

@andresrc @aspacca @bturquet @tommyers-elastic As soon as we have the implementation of the corpus spec done in elastic-package, we should move over the existing specs and then tackle the system integration, at least a subset of its datasets, as I expect this to be one of the most requested datasets for testing.

@maryam-saeidi Can you share a bit more detail on exactly which metrics you are interested in? From the links you shared, it seems to be CPU and network. Others?

maryam-saeidi commented 1 year ago

@ruflin At the moment, I would like to test adding a condition on CPU usage and getting alerts for the hosts that meet that threshold. Suppose I have three hosts and only one of them has CPU usage above 90 (host-1: 50, host-2: 20, host-3: 95). Then I expect the alert document to have host.name: host-3 and the related host information according to https://www.elastic.co/guide/en/ecs/current/ecs-host.html
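The expected outcome of this scenario can be pinned down as a tiny oracle for the test. This sketch only models a per-host "strictly above threshold" comparison on one value per host; the real Metric threshold rule aggregates over a time window and supports other comparators, none of which is modelled here.

```python
THRESHOLD = 90  # CPU usage threshold from the scenario, in percent

cpu_usage_by_host = {"host-1": 50, "host-2": 20, "host-3": 95}


def hosts_above(usage_by_host, threshold):
    """Return, sorted, the hosts whose CPU usage is strictly above the threshold."""
    return sorted(host for host, usage in usage_by_host.items() if usage > threshold)


expected_alert_hosts = hosts_above(cpu_usage_by_host, THRESHOLD)
print(expected_alert_hosts)  # → ['host-3']
```

An API integration test could then compare the host.name values of the generated alert documents against this list.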

Now my question is: How do ECS host fields relate to system integration? Can I expect all those fields to be added?

Also, regarding which system fields we can alert on, I checked [release-oblt](https://release-oblt.kb.us-west2.gcp.elastic-cloud.com/app/observability/alerts/rules?_a=(lastResponse:!(),params:(),search:%27%27,status:!(),type:!())) (Create rule > Metric threshold > condition field), and I see we have a lot of system fields:

[Screenshot: system fields listed in the Metric threshold condition field]

But for my test, CPU, Memory, and Network fields are enough to start (plus the ECS host fields, if applicable).

ruflin commented 1 year ago

Now my question is: How do ECS host fields relate to system integration? Can I expect all those fields to be added?

Yes

aspacca commented 1 year ago

Now my question is: How do ECS host fields relate to system integration? Can I expect all those fields to be added?

This is true for schema-c, for example in a generated system.cpu document:

{ "@timestamp": "2023-05-15T17:35:06.228332+09:00","agent.id": "rapidfriend","cloud.account.id": "azurecowl","cloud.availability_zone": "sage-raver-pearweasel","cloud.image.id": "blueeater","cloud.instance.id": "taker-sulpherhead","cloud.instance.name": "dirtridge","cloud.machine.type": "eatergossamerknife","cloud.project.id": "battleforger","cloud.provider": "hazelfairy","cloud.region": "mustang-flier-oilwhip","container.id": "quartzfalcon","container.image.name": "liefalcon","container.labels.belly": "jellyleg","container.labels.hand": "nebulacougar","container.labels.hyena": "cypressminnow","container.name": "grovesnout","data_stream.dataset": "honeysucklestallion","data_stream.namespace": "muckdeer","data_stream.type": "planetdevourer","event.dataset": "system.cpu","event.module": "system","host.architecture": "stealer_translucenthyena","host.containerized": true,"host.cpu.pct": 4.155775,"host.domain": "crackox","host.hostname": "sunsettrader","host.id": "meadowcarpet","host.ip": "182.39.195.123","host.mac": "streamocelot","host.name": "stripedive","host.os.build": "nimblesparrow","host.os.codename": "ceruleanbug","host.os.family": "timefrill","host.os.full": "coconutcharger","host.os.kernel": "scowl-salmon-belly-chiller-rootgrasp","host.os.name": "runner scourge leathergem","host.os.platform": "motleyjay","host.os.version": "scorpionstalkerbigmark","host.type": "feathercrafter","system.cpu.cores": 4,"system.cpu.idle.norm.pct": 4.602111,"system.cpu.idle.pct": 2.933579,"system.cpu.idle.ticks": 3,"system.cpu.iowait.norm.pct": 0.198540,"system.cpu.iowait.pct": 6.837662,"system.cpu.iowait.ticks": 6,"system.cpu.irq.norm.pct": 7.056231,"system.cpu.irq.pct": 6.143894,"system.cpu.irq.ticks": 4,"system.cpu.nice.norm.pct": 5.907551,"system.cpu.nice.pct": 0.178689,"system.cpu.nice.ticks": 3,"system.cpu.softirq.norm.pct": 3.495727,"system.cpu.softirq.pct": 8.562177,"system.cpu.softirq.ticks": 6,"system.cpu.steal.norm.pct": 1.507343,"system.cpu.steal.pct": 8.160910,"system.cpu.steal.ticks": 4,"system.cpu.system.norm.pct": 5.321610,"system.cpu.system.pct": 2.223324,"system.cpu.system.ticks": 6,"system.cpu.total.norm.pct": 4.868852,"system.cpu.total.pct": 8.012242,"system.cpu.user.norm.pct": 4.213471,"system.cpu.user.pct": 3.027456,"system.cpu.user.ticks": 1 }

I still have to investigate whether the tool supports ECS fields coming from https://github.com/elastic/integrations/blob/main/packages/system/data_stream/cpu/fields/ecs.yml, or whether they appear in the output only because they are also defined in https://github.com/elastic/integrations/blob/main/packages/system/data_stream/cpu/fields/agent.yml

If you want to generate schema-c data (i.e. data as it looks after the ingest pipeline, which means you should disable the ingest pipeline when ingesting into metrics-system.cpu-default), you only need to launch the tool with the following arguments: generate system cpu 1.28.0 -t 200KB (adjust the size to what you need).
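If you send the generated schema-c documents yourself, one way to skip the ingest pipeline is the pipeline=_none parameter of the Elasticsearch _bulk API, which disables the target's default ingest pipeline for that request. The sketch below only builds the request path and NDJSON body; sending it, authentication, and how the corpus generator writes its output files are left out, and the build_bulk_request helper is hypothetical.

```python
import json

DATA_STREAM = "metrics-system.cpu-default"


def build_bulk_request(docs, data_stream=DATA_STREAM):
    """Build the path and NDJSON body of a _bulk request (hypothetical helper).

    pipeline=_none skips the data stream's default ingest pipeline, so
    already-processed (schema-c) documents are stored as generated.
    Data streams only accept the "create" action.
    """
    path = f"/{data_stream}/_bulk?pipeline=_none"
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {}}))  # action line
        lines.append(json.dumps(doc))             # source line
    body = "\n".join(lines) + "\n"  # a _bulk body must end with a newline
    return path, body


docs = [{"@timestamp": "2023-05-15T17:35:06.228Z", "host.name": "host-3",
         "system.cpu.total.norm.pct": 0.95}]
path, body = build_bulk_request(docs)
print(path)  # → /metrics-system.cpu-default/_bulk?pipeline=_none
```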

Please be aware, as discussed, that unless you tweak the fields generation configuration so that the generated data triggers the rule you want to test, you cannot be sure that the generated data will contain events that trigger that rule.
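One way to get that certainty without changing the generator is to post-process its output and pin the metric that the rule evaluates. A minimal sketch with a hypothetical pin_metric helper, assuming flat dotted keys as in the schema-c sample and that system.cpu.total.norm.pct is a ratio where 0.95 means 95%; the field choice and values are illustrative, not rule defaults.

```python
import random


def pin_metric(docs, values_by_host, field="system.cpu.total.norm.pct"):
    """Return copies of docs with `field` overwritten per host.name (hypothetical helper).

    Documents whose host.name is not in values_by_host are copied unchanged.
    """
    pinned = []
    for doc in docs:
        new_doc = dict(doc)  # a shallow copy is enough for flat documents
        host = new_doc.get("host.name")
        if host in values_by_host:
            new_doc[field] = values_by_host[host]
        pinned.append(new_doc)
    return pinned


# Random documents standing in for generator output (wide random values, like the sample).
rng = random.Random(42)
generated = [
    {"host.name": f"host-{i % 3 + 1}", "system.cpu.total.norm.pct": rng.uniform(0, 8)}
    for i in range(9)
]
pinned = pin_metric(generated, {"host-1": 0.50, "host-2": 0.20, "host-3": 0.95})
above = sorted({d["host.name"] for d in pinned if d["system.cpu.total.norm.pct"] > 0.90})
print(above)  # → ['host-3']
```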

For that, https://github.com/elastic/geneve is a better tool, but it is more limited when it comes to generating all the fields of the document. I think there is a way to generate both the fields affecting the rule and the ECS ones through Geneve; @cavokz might be more helpful here.

cavokz commented 1 year ago

Thanks @aspacca. Ccing @charlie-pichette.

@maryam-saeidi, Geneve is not very good at generating realistic data, either in the fields of the generated documents or in the content of those fields. What Geneve is good at is adding the fields mentioned in a query and filling them with content that satisfies that query, and therefore a rule.

If for example you have this query (not sure if I got the units right here):

any where host.cpu.usage >= 0.90 and _cardinality(host.name, 3)

You would get something similar to

{'host': {'cpu': {'usage': 0.9496416550389374}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.9043770541301733}, 'name': 'sgV'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.9089471310908367}, 'name': 'SzF'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.918347364858316}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.940+02:00'}
{'host': {'cpu': {'usage': 0.913752499159961}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9687020191511078}, 'name': 'SzF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.952194248562828}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9972572771906527}, 'name': 'SzF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9790489383951492}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
{'host': {'cpu': {'usage': 0.9587031853062025}, 'name': 'FEF'}, '@timestamp': '2023-05-15T11:28:54.941+02:00'}
...

You can see that, aside from @timestamp, no other fields are generated. host.name is random garbage, but with _cardinality(host.name, 3) you get only three distinct values. host.cpu.usage will contain random numbers between 0.9 and 1.0 (inclusive).
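To make the output above easier to reason about, here is a rough stand-in in plain Python. It is not Geneve's code or API: it only mimics the visible behaviour, drawing host.cpu.usage uniformly from [0.9, 1.0] and picking host.name from a pool of three random three-letter names, mirroring _cardinality(host.name, 3).

```python
import random
import string
from datetime import datetime, timezone


def generate(n, cardinality=3, low=0.90, high=1.0, seed=None):
    """Generate n nested docs satisfying host.cpu.usage >= low (stand-in, not Geneve)."""
    rng = random.Random(seed)
    # Pool of distinct random 3-letter host names, one per unit of cardinality.
    names = set()
    while len(names) < cardinality:
        names.add("".join(rng.choice(string.ascii_letters) for _ in range(3)))
    pool = sorted(names)
    docs = []
    for _ in range(n):
        docs.append({
            "host": {"cpu": {"usage": rng.uniform(low, high)}, "name": rng.choice(pool)},
            "@timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        })
    return docs


for doc in generate(3, seed=7):
    print(doc)
```

Like the Geneve output, every document satisfies the query, but nothing else about it resembles real system data; at most three distinct host names appear.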

If this is something that interests you, we need to find a way to integrate Geneve with tools that generate better "background" data on top of which Geneve can adjust/add the fields as needed.

ruflin commented 1 year ago

Elastic has quite a few data generation tools out there. As in many observability cases the data we are interested in comes from packages, for system metrics I would rather focus on the data generated by elastic-package and extend it for these use cases than extend Geneve.

cavokz commented 1 year ago

Indeed, I was thinking of integrating Geneve with other tools rather than extending it.

For instance we already evaluated the idea of adding support for package-integrations in Geneve (https://github.com/elastic/geneve/issues/113) and concluded that it's not a good idea.

charlie-pichette commented 1 year ago

@maryam-saeidi https://github.com/elastic/logen may also be of value.

maryam-saeidi commented 1 year ago

@charlie-pichette I get a 404 when I try to access the repo.

charlie-pichette commented 1 year ago

Perhaps @tammytorbert can provide access to Logen.

aspacca commented 1 year ago

@ruflin

I rather focus for system metrics on the data generated by elastic-package and extending it for the use cases then extending geneve.

We will certainly create the assets for the system metrics in elastic-package; still, for @maryam-saeidi's use case it might not be the right solution, because it cannot create data that is guaranteed to trigger a rule.

@cavokz

we need to find the way to integrate Geneve with tools that generate better "background" data on top of which Geneve can adjust/add the fields as needed.

As @ruflin mentioned, in the context of observability "the data we are interested in comes from packages", and that's what the corpus generator tool handles very well,

but it lacks a way to drive data according to a query/rule across multiple events.

We talked a while ago about having the two tools somehow "speak to each other", and I see @maryam-saeidi's scenario as a good one to start building upon: what do you think?
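A minimal sketch of what "speaking to each other" could look like, assuming the corpus generator emits flat dotted-key documents (as in the schema-c sample) and the query-driven tool emits nested documents (as in the Geneve sample). The hypothetical overlay function flattens the rule-driven document and lays its fields over a background document, so the rule fields win and everything else comes from the package-based data. Neither tool exposes this today.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"host": {"name": "a"}} -> {"host.name": "a"}."""
    flat = {}
    for key, value in doc.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, dotted + "."))
        else:
            flat[dotted] = value
    return flat


def overlay(background, rule_doc):
    """Merge rule-driven fields over a flat background document (hypothetical)."""
    merged = dict(background)
    merged.update(flatten(rule_doc))  # rule fields override background fields
    return merged


background = {"event.dataset": "system.cpu", "host.name": "stripedive", "host.os.family": "timefrill"}
rule_doc = {"host": {"cpu": {"usage": 0.95}, "name": "FEF"}}
print(overlay(background, rule_doc))
```

Field names would still need to be aligned between the two sides: the schema-c sample carries host.cpu.pct, while the example query uses host.cpu.usage, so the overlay simply adds the latter as a new field.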

botelastic[bot] commented 2 months ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

ruflin commented 2 months ago

@lalit-satapathy ^ Would be great to get this in as it would help with development and testing.

lalit-satapathy commented 2 months ago

@lalit-satapathy ^ Would be great to get this in as it would help with development and testing.

Yes, we will help with this.

But for my test, CPU, Memory, and Network fields are enough to start (plus the ECS host fields if it is applicable)

@maryam-saeidi, we already have the rally benchmark supported for system.cpu and system.memory. Could you give it a try? We can extend it to system.network in the future. If you need help running the corpus generator tool, please let us know.