elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

Remove `event.duration` and `event.ingested` from metric events #4894

Open ruflin opened 1 year ago

ruflin commented 1 year ago

In https://github.com/elastic/beats/issues/31574 and https://github.com/elastic/elasticsearch/pull/85649 it is discussed that event.duration takes up a significant amount of disk space (~16%) even though the field is not used in most scenarios. Ideally, we stop shipping the field where it is not needed and make it opt-in.

On the integrations side, there are several ways to deal with the field:

@nik9000 @jpountz If we make the field a runtime field, will this already bring benefits?

Below I ran the disk stats on a smaller set of metrics to get an overview of what storage is used:

metrics-system.memory-default disk usage POST metrics-system.memory-default/_disk_usage?run_expensive_tasks=true ``` ".ds-metrics-system.memory-default-2022.06.23-000019": { "store_size": "17.2mb", "store_size_in_bytes": 18137770, "all_fields": { "total": "17.1mb", "total_in_bytes": 17965798, "inverted_index": { "total": "1.3mb", "total_in_bytes": 1453187 }, "stored_fields": "6.6mb", "stored_fields_in_bytes": 6990103, "doc_values": "4.5mb", "doc_values_in_bytes": 4820991, "points": "4.4mb", "points_in_bytes": 4701517, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "fields": { "@timestamp": { "total": "395.1kb", "total_in_bytes": 404600, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "176.5kb", "doc_values_in_bytes": 180809, "points": "218.5kb", "points_in_bytes": 223791, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "_id": { "total": "919.4kb", "total_in_bytes": 941532, "inverted_index": { "total": "854kb", "total_in_bytes": 874597 }, "stored_fields": "65.3kb", "stored_fields_in_bytes": 66935, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "_primary_term": { "total": "3.4kb", "total_in_bytes": 3501, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "3.4kb", "doc_values_in_bytes": 3501, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "_seq_no": { "total": "224.7kb", "total_in_bytes": 230137, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, 
"stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "100.6kb", "doc_values_in_bytes": 103074, "points": "124kb", "points_in_bytes": 127063, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "_source": { "total": "6.6mb", "total_in_bytes": 6923168, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "6.6mb", "stored_fields_in_bytes": 6923168, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "_version": { "total": "0b", "total_in_bytes": 0, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "agent.ephemeral_id": { "total": "18.8kb", "total_in_bytes": 19308, "inverted_index": { "total": "12.1kb", "total_in_bytes": 12440 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.7kb", "doc_values_in_bytes": 6868, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "agent.id": { "total": "18.8kb", "total_in_bytes": 19309, "inverted_index": { "total": "12.1kb", "total_in_bytes": 12446 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.7kb", "doc_values_in_bytes": 6863, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "agent.name": { "total": "18.8kb", "total_in_bytes": 19297, "inverted_index": { "total": "12.2kb", "total_in_bytes": 
12534 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.6kb", "doc_values_in_bytes": 6763, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "agent.type": { "total": "2.8kb", "total_in_bytes": 2910, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2855 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "55b", "doc_values_in_bytes": 55, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "agent.version": { "total": "2.7kb", "total_in_bytes": 2860, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2830 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "30b", "doc_values_in_bytes": 30, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.account.id": { "total": "2.9kb", "total_in_bytes": 3025, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2915 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "110b", "doc_values_in_bytes": 110, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.availability_zone": { "total": "2.8kb", "total_in_bytes": 2940, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2870 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "70b", "doc_values_in_bytes": 70, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.instance.id": { "total": "18.5kb", "total_in_bytes": 18975, "inverted_index": { "total": 
"11.9kb", "total_in_bytes": 12277 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.5kb", "doc_values_in_bytes": 6698, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.instance.name": { "total": "18.8kb", "total_in_bytes": 19297, "inverted_index": { "total": "12.2kb", "total_in_bytes": 12534 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.6kb", "doc_values_in_bytes": 6763, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.machine.type": { "total": "2.8kb", "total_in_bytes": 2940, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2870 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "70b", "doc_values_in_bytes": 70, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.project.id": { "total": "2.9kb", "total_in_bytes": 3025, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2915 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "110b", "doc_values_in_bytes": 110, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.provider": { "total": "2.7kb", "total_in_bytes": 2840, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2820 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "20b", "doc_values_in_bytes": 20, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "cloud.service.name": { "total": "2.7kb", "total_in_bytes": 
2840, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2820 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "20b", "doc_values_in_bytes": 20, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "ecs.version": { "total": "2.7kb", "total_in_bytes": 2860, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2830 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "30b", "doc_values_in_bytes": 30, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "elastic_agent.id": { "total": "18.8kb", "total_in_bytes": 19310, "inverted_index": { "total": "12.1kb", "total_in_bytes": 12447 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.7kb", "doc_values_in_bytes": 6863, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "elastic_agent.snapshot": { "total": "2.7kb", "total_in_bytes": 2810, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2810 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "elastic_agent.version": { "total": "2.7kb", "total_in_bytes": 2860, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2830 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "30b", "doc_values_in_bytes": 30, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "event.agent_id_status": { "total": 
"2.9kb", "total_in_bytes": 3025, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2915 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "110b", "doc_values_in_bytes": 110, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "event.duration": { "total": "345.6kb", "total_in_bytes": 353918, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "176.5kb", "doc_values_in_bytes": 180809, "points": "169kb", "points_in_bytes": 173109, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "event.ingested": { "total": "303.9kb", "total_in_bytes": 311196, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "124.8kb", "doc_values_in_bytes": 127851, "points": "179kb", "points_in_bytes": 183345, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.architecture": { "total": "2.8kb", "total_in_bytes": 2870, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2835 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "35b", "doc_values_in_bytes": 35, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.containerized": { "total": "2.7kb", "total_in_bytes": 2810, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2810 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", 
"knn_vectors_in_bytes": 0 }, "host.hostname": { "total": "18.8kb", "total_in_bytes": 19297, "inverted_index": { "total": "12.2kb", "total_in_bytes": 12534 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.6kb", "doc_values_in_bytes": 6763, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.ip": { "total": "4mb", "total_in_bytes": 4242077, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "1.6mb", "doc_values_in_bytes": 1778806, "points": "2.3mb", "points_in_bytes": 2463271, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.mac": { "total": "1.9mb", "total_in_bytes": 2042470, "inverted_index": { "total": "393.9kb", "total_in_bytes": 403375 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "1.5mb", "doc_values_in_bytes": 1639095, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.name": { "total": "18.8kb", "total_in_bytes": 19297, "inverted_index": { "total": "12.2kb", "total_in_bytes": 12534 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "6.6kb", "doc_values_in_bytes": 6763, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.codename": { "total": "2.7kb", "total_in_bytes": 2860, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2830 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "30b", "doc_values_in_bytes": 30, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 
0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.family": { "total": "2.8kb", "total_in_bytes": 2870, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2835 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "35b", "doc_values_in_bytes": 35, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.kernel": { "total": "2.8kb", "total_in_bytes": 2890, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2845 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "45b", "doc_values_in_bytes": 45, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.name": { "total": "2.8kb", "total_in_bytes": 2870, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2835 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "35b", "doc_values_in_bytes": 35, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.name.text": { "total": "4kb", "total_in_bytes": 4138, "inverted_index": { "total": "4kb", "total_in_bytes": 4138 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.platform": { "total": "2.8kb", "total_in_bytes": 2871, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2836 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "35b", "doc_values_in_bytes": 35, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, 
"knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.type": { "total": "2.7kb", "total_in_bytes": 2861, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2831 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "30b", "doc_values_in_bytes": 30, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "host.os.version": { "total": "2.9kb", "total_in_bytes": 3067, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2937 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "130b", "doc_values_in_bytes": 130, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "metricset.name": { "total": "2.8kb", "total_in_bytes": 2873, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2838 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "35b", "doc_values_in_bytes": 35, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "metricset.period": { "total": "1.8kb", "total_in_bytes": 1859, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "1.8kb", "points_in_bytes": 1859, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "orchestrator.cluster.name": { "total": "2.8kb", "total_in_bytes": 2933, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2868 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "65b", "doc_values_in_bytes": 65, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", 
"term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "orchestrator.cluster.url": { "total": "2.9kb", "total_in_bytes": 3018, "inverted_index": { "total": "2.8kb", "total_in_bytes": 2913 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "105b", "doc_values_in_bytes": 105, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "service.type": { "total": "2.8kb", "total_in_bytes": 2873, "inverted_index": { "total": "2.7kb", "total_in_bytes": 2838 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "35b", "doc_values_in_bytes": 35, "points": "0b", "points_in_bytes": 0, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.actual.free": { "total": "382.8kb", "total_in_bytes": 392013, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "126.5kb", "doc_values_in_bytes": 129558, "points": "256.3kb", "points_in_bytes": 262455, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.actual.used.bytes": { "total": "382.8kb", "total_in_bytes": 392067, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "126.5kb", "doc_values_in_bytes": 129558, "points": "256.3kb", "points_in_bytes": 262509, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.actual.used.pct": { "total": "155.6kb", "total_in_bytes": 159372, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "50.6kb", "doc_values_in_bytes": 51821, 
"points": "105kb", "points_in_bytes": 107551, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.cached": { "total": "377.3kb", "total_in_bytes": 386365, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "126.5kb", "doc_values_in_bytes": 129558, "points": "250.7kb", "points_in_bytes": 256807, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.free": { "total": "382.6kb", "total_in_bytes": 391803, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "126.5kb", "doc_values_in_bytes": 129558, "points": "256kb", "points_in_bytes": 262245, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.swap.free": { "total": "1.8kb", "total_in_bytes": 1855, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "1.8kb", "points_in_bytes": 1855, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.swap.total": { "total": "1.8kb", "total_in_bytes": 1858, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "1.8kb", "points_in_bytes": 1858, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.swap.used.bytes": { "total": "1.8kb", "total_in_bytes": 1858, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", 
"stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "1.8kb", "points_in_bytes": 1858, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.total": { "total": "1.8kb", "total_in_bytes": 1858, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "0b", "doc_values_in_bytes": 0, "points": "1.8kb", "points_in_bytes": 1858, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.used.bytes": { "total": "383.5kb", "total_in_bytes": 392758, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "126.5kb", "doc_values_in_bytes": 129558, "points": "257kb", "points_in_bytes": 263200, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 }, "system.memory.used.pct": { "total": "154.9kb", "total_in_bytes": 158704, "inverted_index": { "total": "0b", "total_in_bytes": 0 }, "stored_fields": "0b", "stored_fields_in_bytes": 0, "doc_values": "50.6kb", "doc_values_in_bytes": 51821, "points": "104.3kb", "points_in_bytes": 106883, "norms": "0b", "norms_in_bytes": 0, "term_vectors": "0b", "term_vectors_in_bytes": 0, "knn_vectors": "0b", "knn_vectors_in_bytes": 0 } } }, ```

Another field that stands out for me is event.ingested. This is currently added by the [final pipeline](https://github.com/elastic/kibana/blob/7.14/x-pack/plugins/fleet/server/constants/fleet_es_assets.ts#L38) but, as discussed in https://github.com/elastic/integrations/issues/4462, should be optional. In the case of event.ingested, there should be a way to disable (or enable) the final pipeline. (@amitkanfer @joshdover)
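
To put event.duration and event.ingested in proportion, the per-field totals of a `_disk_usage` response can be ranked against `all_fields.total_in_bytes`. The following is a minimal sketch, not part of the original report: the helper name `field_shares` is made up for illustration, and the sample dictionary only copies a handful of byte counts from the output above.

```python
from typing import Dict, List, Tuple


def field_shares(index_stats: Dict, top: int = 5) -> List[Tuple[str, int, float]]:
    """Return the `top` largest fields as (name, bytes, percent of all_fields)."""
    total = index_stats["all_fields"]["total_in_bytes"]
    rows = [
        (name, stats["total_in_bytes"], 100.0 * stats["total_in_bytes"] / total)
        for name, stats in index_stats["fields"].items()
    ]
    # Largest fields first.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


# Excerpt of the response above: only the keys the function reads,
# and only a few of the fields (hypothetical trimming for brevity).
sample = {
    "all_fields": {"total_in_bytes": 17965798},
    "fields": {
        "_source": {"total_in_bytes": 6923168},
        "host.ip": {"total_in_bytes": 4242077},
        "host.mac": {"total_in_bytes": 2042470},
        "event.duration": {"total_in_bytes": 353918},
        "event.ingested": {"total_in_bytes": 311196},
    },
}

for name, size, pct in field_shares(sample):
    print(f"{name:<16} {size:>9} bytes  {pct:5.2f}%")
```

On this particular index, event.duration and event.ingested each come out at roughly 2% of the total, so the share clearly depends on how many other fields a data stream carries.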

ruflin commented 1 year ago

Looking into package-spec, it seems runtime fields are not supported yet on specific fields (https://github.com/elastic/package-spec/issues/39). It would be great to see this moving forward.

martijnvg commented 1 year ago

> If we make the field a runtime field, will this already bring benefits?

If we drop docvalues for this field and synthetic source is enabled, then this can't be a runtime field, because there is no source from which the field can be generated at query time. So using runtime fields isn't an option here.

ruflin commented 1 year ago

At the moment we haven't dropped docvalues and we are not using synthetic source yet. In the scenario today, we can use runtime fields, but will it bring us any benefits?

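Moving forward, adopting TSDB and synthetic source, what should we do with the field in the following two scenarios:

* a) Only has to be available to look at, no queries, no aggs

* b) Should be available as a runtime field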

@martijnvg Can you answer separately for scenarios a and b, independent of each other? And can you dig into whether runtime fields bring us any benefits today?

It is important to note that with integrations we can ship such changes back to 7.x releases, bringing improvements there too, where synthetic source etc. did not exist yet.

joshdover commented 1 year ago

> In the case of event.ingested, there should be a way to disable (or enable) the final pipeline.

This could be an option, though today it would have to be for an entire integration package and every agent pushing data into those data streams. If we improve the granularity of data stream assets we could make this more granular, such as with https://github.com/elastic/kibana/issues/121118.

There may also be a way we could add a tag to the incoming events that the ingest pipeline reads to skip adding these fields. If we went this route, I'd prefer we have a single tag to control this that would both strip event.duration, skip adding event.ingested, and also skip adding event.original. This could be a generic event.debug tag or similar.

> At the moment we haven't dropped docvalues and we are not using synthetic source yet. In the scenario today, we can use runtime fields, but will it bring us any benefits?

Yeah, we could drop the indexing from these fields to save some storage. It seems this would still be compatible with TSDB+synth source too, since we'll need to keep the doc_values for these fields for synthetic source. I like this because it avoids changing this multiple times across releases, which would result in a disjointed experience when looking at data over time that was ingested across different versions.

> Moving forward, adopting TSDB and synthetic source, what should we do with the field in the following two scenarios:
>
> * a) Only has to be available to look at, no queries, no aggs
> * b) Should be available as a runtime field

I'm not sure what you're asking here, Ruflin. I don't see how we'll ever be able to support a field in a TSDB+synth source index that meets criteria (a). But I'm also not sure why we'd need that. We have to store the field somewhere, and having the single copy in doc_values should be minimal enough from a storage perspective. It's actually more desirable from a runtime field perspective, since accessing doc_values in runtime fields is faster than from _source.

andresrc commented 1 year ago

> it's actually more desirable from a runtime field perspective since accessing doc_values in runtime fields is faster than from _source

This was a question that I had: can a runtime field use doc-value-only fields (i.e. index:false) when using synthetic source, or is _source always needed for runtime fields?

nik9000 commented 1 year ago

Runtime fields should always use doc values if possible. _source is slow and should be avoided unless you have a really good excuse.

I really wanted this to be obvious in the docs. But I guess I never put enough effort into that. Could you point to why you thought you had to use _source? I'm sure I've made a mistake somewhere.
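
The advice to prefer doc values can be illustrated with a runtime field whose script reads `doc[...]` rather than `params._source`. This is only a sketch: the derived field name `event.duration_ms` is hypothetical, the conversion assumes the ECS convention that event.duration is in nanoseconds, and the mapping is built as a plain Python dictionary so it can be printed as a request body.

```python
import json

# Painless script for the runtime field. doc['event.duration'] reads the
# field's doc values; the alternative, params._source, would have to load
# and parse _source for every document.
SCRIPT = (
    "if (doc['event.duration'].size() > 0) {"
    " emit(doc['event.duration'].value / 1000000L);"
    " }"
)

mappings = {
    "runtime": {
        # Hypothetical derived field, not part of the original discussion.
        "event.duration_ms": {"type": "long", "script": {"source": SCRIPT}}
    },
    "properties": {
        # Doc-value-only field: not indexed, but still readable via doc[...].
        "event.duration": {"type": "long", "index": False}
    },
}

print(json.dumps({"mappings": mappings}, indent=2))
```

Because the script never touches `_source`, it does not depend on whether the original source document was stored.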

On Fri, Dec 23, 2022, 10:09 AM Andres Rodriguez @.***> wrote:

> > it's actually more desirable from a runtime field perspective since accessing doc_values in runtime fields is faster than from _source
>
> This was a question that I had: can runtime field use doc-value only fields (i.e. index:false) when using synthetic source or is _source always needed for runtime fields?

andresrc commented 1 year ago

Thanks @nik9000, probably a misconception on my side. I see this sentence in the docs that actually clarifies it; I had previously missed it:

> This script has access to the entire context of a document, including the original _source via params._source and any mapped fields plus their values

ruflin commented 1 year ago

> I'm not sure what you're asking here, Ruflin. I don't see how we'll ever be able to support a field in a TSDB+synth source index that meets criteria (a). But I'm also not sure why we'd need that. We have to store the field somewhere, and having the single copy in doc_values should be minimal enough from a storage perspective. It's actually more desirable from a runtime field perspective, since accessing doc_values in runtime fields is faster than from _source.

I should have been more specific on this. What I meant is that "no queries, no aggs" is not a requirement: the field does not need to be queryable or aggregatable, but if we get that for free, great.

> This could be an option, though today it would have to be for an entire integration package and every agent pushing data into those data streams.

I was thinking even broader that it would be disabled for all data streams.

> There may also be a way we could add a tag to the incoming events that the ingest pipeline reads to skip adding these fields. If we went this route, I'd prefer we have a single tag to control this that would both strip event.duration, skip adding event.ingested, and also skip adding event.original. This could be a generic event.debug tag or similar.

Where would this tag be added? On the edge or centrally?

ruflin commented 1 year ago

@nik9000 Where I think the conflict lies is that the data streams that have event.duration will eventually use TSDB and, with it, synthetic source. This means we end up with something similar to the following if we use a runtime field:

```
DELETE /index04
PUT /index04
{
  "mappings": {
    "runtime": {
      "event.duration": { "type": "long" }
    },
    "_source": {
      "mode": "synthetic"
    }
  }
}

PUT /index04/_create/1
{
  "event.duration": 1
}

GET /index04/_search
{
  "query": {
    "match": {
      "event.duration": 1
    }
  }
}
```

But this returns an error, as runtime fields don't work in combination with synthetic source. Before we use TSDB / synthetic source, a runtime field seems to give the minimal footprint for the event.duration field, as it would exist only in _source and not in doc values or the index. As soon as we use synthetic source, having only doc values seems to be the ideal scenario:

DELETE /index05
PUT /index05
{
  "mappings": {
    "properties": {
        "event.duration":  { "type": "long", "index": false }
    },
    "_source" : {
      "mode" : "synthetic"
    }
  }
}

PUT /index05/_create/1
{
  "event.duration": 1
}

GET /index05/_search
{
  "query":{
    "match" : {
        "event.duration" : 1
    }
  }
}

joshdover commented 1 year ago

Where would this tag be added? On the edge or centrally?

IMO this should be tagged on the edge so a single agent could be debugged. The ingest pipeline would read this tag and make the necessary changes to the document before indexing.
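A minimal sketch of the edge-tag idea, written in Python purely to simulate the pipeline logic rather than as real ingest pipeline configuration. The tag name `event.debug` is the placeholder floated earlier in this thread, `apply_final_pipeline` is a hypothetical helper, and the nested-dict document shape is an assumption.

```python
from copy import deepcopy
from datetime import datetime, timezone

# Placeholder tag name taken from the discussion; not an existing convention.
DEBUG_TAG = "event.debug"


def apply_final_pipeline(doc, now=None):
    """Simulate the proposal: keep the debug fields only for events tagged on the edge."""
    out = deepcopy(doc)
    event = out.setdefault("event", {})
    if DEBUG_TAG in out.get("tags", []):
        # Tagged event: keep event.duration / event.original and stamp event.ingested.
        now = now or datetime.now(timezone.utc)
        event["ingested"] = now.isoformat()
    else:
        # Untagged event: strip the optional fields and do not add event.ingested.
        for field in ("duration", "original"):
            event.pop(field, None)
    if not event:
        # Avoid leaving an empty "event" object behind.
        del out["event"]
    return out
```

With this shape, a single agent can be debugged by adding the tag in its own configuration, while every other agent writing to the same data stream keeps the smaller documents.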

ruflin commented 1 year ago

We are now in the middle of adopting TSDB for metric data streams, but no decision has been reached on this topic. I suggest that whatever decision we make, we focus our change on TSDB data streams.

@joshdover We recently had some discussions around making the final pipeline optional. As event.ingested is added in the final pipeline, this would already help.

@martijnvg For TSDB, is the ideal mapping for event.duration the following? The assumption is synthetic source is enabled too.

"properties": {
        "event.duration":  { "type": "long", "index": false }
}

elasticmachine commented 1 year ago

Pinging @elastic/fleet (Team:Fleet)

martijnvg commented 1 year ago

@ruflin With this configuration we store event.duration field as doc values, but not as points. Without knowing the history here, do we need to include event.duration at all in a document? Or can we not store event.duration as doc values by also setting doc_values to false?

joshdover commented 1 year ago

We recently had some discussions around making the final pipeline optional. As event.ingested is added in the final pipeline, this would already help.

Yes, I need to open an issue to discuss this, but maybe we can make some progress here. Would this be something we want integration devs to specify / opt out of, or do users need control over this? If it's only required for security integrations, we can make this a package-level setting.

ruflin commented 1 year ago

If it's only required for security integrations, we can make this a package-level setting.

I think we could go one step further and make it a Fleet-wide setting. If you have a SIEM use case, you might want it on all of your logs; for observability, likely not.

ruflin commented 1 year ago

@ruflin With this configuration we store event.duration field as doc values, but not as points. Without knowing the history here, do we need to include event.duration at all in a document? Or can we not store event.duration as doc values by also setting doc_values to false?

@martijnvg What do you mean by "not as points"? And will the above work with TSDB / synthetic source? It means, event duration would be indexed, meaning filtering on it would still work just not querying? If we pick doc_values only, it would still be aggregatable and slow on filtering? What are the pros / cons of these two options?

martijnvg commented 1 year ago

@ruflin So I'm wondering whether this duration field is important at all. If not, then why do we ingest it in the first place? The title of this issue is about removing the event.duration field, after all.

What do you mean by "not as points"?

It means that there is no data structure for efficiently querying this field.

And will the above work with TSDB / synthetic source? It means, event duration would be indexed, meaning filtering on it would still work just not querying? If we pick doc_values only, it would still be aggregatable and slow on filtering? What are the pros / cons of these two options?

The above configuration will work with TSDB and synthetic source. The duration field is indexed with only doc values enabled. This means the field is aggregatable and queryable. But querying can be slow, because there is no dedicated data structure just for querying.

If index is set to true, then there is a dedicated data structure for querying the duration (referred to as points in Lucene). This means querying the field will be faster, at the expense of an additional data structure that requires resources (disk, memory, and CPU during indexing / merging).

If this field will be used rarely, then I would suggest setting index=false, like you already did in your mapping sample.

ruflin commented 1 year ago

@ruflin So I'm wondering whether this duration field is important at all. If not, then why do we ingest it in the first place? The title of this issue is about removing the event.duration field, after all.

Removing the fields is also part of the discussion but if we have to keep it, I wanted to know the ideal "storage" solution that creates the minimal overhead.

@martijnvg What you are describing above is basically that event.duration becomes a metrics field, doc values only. The part that still confuses me is your comment further up:

Or can we not store event.duration as doc values by also setting doc_values to false

If we have index: false, no source because of synthetic source and doc_values: false, where would the value then be stored? Is this even a possible combination?

martijnvg commented 1 year ago

Removing the fields is also part of the discussion but if we have to keep it, I wanted to know the ideal "storage" solution that creates the minimal overhead.

Ok, I understand now.

If we have index: false, no source because of synthetic source and doc_values: false, where would the value then be stored? Is this even a possible combination?

In that case the field is accepted during indexing but not stored at all, meaning that it can no longer be retrieved or used. I think this is not what you're looking for. The configuration that you initially mentioned (index: false and doc_values: true, which is the default) is the most suitable one. With that configuration, the event.duration field can still be synthesised in _source, queried (but slowly) and aggregated.
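The trade-offs from this exchange can be summarised as a small lookup. The Python sketch below only encodes statements made in this thread for a `long` field in an index with synthetic _source; `MAPPING_TRADEOFFS` and `event_duration_mapping` are hypothetical names, not Elasticsearch APIs.

```python
# (index, doc_values) -> behaviour as described in this thread.
# The (True, False) combination is not discussed here, so it is left out on purpose.
MAPPING_TRADEOFFS = {
    # Points plus doc values: fastest queries, highest resource cost.
    (True, True): {"points": True, "query": "fast", "aggregatable": True, "retrievable": True},
    # Doc values only: still queryable and aggregatable, but queries are slower.
    (False, True): {"points": False, "query": "slow", "aggregatable": True, "retrievable": True},
    # Neither: the value is accepted at index time but not kept anywhere.
    (False, False): {"points": False, "query": "unavailable", "aggregatable": False, "retrievable": False},
}


def event_duration_mapping() -> dict:
    """Mapping fragment suggested in the thread: doc values only, no points."""
    return {
        "properties": {
            "event.duration": {"type": "long", "index": False, "doc_values": True}
        }
    }
```

The `(False, True)` row is the configuration the discussion settles on when the field has to be kept but is rarely queried.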

ruflin commented 1 year ago

For event.ingested I filed the following issue: https://github.com/elastic/elasticsearch/issues/100324 This would mean we can remove this feature from the final pipeline and instead have it managed as part of the data streams.

ruflin commented 1 year ago

For event.duration, I suggest we move this to doc_values: true; index: false. My understanding is this would work with and without synthetic source. @eyalkoren Is this maybe something we should change in the dynamic ECS templates?

In parallel we should also tackle the problem on the shipper side (beats) to make event.duration opt in instead of the default.

felixbarny commented 1 year ago

IIUC, event.duration and event.ingested are important for Security and SIEM use cases. But doesn't that only concern logs? Can we just always avoid adding these fields for metrics?

sophiec20 commented 1 year ago

I cannot get a sense from the links how event.ingested takes up 16% .. how did that get calculated? feel free to dm me

event.ingested is used for real-time analysis in alerting, anomaly detection and transforms. This compensates for data arriving out-of-time order. It is critical for Transforms where data loss occurs if there are lags during ingest. This is seen frequently in the field, for example with clock drift or offline agents or processing delays. Without event.ingested the only workaround is a query delay, which can be significant, and causes delays in alerting etc.

From experience with customers, this is applicable to both metrics and logging.

Potentially, in order to save space, if the index is guaranteed to contain time-ordered data, then we can simply use an alias between @timestamp and event.ingested. The enhancement to make it more storage efficient is also compelling. https://github.com/elastic/elasticsearch/issues/100324

ruflin commented 1 year ago

event.ingested is used for real-time analysis in alerting, anomaly detection and transforms. This compensates for data arriving out-of-time order. It is critical for Transforms where data loss occurs if there are lags during ingest. This is seen frequently in the field, for example with clock drift or offline agents or processing delays. Without event.ingested the only workaround is a query delay, which can be significant, and causes delays in alerting etc.

This is for me a compelling reason to make it a setting as proposed in https://github.com/elastic/elasticsearch/issues/100324 because based on the above, this is a key field for functionality in our stack. If it is a setting, Elasticsearch then can tune how the field is persisted and accessed.

felixbarny commented 1 year ago

The arrival timestamp is a field that's inherently difficult to compress. There's definitely more that we can do, like disabling indexing and storing it as a gauge in TSDB so that the more efficient doc_value encoding is enabled for that field. However, especially in the context of https://github.com/elastic/elasticsearch/issues/91775, we'll be at a distinct disadvantage when storing both the event and the arrival timestamp for each data point when other metric stores only store the event timestamp, which is usually much easier to compress due to the more predictable intervals between data points. Therefore, I don't think relying on event.ingested is a viable long-term solution for metrics.
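To illustrate why regularly spaced event timestamps compress better than arrival timestamps, the following Python sketch applies delta-of-delta encoding, a technique commonly used by time series stores. It is an illustration only and does not describe Elasticsearch's actual doc_values codec; the sample timestamps and jitter values are made up.

```python
def delta_of_delta(timestamps):
    """Return second-order differences; long runs of zeros compress very well."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


# Event timestamps collected on a fixed 10 s interval (milliseconds).
event_ts = [1_000_000 + 10_000 * i for i in range(6)]

# Arrival timestamps for the same points, with made-up network and processing jitter.
jitter_ms = [120, 95, 430, 88, 2_150, 101]
ingested_ts = [t + j for t, j in zip(event_ts, jitter_ms)]

print(delta_of_delta(event_ts))     # every value is zero
print(delta_of_delta(ingested_ts))  # non-zero values, each needing more bits to store
```

The event timestamps collapse to a run of zeros, while the arrival timestamps leave a residual for every data point, which is the cost being weighed above.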

event.ingested is used for real-time analysis in alerting, anomaly detection and transforms. This compensates for data arriving out-of-time order. It is critical for Transforms where data loss occurs if there are lags during ingest.

Have we considered relying on the _seq_no for that purpose?

Having said that, we're also discussing dropping _id and _seq_no for time series data: https://github.com/elastic/elasticsearch/issues/48699. However, depending on how we implement that, the _seq_no would at least be available for some amount of time.

Potentially, in order to save space, if the index is guaranteed to contain time-ordered data, then we can simply use an alias between @timestamp and event.ingested.

When you say "time-ordered", do you mean ordered by arrival time or event time? In TSDB, metrics are ordered by the event timestamp. But data can come in late even in TSDB. Usually, data points from the same time series are coming in order but even for that, there's no guarantee.

sophiec20 commented 1 year ago

I am answering this with the lens of the current sets of features which "search directly after ingest".

We initially tried _seq_no for transforms but were unable to get this working with wide index patterns and cross cluster (I don't know the exact technical implementation issues, but at the time these were blockers).

For "time-ordered", I mean arrival time. In the case of transforms, we use event.ingested to identify what has recently arrived so it can be processed quickly, even if the final results might use an event_time date histogram.

From our experience, customers have late arriving data -- even with highly optimised ingest, there will be some event which seldom/sometimes/often occurs to delay data, even if temporarily.

Some use cases are fine to use event_time to identify recent data. For these use cases we tend to have a query delay (we process data between the time we last checked and now - query-delay) .. very late arriving data is ignored, results are delayed by at least query_delay, and this is ok for some use cases.

Some use cases require all newly arrived data to be processed as close to the time it arrived as possible. We use event.ingested to identify what has recently arrived. Without knowing this, then the client-side logic becomes complicated and the extra processing effort and memory required likely outweighs any benefits of reduced storage.
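The difference between the two approaches can be shown with a simplified Python model of checkpoint-based processing. This is a sketch under assumptions, not the actual transform implementation: the `Doc` type, both `run_*` helpers and the sample values are hypothetical, and times are plain integers in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    event_time: int  # when the measurement was taken
    ingested: int    # when the document arrived in the cluster


def run_event_time(docs, checkpoint, now, query_delay):
    """Process (checkpoint, now - query_delay] on event time; return names and new checkpoint."""
    upper = now - query_delay
    visible = [d for d in docs if d.ingested <= now]  # only arrived docs are searchable
    return [d.name for d in visible if checkpoint < d.event_time <= upper], upper


def run_ingest_time(docs, checkpoint, now):
    """Process everything that arrived in (checkpoint, now]; return names and new checkpoint."""
    return [d.name for d in docs if checkpoint < d.ingested <= now], now


DOCS = [
    Doc("a", event_time=100, ingested=101),
    Doc("b", event_time=105, ingested=190),  # very late, e.g. an offline agent
    Doc("c", event_time=160, ingested=161),
]
```

Running both strategies at `now=120` and again at `now=200` with a `query_delay` of 10, the event-time variant never sees `b`, because its event time falls before the checkpoint by the time it arrives. The ingest-time variant picks it up in the second run.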

The downside of not having event.ingested means that real-time analysis might drop late arriving data and may be configured to run with a query_delay. These feel like the pragmatic options for the current features which "search after ingest". This compromise may be acceptable (with perhaps the option of enabling event.ingested as a workaround for the use cases which need it) -- and it may be further acceptable if we can see a path to alternative strategies such as improvements to seq_no or stream processing capabilities or something else.

Note. In case it's not clear, I have a preference to keep event.ingested in the near-term for the "search directly after ingest" features. When we have support cases for missing data or delayed analysis, they tend to be higher priority. But if we choose otherwise, I hope the trade-offs, as I see them, are explained.

botelastic[bot] commented 2 weeks ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!