Open klacabane opened 1 year ago
This sounds great. Once we have the "raw" indices in place and a first pass at a consolidation transform system, it would be great to follow up on this and see if we can point these implicit/signal collectors at the raw indices alongside assetbeat/explicit collection, and see what happens.
Summary
Following @MichaelKatsoulis's investigation into consolidating assets generated by multiple sources, the current approach of collecting hosts by host.hostname in implicit collection could be improved to align field selection with assetbeat and output more accurate results. We discussed this offline and ended up with the following queries (raw implementation in https://github.com/elastic/kibana/pull/166181):
- kubernetes.node.uid -> asset { asset.id: <kubernetes.node.uid>, asset.ean: host:<kubernetes.node.uid> }
- cloud.instance.id -> asset { asset.id: <cloud.instance.id>, asset.ean: host:<cloud.instance.id> }
- host.id -> asset { asset.id: <host.id>, asset.ean: host:<host.id> }
- host.hostname -> asset { asset.id: <host.hostname>, asset.ean: host:<host.hostname> }
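The four mappings above can be sketched roughly as follows. This is only an illustration, not the implementation in the linked PR: the `SourceDoc` and `Asset` shapes, the `sourceField` property, and the `toHostAssets` helper are hypothetical names, and the sketch assumes documents arrive flattened with dotted field names as keys.

```typescript
// Hypothetical shapes for illustration; not the actual Kibana types.
interface SourceDoc {
  [field: string]: string | undefined;
}

interface Asset {
  'asset.id': string;
  'asset.ean': string;
  // Field the identifier was taken from (added here for illustration only).
  sourceField: string;
}

// Identifier fields in the order they are listed in the issue.
const HOST_ID_FIELDS = [
  'kubernetes.node.uid',
  'cloud.instance.id',
  'host.id',
  'host.hostname',
] as const;

// Emit one asset per identifier field that is present on the document.
function toHostAssets(doc: SourceDoc): Asset[] {
  const assets: Asset[] = [];
  for (const field of HOST_ID_FIELDS) {
    const value = doc[field];
    if (value !== undefined && value !== '') {
      assets.push({
        'asset.id': value,
        'asset.ean': `host:${value}`,
        sourceField: field,
      });
    }
  }
  return assets;
}
```

A document carrying all four fields yields four assets under this sketch, one per identifier field.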
This will cause duplication, since one host could produce up to four asset documents if all fields exist. This can be mitigated by (1) clients narrowing the search by host subtype (e.g. only fetching AWS or k8s hosts) and (2) the API layer discarding duplicates based on overlapping fields (e.g. two assets sharing the same host.id). As pointed out during the Obs Asset call, a subset of duplicated assets can be seen as a parent-child relationship (a k8s node running on a cloud instance). Additionally, duplication can be limited at collection time: for example, the host.hostname query can be further filtered to ignore documents that also carry fields of the specialized queries (kubernetes.node.uid or cloud.instance.id).
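Two of the mitigations above could look roughly like this. Both pieces are sketches under assumptions: `hostnameOnlyQuery` is a plain Elasticsearch bool query object written out by hand (not taken from the PR), and `HostAsset` and `dedupeByHostId` are hypothetical names, assuming each asset document also carries the host.id it was collected with.

```typescript
// Collection-time filter: match documents that have host.hostname but
// none of the fields used by the more specialized queries.
const hostnameOnlyQuery = {
  bool: {
    filter: [{ exists: { field: 'host.hostname' } }],
    must_not: [
      { exists: { field: 'kubernetes.node.uid' } },
      { exists: { field: 'cloud.instance.id' } },
    ],
  },
};

// Hypothetical asset shape; assumes host.id is kept on the asset document.
interface HostAsset {
  'asset.id': string;
  'asset.ean': string;
  'host.id'?: string;
}

// API-layer dedup: keep the first asset seen for each host.id.
// Assets without a host.id are always kept, since overlap cannot be detected.
function dedupeByHostId(assets: HostAsset[]): HostAsset[] {
  const seen = new Set<string>();
  const result: HostAsset[] = [];
  for (const asset of assets) {
    const hostId = asset['host.id'];
    if (hostId !== undefined) {
      if (seen.has(hostId)) continue;
      seen.add(hostId);
    }
    result.push(asset);
  }
  return result;
}
```

Because the first asset per host.id wins, the input order decides which duplicate survives; sorting the most specific assets first would be one way to prefer them.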