elastic / integrations

[Netflow] Support TSDS #7549

Open BenB196 opened 10 months ago

BenB196 commented 10 months ago

Hi All,

I was curious if there are any plans to support enabling TSDS on the Netflow integration.

While this integration currently falls under the logs type, I think there would be significant value in allowing it to leverage TSDS.

Netflow contains a large number of metrics and, at scale, generally produces a significant number of time series events that need to be indexed and stored. I think the Netflow integration would gain significantly from TSDS in indexing speed, storage usage, and search/agg speed.

elasticmachine commented 10 months ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

andrewkroh commented 10 months ago

I had the same idea, but for aws.vpcflow data, which is very similar, just with a smaller number of possible fields. I think we should run a test using TSDS on one of these flow log data sources. I think storage size would be the biggest benefit.

One thing that could cause an issue (particularly in the aws.vpcflow case) is late arriving data (like if it is historical data read from S3). TSDS can only accept data that has a "recent" timestamp (see https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#tsds-accepted-time-range).
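To make the late-arriving data concern concrete, here is a minimal sketch of the accepted-time-range check, assuming a freshly created backing index whose window runs from `now - look_back` (inclusive) to `now + look_ahead` (exclusive). The function names and the 2 hour / 30 minute values are illustrative assumptions, not taken from this thread; the linked documentation has the actual settings and defaults.

```python
from datetime import datetime, timedelta, timezone


def accepted_range(now, look_back, look_ahead):
    """Return the (start, end) window a new TSDS backing index would accept."""
    return now - look_back, now + look_ahead


def is_accepted(ts, now,
                look_back=timedelta(hours=2),      # assumed value, for illustration
                look_ahead=timedelta(minutes=30)):  # assumed value, for illustration
    """True if a document timestamp falls inside the accepted window."""
    start, end = accepted_range(now, look_back, look_ahead)
    return start <= ts < end


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
live_flow = now - timedelta(minutes=5)   # exported in near real time
backfilled_flow = now - timedelta(days=3)  # e.g. historical data read from S3

print(is_accepted(live_flow, now))
print(is_accepted(backfilled_flow, now))
```

Under these assumptions the live flow record falls inside the window while the three-day-old record does not, which is the backfill problem described above.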

BenB196 commented 10 months ago

Interesting, I didn't really think about historical ("backfill") data here, but it does make sense to consider.

I wonder if https://github.com/elastic/elasticsearch/issues/98463 would help as well in this scenario. I don't really know the dynamics around use cases like importing VPC flow data, so not sure how "useable" this feature would be in a "backfill" scenario.

jamiehynds commented 4 months ago

Related presentation from ElastiFlow: https://www.youtube.com/watch?v=MuMXNTFsKto&t=1238s&ab_channel=OfficialElasticCommunity

elasticmachine commented 4 months ago

Pinging @elastic/sec-deployment-and-devices (Team:Security-Deployment and Devices)

pkoutsovasilis commented 1 month ago

hello 👋 full disclosure I just started reading about TSDS; so here are some quick ones to pick your brains @andrewkroh @BenB196

quoting from here

Only use a TSDS if you typically add metrics data to Elasticsearch in near real-time and @timestamp order. A TSDS is only intended for metrics data. For other timestamped data, such as logs or traces, use a regular data stream.

So, if my interpretation of the above is correct, to take advantage of TSDS we need to define which fields are the metrics. At the moment the fields extracted from the netflow input are not separated by which ones are eligible as metrics, e.g. I assume tcp_ack_total_count is a metric while ssl_server_name isn't?! Maybe a coarse-grained criterion could be the type of the field, treating the ones that have a numeric type as metrics? (more info here)
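The coarse-grained criterion could be sketched roughly as follows. This is only an illustration: the field types in `sample_fields` are assumed rather than taken from the integration's field definitions, `split_fields` and `metric_mappings` are hypothetical helpers, and defaulting every numeric field to `gauge` is a placeholder choice.

```python
# Elasticsearch numeric field types, used here as the "is it a metric?" test.
NUMERIC_TYPES = {
    "long", "integer", "short", "byte", "unsigned_long",
    "double", "float", "half_float", "scaled_float",
}


def split_fields(field_types):
    """Split {field name: type} into (metric candidates, everything else)."""
    metrics = sorted(n for n, t in field_types.items() if t in NUMERIC_TYPES)
    others = sorted(n for n, t in field_types.items() if t not in NUMERIC_TYPES)
    return metrics, others


def metric_mappings(field_types, metric_kind="gauge"):
    """Build mapping properties that mark each numeric field as a metric."""
    metrics, _ = split_fields(field_types)
    return {
        name: {"type": field_types[name], "time_series_metric": metric_kind}
        for name in metrics
    }


# Assumed types, for illustration only.
sample_fields = {
    "tcp_ack_total_count": "long",
    "ssl_server_name": "keyword",
    "source_transport_port": "integer",
}

metrics, others = split_fields(sample_fields)
print(metrics)
print(others)
```

Note that a purely type-based rule also picks up `source_transport_port`, which is numeric but an identifier rather than a measurement, so a type check would probably need a manual allow or deny list on top.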

quoting from here

In addition to a @timestamp, each document in a TSDS must contain one or more dimension fields. The matching index template for a TSDS must contain mappings for at least one keyword dimension.

Again, if my interpretation of the above is correct, we need to have at least one field as a dimension; from having a look at the fields, these could be exporter.address, exporter.source_id and exporter.version?! But what happens if these are missing?!
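As a rough sketch of what that would mean for the index template, the snippet below builds a template body with `index.mode: time_series` and marks the exporter fields as dimensions. The field types, the index pattern, and the helper names `build_template` and `missing_dimensions` are assumptions for illustration; this is not the integration's actual template.

```python
def build_template(dimensions, index_pattern="logs-netflow.log-*"):
    """Build a TSDS-style index template body from {field name: type}."""
    keyword_dims = [n for n, t in dimensions.items() if t == "keyword"]
    if not keyword_dims:
        # Mirrors the quoted requirement of at least one keyword dimension.
        raise ValueError("at least one keyword dimension is required")
    properties = {
        name: {"type": ftype, "time_series_dimension": True}
        for name, ftype in dimensions.items()
    }
    properties["@timestamp"] = {"type": "date"}
    return {
        "index_patterns": [index_pattern],  # hypothetical pattern
        "data_stream": {},
        "template": {
            "settings": {
                "index.mode": "time_series",
                "index.routing_path": keyword_dims,
            },
            "mappings": {"properties": properties},
        },
    }


def missing_dimensions(doc, dimensions):
    """List the dimension fields that a flattened document does not carry."""
    return [name for name in dimensions if doc.get(name) is None]


# Assumed types, for illustration only.
dims = {
    "exporter.address": "keyword",
    "exporter.source_id": "long",
    "exporter.version": "integer",
}
template = build_template(dims)
print(template["template"]["settings"])
print(missing_dimensions({"exporter.address": "10.0.0.1"}, dims))
```

A check like `missing_dimensions` could run in a test pipeline over sample flow records to find out how often the candidate dimensions are actually absent before committing to them.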

What are your thoughts on the above guys? 🙂

BenB196 commented 1 month ago

I think one of the main challenges with TSDS goes back to something @andrewkroh pointed out: the possibility of needing to backfill data, which doesn't have the greatest experience with TSDS.

Elastic is working on a new "LogsDB" index mode, https://github.com/elastic/elasticsearch/issues/106462, which hopefully will provide many of the same benefits of TSDS, without the challenges of backfilling data. I don't think it will be 100% as efficient as TSDS could possibly be, but would still be a nice value add.

pkoutsovasilis commented 1 month ago

Yes! LogsDB index mode sounds like the best of both worlds with an acceptable trade-off. This comes as Tech Preview in ES 8.14.0, but I think the target is to reach GA in 8.15.0.