elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Security solution sample data #182979

Open nreese opened 1 week ago

nreese commented 1 week ago

[Image: Screenshot 2024-05-08 at 11 29 47 AM]

spong commented 1 week ago

Oh my gosh, thank you so much @nreese!

Resolves https://github.com/elastic/siem-team/issues/323 🎉

stephmilovic commented 1 week ago

++ Thanks @nreese

Resolves https://github.com/elastic/kibana/issues/124463

stephmilovic commented 1 week ago

And resolves https://github.com/elastic/security-team/issues/3067

😂

nreese commented 1 week ago

@spong and @stephmilovic

Thanks for all of the linked issues. This seems to be a popular request.

I have 2 questions

  1. Does the data set in this PR have all fields for SIEM use cases? Is a more robust data set needed?
  2. This PR adds sample data via an example plugin, meaning it targets developers who run Kibana locally with the command `yarn start --run-examples`, and it would not be available on cloud or in any Kibana distribution. Is that OK? And would this PR still close these linked issues?

stephmilovic commented 1 week ago

> 1. Does the data set in this PR have all fields for SIEM use cases? Is a more robust data set needed?

This is a great start, but we can definitely expand on it. For example, @patrykkopycinski has a PR open with faux malware events to use with the attack discovery feature: https://github.com/elastic/kibana/pull/182918

> 2. This PR adds sample data via an example plugin, meaning it targets developers who run Kibana locally with the command `yarn start --run-examples`, and it would not be available on cloud or in any Kibana distribution. Is that OK? And would this PR still close these linked issues?

Ah, I didn't realize this was targeting developers only. As you've built this with the example plugins, I think adding the `ci:build-example-plugins` label to a cloud-deployed PR would make the data available for cloud testing. My issue was developer-centric, so I would consider it closed by this.

But yeah from the perspective of new users as covered in Garrett's issue and Kseniia's issue we'd love to put this in front of new customers with the rest of the sample data. What would it take to make it available there?

nreese commented 1 week ago

> But yeah from the perspective of new users as covered in https://github.com/elastic/siem-team/issues/323 and https://github.com/elastic/security-team/issues/3067 we'd love to put this in front of new customers with the rest of the sample data. What would it take to make it available there?

Very low technical effort. The implementation would just be moved out of examples and into either the home plugin or another plugin.

There might be more pushback on this solution, since one of the downsides of sample data is that it bloats the Kibana distribution size. Also, there has been pushback on solution sample data in production, since Product wants users to go through the process of setting up real data workflows. Easily accessible sample data does not push users toward setting up workflows.

stephmilovic commented 1 week ago

> Also, there has been pushback on solution sample data in production, since Product wants users to go through the process of setting up real data workflows. Easily accessible sample data does not push users toward setting up workflows.

@MikePaquette I saw you weigh in on Garrett's ticket. How do you feel about packaging security sample data with Kibana? cc @jamesspi @paulewing

spong commented 1 week ago

Given the reasons outlined in the last attempt that didn't make it, it's probably best to keep this developer-centric. I didn't know about `ci:build-example-plugins` @stephmilovic, so thanks for that tidbit!

Knowing that, we can have supplemental sample data both locally and in cloud PRs, so that's a big win in my book. Perhaps the user-facing initiative has stalled (that comment was from a while ago), but it seems best to let security product drive that one.

MikePaquette commented 1 week ago

Thanks @nreese

> Given the reasons outlined in the last attempt (https://github.com/elastic/kibana/pull/164052#issuecomment-1735302999) that didn't make it, it's probably best to keep this developer-centric.

Agreed, let's keep this for developer examples only. Even in that limited developer example capacity, we need to ensure that the included sample data does not contain any personal or confidential information.

> Perhaps the user-facing initiative has stalled (that comment was from a while ago), but it seems best to let security product drive that one.

Yes, we've decided to invest in steering users to demo systems rather than investing in making a robust and safe process for using sample data on a system that might become a production customer system.

cavokz commented 6 days ago

@nreese, is this the data that I prepared on the synth-sec cluster? I would not dare to call it "Security solutions logs data". It's barely IPs and geolocation; there is actually very little to make it useful outside the map view.

nreese commented 6 days ago

> is this the data that I prepared on the synth-sec cluster? I would not dare to call it "Security solutions logs data". It's barely IPs and geolocation; there is actually very little to make it useful outside the map view.

It is the data pulled from the cluster you provided.

Is there a more complete data set I could use?

cavokz commented 6 days ago

> Is there a more complete data set I could use?

No. We could improve it, but only to a certain degree.

I suggest changing the description; otherwise users will be disappointed by the limited use they can make of that data.

nreese commented 6 days ago

> I suggest changing the description; otherwise users will be disappointed by the limited use they can make of that data.

Thanks. This is a good place to start and we can always iterate on the data set.

@cavokz would you mind answering https://github.com/elastic/kibana/pull/182979#pullrequestreview-2050357438 since you have more knowledge on where the data is coming from?

cavokz commented 6 days ago

> Please describe where/how the proposed sample data was obtained and verify that the sample data:
>
> * [x] contains no personally identifiable information of any person
> * [x] contains no confidential information of Elastic or any other person, organization, or company
> * [ ] is not subject to any license or copyright
> * [x] is not otherwise restricted for this use case

The IP addresses are totally random. They come from this Geneve formula: `ipaddress.ip_address(random.randrange(1, 2**bits))`, where `bits` can be either 32 or 64. `ipaddress` is a Python stdlib module.
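
For anyone who wants to try the formula above, here is a minimal standalone sketch. It is not Geneve's actual code: the helper name `random_ip`, the seeded `random.Random` instance, and the validation of `bits` are assumptions added for illustration. Only the `ipaddress.ip_address(random.randrange(1, 2**bits))` expression comes from the comment.

```python
import ipaddress
import random
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def random_ip(bits: int, rng: random.Random) -> IPAddress:
    """Hypothetical helper: draw an integer in [1, 2**bits) and wrap it as an address."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    # Same expression as quoted in the thread, using an explicit RNG instance.
    return ipaddress.ip_address(rng.randrange(1, 2**bits))


rng = random.Random(42)  # seeded only so that repeated runs give the same values
v4 = random_ip(32, rng)
wide = random_ip(64, rng)
print(v4, wide)
```

Note that `ipaddress.ip_address` treats any integer below `2**32` as IPv4 and anything larger as IPv6, so with `bits=64` nearly every result is an IPv6 address whose upper 64 bits are zero.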

The geo info comes from the Faker geo provider, which in turn takes the data from geonames.org, where it is licensed under the Creative Commons Attribution 3.0 License.