Open nreese opened 1 week ago
Oh my gosh, thank you so much @nreese!
Resolves https://github.com/elastic/siem-team/issues/323 🎉
++ Thanks @nreese
And resolves https://github.com/elastic/security-team/issues/3067
😂
@spong and @stephmilovic
Thanks for all of the linked issues. This seems to be a popular request.
I have 2 questions
- Does the data set in this PR have all fields for SIEM use cases? Is a more robust data set needed?
This is a great start, but we can definitely expand on it. For example, @patrykkopycinski has a PR open with faux malware events to use with the attack discovery feature: https://github.com/elastic/kibana/pull/182918
- This PR adds sample data via an example plugin, meaning this targets developers who run Kibana locally with the command `yarn start --run-examples`, and would not be available on cloud or any Kibana distribution. Is that ok? And would this PR still close these linked issues?
Ah, I didn't realize this was targeting developers only. As you've built this with the example plugins, I think adding `ci:build-example-plugins` to a cloud deployed PR would make the data available for cloud testing. My issue was developer centric, so I would consider it closed by this.
But yeah, from the perspective of new users, as covered in Garrett's issue (https://github.com/elastic/siem-team/issues/323) and Kseniia's issue (https://github.com/elastic/security-team/issues/3067), we'd love to put this in front of new customers with the rest of the sample data. What would it take to make it available there?
Very low technical effort. The implementation would just be moved out of examples and into either the `home` plugin or any other plugin.
There might be more pushback on this solution, since one of the downsides of sample data is that it bloats the Kibana distribution size. Also, there has been pushback on solution sample data in production, since Product wants users to go through the process of setting up real data workflows. Easily accessible sample data does not push users toward setting up workflows.
@MikePaquette I saw you weigh in on Garrett's ticket. How do you feel about packaging security sample data with Kibana? cc @jamesspi @paulewing
Given the reasons outlined in the last attempt that didn't make it, it's probably best to keep this developer centric. I didn't know about `ci:build-example-plugins`, @stephmilovic, so thanks for that tidbit!
Knowing that, we can have supplemental sample data both locally and in cloud PRs, so that's a big win in my book. Perhaps the user-facing initiative has stalled (that comment was from a while ago), but it seems best to let security product drive that one.
Thanks @nreese
Given the reasons outlined in the last attempt (https://github.com/elastic/kibana/pull/164052#issuecomment-1735302999) that didn't make it, it's probably best to keep this developer centric.
Agreed, let's keep this for developer examples only. Even in that limited developer example capacity, we need to ensure that the included sample data does not contain any personal or confidential information.
Perhaps the user-facing initiative has stalled (that comment was from a while ago), but it seems best to let security product drive that one.
Yes, we've decided to invest in steering users to demo systems rather than investing in making a robust and safe process for using sample data on a system that might become a production customer system.
@nreese, is this the data that I prepared on the `synth-sec` cluster? I would not dare to call it "Security solutions logs data"; it's little more than IPs and geolocation, and there is actually very little to make it useful outside the map view.
It is the data pulled from the cluster you provided.
Is there a more complete data set I could use?
No. We could improve it, but only to a certain degree.
I suggest changing the description; otherwise users will be disappointed by the limited use they can make of that data.
Thanks. This is a good place to start and we can always iterate on the data set.
@cavokz would you mind answering https://github.com/elastic/kibana/pull/182979#pullrequestreview-2050357438 since you have more knowledge on where the data is coming from?
Please describe where/how the proposed sample data was obtained and verify that the sample data:
* [x] contains no personally identifiable information of any person
* [x] contains no confidential information of Elastic or any other person, organization, or company
* [ ] is not subject to any license or copyright
* [x] is not otherwise restricted for this use case
The IP addresses are totally random; they come from this Geneve formula: `ipaddress.ip_address(random.randrange(1, 2**bits))`, where `bits` can be either 32 or 64. `ipaddress` is a Python stdlib module.
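For anyone who wants to see what that formula produces, here is a minimal standalone sketch using only the Python stdlib. It is not Geneve's actual code: the `random_ip` helper name, the `bits` check, and the seeded `random.Random` instance are assumptions added for illustration.

```python
import ipaddress
import random
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def random_ip(bits: int, rng: random.Random) -> IPAddress:
    """Hypothetical helper: build an address from a random integer in [1, 2**bits)."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    # ipaddress.ip_address() treats integers below 2**32 as IPv4 and
    # larger integers as IPv6, so a 64-bit draw almost always yields IPv6.
    return ipaddress.ip_address(rng.randrange(1, 2**bits))


rng = random.Random(0)  # seeded only so the sketch is repeatable
v4_samples = [random_ip(32, rng) for _ in range(3)]
wide_samples = [random_ip(64, rng) for _ in range(3)]

print(v4_samples)
print(wide_samples)
```

Because every value is drawn uniformly from the integer range, the addresses are not tied to any real host or user.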
The geo info comes from the Faker geo provider, which in turn takes the data from geonames.org, where it's licensed under the Creative Commons Attribution 3.0 License.