ilyannn opened 2 weeks ago
Pinging @elastic/security-scalability (Team:Security-Scalability)
Sounds like a bug here. Maybe we should set boundaries for the LLM when it identifies the different formats, and let it choose `unsupported` by default if nothing matches. The prompt text for identifying the different log formats could also be refined to get better results.
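One way to read "set boundaries" is to accept only a fixed set of format labels from the model and treat anything else as `unsupported`. Below is a minimal sketch of that idea. The label names and the helper `toLogFormat` are hypothetical and not taken from the Automatic Import code. Note that this guard only bounds the answer. It cannot by itself stop the model from answering `unstructured` for a `.py` file; that still has to come from the prompt.

```typescript
// Hypothetical label set for illustration; the real Automatic Import labels may differ.
const KNOWN_FORMATS = ["json", "ndjson", "csv", "structured", "unstructured"] as const;

type KnownFormat = (typeof KNOWN_FORMATS)[number];
type LogFormat = KnownFormat | "unsupported";

function isKnownFormat(label: string): label is KnownFormat {
  return (KNOWN_FORMATS as readonly string[]).indexOf(label) !== -1;
}

// Normalize the model's free-text answer (whitespace, case, surrounding quotes)
// and fall back to "unsupported" whenever it is not one of the known labels,
// including empty or chatty answers.
function toLogFormat(llmAnswer: string): LogFormat {
  const label = llmAnswer.trim().toLowerCase().replace(/^["']+|["']+$/g, "");
  return isKnownFormat(label) ? label : "unsupported";
}
```

For example, `toLogFormat(" CSV ")` yields `"csv"`, while an off-list answer such as `toLogFormat("python source code")` yields `"unsupported"`.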
Context
We implemented a feature to parse logs in new formats, like unstructured logs. The main idea there is to take whatever the user throws at us, and look for stuff we can extract.
I've tested it by giving a `.py` file as input, and Automatic Import still generated an integration: 202411122346_format-1.0.0.zip
There's no miracle here: it just takes the whole line and uses it as a field value, then maps it to `process.command_line`, presumably because that's the closest thing to "just some kind of unstructured text".
Suggestion
Let the LLM know that sometimes it's OK to admit it can't find order in this chaotic world!
If the input is basically a bunch of text with nothing resembling field data, the prompt should tell the LLM to return `unsupported`, not just `unstructured`. It's quite possible the user has simply dragged and dropped the wrong file.
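A sketch of what such a prompt instruction could look like is below. The wording, the label list, and the function name `buildFormatDetectionPrompt` are all illustrative assumptions, not the actual Automatic Import prompt. The key point is that the prompt draws an explicit line between `unstructured` (free text that still carries log data) and `unsupported` (no log data at all).

```typescript
// Hypothetical labels and wording; the real Automatic Import prompt differs.
const FORMAT_LABELS = ["json", "ndjson", "csv", "structured", "unstructured", "unsupported"] as const;

// Build a format-detection prompt that gives the model an explicit way out:
// "unstructured" is reserved for free-text logs, "unsupported" for everything else.
function buildFormatDetectionPrompt(samples: string[]): string {
  return [
    "Classify the log samples below into exactly one of these formats:",
    FORMAT_LABELS.join(", "),
    "Use 'unstructured' only for free-text log lines that still contain recognizable log data,",
    "such as timestamps, severity levels, hosts, or key/value pairs.",
    "If the samples contain nothing resembling log fields (for example source code,",
    "prose, or an unrelated file), answer 'unsupported'.",
    "Answer with the label only.",
    "",
    "Samples:",
    // Number each sample so the model sees them as separate records.
    ...samples.map((s, i) => `${i + 1}. ${s}`),
  ].join("\n");
}
```

Feeding it the lines of a Python script, for instance `buildFormatDetectionPrompt(["import os", "print('hi')"])`, gives the model an explicit reason to answer `unsupported` instead of stretching `unstructured` to fit.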