elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[Automatic Import] Reject log files that really have no field data #199886

Open ilyannn opened 2 weeks ago

ilyannn commented 2 weeks ago

Context

We implemented a feature to parse logs in new formats, such as unstructured logs. The main idea is to take whatever the user throws at us and look for anything we can extract.

I tested it by giving it a .py file as input, and Automatic Import still generated an integration: 202411122346_format-1.0.0.zip

There's no miracle here: it just takes the whole line and uses it as a field value, then maps that field to process.command_line, presumably because that's the closest thing to "just some kind of unstructured text".

  - grok:
      tag: grok_header_pattern
      field: message
      patterns:
        - '%{GREEDYDATA:202411122346_format.python.message}'
  - rename:
      ignore_missing: true
      if: ctx.event?.original == null
      tag: rename_message
      field: originalMessage
      target_field: event.original
  - remove:
      ignore_missing: true
      if: ctx.event?.original != null
      tag: remove_copied_message
      field: originalMessage
  - remove:
      ignore_missing: true
      tag: remove_message
      field: message
  - rename:
      ignore_missing: true
      field: 202411122346_format.python.message
      target_field: process.command_line
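To make the behaviour concrete, here is a rough Python simulation of the processors above. It is not Elasticsearch itself: the helper name run_generated_pipeline is hypothetical, it assumes an earlier processor (not shown in the excerpt) already copied message into originalMessage, and it uses flat dotted keys instead of nested objects. Since GREEDYDATA is just .*, it matches any line, even an empty one, so the grok step never fails and every line of a Python file "parses":

```python
import re

# Field name taken from the generated grok pattern above.
CAPTURE = "202411122346_format.python.message"


def run_generated_pipeline(doc: dict) -> dict:
    """Hypothetical simulation of the generated processors, flat dotted keys."""
    doc = dict(doc)
    # grok '%{GREEDYDATA:...}': GREEDYDATA is '.*', so every message matches.
    match = re.match(r"(?P<data>.*)", doc["message"])
    doc[CAPTURE] = match.group("data")
    # rename originalMessage -> event.original, only if event.original is unset.
    if doc.get("event.original") is None and "originalMessage" in doc:
        doc["event.original"] = doc.pop("originalMessage")
    # remove originalMessage when event.original is present (ignore_missing).
    if doc.get("event.original") is not None:
        doc.pop("originalMessage", None)
    # remove message (ignore_missing).
    doc.pop("message", None)
    # rename the captured text to process.command_line (ignore_missing).
    if CAPTURE in doc:
        doc["process.command_line"] = doc.pop(CAPTURE)
    return doc


line = "import os"  # a line from a .py file
print(run_generated_pipeline({"message": line, "originalMessage": line}))
```

The source line ends up verbatim in both event.original and process.command_line, which is exactly the "integration" observed above.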

Suggestion

Let the LLM know that sometimes it's OK to admit it can't find order in this chaotic world!

If the input is basically a block of text with nothing resembling field data, the prompt should tell the LLM to return 'unsupported', not just 'unstructured'. It's quite possible the user simply dragged and dropped the wrong file.
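As an illustration of what "nothing resembling field data" could mean operationally, here is a hypothetical deterministic pre-check that could sit alongside the prompt change. It is not what Automatic Import does; the signals (key=value pairs, ISO or syslog-style timestamps, IPv4 addresses, JSON objects), the 0.5 threshold, and the names precheck_samples and looks_like_field_data are all assumptions chosen for the sketch:

```python
import json
import re
from typing import Iterable

# Hypothetical signals that a line carries extractable field data.
KV_PAIR = re.compile(r"\b[\w.-]+=\S+")
ISO_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")
SYSLOG_TIMESTAMP = re.compile(r"\b[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\b")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def looks_like_field_data(line: str) -> bool:
    """True if the line is a JSON object or matches any regex signal."""
    stripped = line.strip()
    if stripped.startswith("{"):
        try:
            if isinstance(json.loads(stripped), dict):
                return True
        except json.JSONDecodeError:
            pass  # not JSON after all; fall through to the regex signals
    patterns = (KV_PAIR, ISO_TIMESTAMP, SYSLOG_TIMESTAMP, IPV4)
    return any(p.search(line) for p in patterns)


def precheck_samples(lines: Iterable[str], min_ratio: float = 0.5) -> str:
    """Return 'unsupported' unless enough non-blank lines look like field data."""
    sample = [line for line in lines if line.strip()]
    if not sample:
        return "unsupported"
    hits = sum(looks_like_field_data(line) for line in sample)
    return "candidate" if hits / len(sample) >= min_ratio else "unsupported"


python_file = ["import os", "", "def main():", "    print('hello')", "    main()"]
print(precheck_samples(python_file))
```

A heuristic like this has false positives (Python keyword arguments such as end="" match the key=value signal), so it would complement the LLM's judgment rather than replace it.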

elasticmachine commented 2 weeks ago

Pinging @elastic/security-scalability (Team:Security-Scalability)

bhapas commented 2 weeks ago

Sounds like a bug. Maybe we should set boundaries for the LLM when it identifies the different formats, and have it default to 'unsupported' if nothing matches. The prompt text for identifying the different log formats could also be refined to get better results.
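One way to "set boundaries" is to validate the model's answer in code instead of trusting it. The sketch below assumes a hypothetical JSON reply shape with a format name and an example_fields list; the allowed format names, apart from 'unstructured' and 'unsupported', are illustrative. Anything outside the allowed set, any unparseable reply, and any 'unstructured' answer that names no extractable field all fall back to 'unsupported':

```python
import json
from typing import Optional

# Hypothetical allowed labels; only 'unstructured' and 'unsupported' come from this issue.
ALLOWED_FORMATS = {"json", "ndjson", "csv", "structured", "unstructured", "unsupported"}


def decide_format(raw_response: Optional[str]) -> str:
    """Map a raw LLM reply to a format name, defaulting to 'unsupported'."""
    if not raw_response:
        return "unsupported"
    try:
        reply = json.loads(raw_response)
    except json.JSONDecodeError:
        return "unsupported"  # free-text or malformed reply
    if not isinstance(reply, dict):
        return "unsupported"
    fmt = str(reply.get("format", "")).strip().lower()
    if fmt not in ALLOWED_FORMATS:
        return "unsupported"  # e.g. the model invented a label like 'python'
    # 'unstructured' must still point at something we could extract.
    fields = reply.get("example_fields")
    if fmt == "unstructured" and not (isinstance(fields, list) and fields):
        return "unsupported"
    return fmt


print(decide_format('{"format": "unstructured", "example_fields": []}'))
```

Defaulting to 'unsupported' on every failure path means a wrong file produces an explicit rejection rather than a GREEDYDATA pipeline like the one above.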