Grok Debugger returns a false positive for the JAVACLASS pattern

ncamn commented 3 years ago

Kibana version: 7.11.2

Elasticsearch version: 7.11.2

Server OS version: Ubuntu Bionic Beaver (18.04 LTS) (GNU/Linux 4.15.0-136-generic x86_64)

Browser version: Chromium Version 90.0.4430.93 (Official Build) Arch Linux (64-bit)

Browser OS version: Archlinux (GNU/Linux 5.11.16-arch1-1)

Original install method (e.g. download page, yum, from source, etc.): APT

Describe the bug:

In Kibana's Grok Debugger, when matching the following sample

Chronometer "DailySummaryEmailLifecycle" ended. 84240 items, 16.28248015572731 items/s, 5173659 ms (1 hour, 26 minutes, 13 seconds and 659 milliseconds)

with the following pattern

Chronometer "%{JAVACLASS:[fields][chronometer][indexer]}" ended. %{INT:[fields][chronometer][count]} items, %{NUMBER:[fields][chronometer][speed]} items/s, %{NUMBER:[event][duration]} ms %{GREEDYDATA}

Kibana returns the following match

{
  "[fields][chronometer][speed]": "16.28248015572731",
  "[fields][chronometer][count]": "84240",
  "[fields][chronometer][indexer]": "DailySummaryEmailLifecycle",
  "[event][duration]": "5173659"
}

However, when this Grok pattern is added to a Logstash pipeline filter, Grok parsing fails: the parsed fields are missing from the final document and the _grokparsefailure tag is set.

After using another Grok debugging tool, grokdebug.herokuapp.com, I found out that replacing JAVACLASS in my pattern with WORD fixed the parsing issue in Logstash.

In conclusion, Kibana's Grok Debugger seems to return a false positive for the JAVACLASS pattern, compared to Logstash's Grok filter plugin and other Grok debugging tools such as grokdebug.herokuapp.com.
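One way to picture the discrepancy is to assume that the two tools ship different definitions of JAVACLASS: one that requires at least one dot-separated package prefix, and one that treats the prefix as optional. The report does not confirm this. The sketch below is in Python, and both JAVACLASS regexes in it are illustrative assumptions, not definitions verified against Kibana 7.11.2 or the Logstash in use. WORD is written as `\b\w+\b`. The helper name `quoted_name_match` is hypothetical.

```python
import re

# Only the part of the sample line that the JAVACLASS capture has to match.
SAMPLE = 'Chronometer "DailySummaryEmailLifecycle" ended. 84240 items'

# ASSUMPTION: illustrative definitions, not verified against either tool.
# Strict variant: needs at least one "package." prefix before the class name.
STRICT_JAVACLASS = r"(?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+"
# Lenient variant: the "package." prefix may be absent.
LENIENT_JAVACLASS = r"(?:[a-zA-Z$_][a-zA-Z$_0-9]*\.)*[a-zA-Z$_][a-zA-Z$_0-9]*"
# WORD as a plain run of word characters between word boundaries.
WORD = r"\b\w+\b"


def quoted_name_match(name_regex, line):
    """Return the quoted name captured by name_regex, or None if there is no match."""
    pattern = 'Chronometer "(?P<indexer>' + name_regex + r')" ended\.'
    found = re.search(pattern, line)
    return found.group("indexer") if found else None


if __name__ == "__main__":
    for label, regex in [
        ("strict JAVACLASS", STRICT_JAVACLASS),
        ("lenient JAVACLASS", LENIENT_JAVACLASS),
        ("WORD", WORD),
    ]:
        print(label, "->", quoted_name_match(regex, SAMPLE))
```

Under these assumptions, an unqualified name such as DailySummaryEmailLifecycle is rejected by the strict variant and accepted by the lenient variant and by WORD, which would be consistent with the behaviour described above.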

Steps to reproduce:

Go to Management -> Dev Tools -> Grok Debugger

Fill the Sample Data input with

Chronometer "DailySummaryEmailLifecycle" ended. 84240 items, 16.28248015572731 items/s, 5173659 ms (1 hour, 26 minutes, 13 seconds and 659 milliseconds)

Fill the Grok Pattern input with

Chronometer "%{JAVACLASS:[fields][chronometer][indexer]}" ended. %{INT:[fields][chronometer][count]} items, %{NUMBER:[fields][chronometer][speed]} items/s, %{NUMBER:[event][duration]} ms %{GREEDYDATA}

Run Simulate
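The same check can probably be reproduced without the UI, assuming the Grok Debugger behaves like an Elasticsearch ingest grok processor run through the simulate pipeline API (an assumption, not something this report verifies). The Python sketch below only builds and prints the request body and does not contact a cluster. The function name `build_simulate_body` and the `message` field name are illustrative choices.

```python
import json

SAMPLE = (
    'Chronometer "DailySummaryEmailLifecycle" ended. 84240 items, '
    "16.28248015572731 items/s, 5173659 ms "
    "(1 hour, 26 minutes, 13 seconds and 659 milliseconds)"
)

PATTERN = (
    'Chronometer "%{JAVACLASS:[fields][chronometer][indexer]}" ended. '
    "%{INT:[fields][chronometer][count]} items, "
    "%{NUMBER:[fields][chronometer][speed]} items/s, "
    "%{NUMBER:[event][duration]} ms %{GREEDYDATA}"
)


def build_simulate_body(sample, pattern):
    """Build a request body for POST _ingest/pipeline/_simulate.

    ASSUMPTION: a single grok processor reading a "message" field is
    equivalent to what the Grok Debugger runs.
    """
    return {
        "pipeline": {
            "processors": [
                {"grok": {"field": "message", "patterns": [pattern]}}
            ]
        },
        "docs": [{"_source": {"message": sample}}],
    }


if __name__ == "__main__":
    print("POST _ingest/pipeline/_simulate")
    print(json.dumps(build_simulate_body(SAMPLE, PATTERN), indent=2))
```

The printed body can be pasted into Dev Tools -> Console to compare the result with what the Grok Debugger shows.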

Expected behavior:

A "Provided Grok patterns do not match data in the input" notification and {} in the Structured Data section.

Screenshots (if relevant):

Errors in browser console (if relevant):

The browser console shows no logs or errors.

Provide logs and/or server output (if relevant):

Not relevant.

Any additional context:

The Logstash filter's config file with the fixed Grok pattern:

filter {
  if [kubernetes][container][name] in ["api-external-webapp", "batches-webapp", "legacy-webapp"] {
    mutate {
      copy => { "message" => "log.original" } # Keep the original log value before processing it
      gsub => [ "message", "\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]", "" ] # Strip Linux color codes from the log
    }

    grok {
      match => { "message" =>
        [
          "Chronometer \"%{WORD:[fields][chronometer][indexer]}\" ended. %{INT:[fields][chronometer][count]} items, %{NUMBER:[fields][chronometer][speed]} items/s, %{NUMBER:[event][duration]} ms %{GREEDYDATA}"
        ]
      }
    }

    geoip { source => "ip" }

    if [log][level] == "DEBUG" {
      drop { }
    }

    if [log][logger] == "ConfigurationProvider" {
      drop { }
    }

    if [message] =~ /^#/ {
      drop { }
    }
  }
}

This filter is correctly loaded by Logstash as it appears in the Logstash "Starting pipeline" log (in "pipeline.sources").
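The gsub expression from the mutate block can also be exercised outside Logstash, to confirm that the color codes are removed before the grok stage sees the message. This is a minimal Python sketch. It assumes Python's `re` module treats this particular expression the same way Logstash's regex engine does, and the function name `strip_color_codes` is hypothetical.

```python
import re

# Same expression as the gsub in the mutate block, copied unchanged.
# Note that "[m|K]" is a character class, so it matches "m", "K", and also
# a literal "|" character.
ANSI_RE = re.compile(r"\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]")


def strip_color_codes(message):
    """Remove terminal color/erase sequences matched by ANSI_RE."""
    return ANSI_RE.sub("", message)


if __name__ == "__main__":
    colored = '\x1b[32mChronometer "DailySummaryEmailLifecycle" ended.\x1b[0m'
    print(strip_color_codes(colored))
```

If only the "m" and "K" terminators are intended, `[mK]` would express that more precisely. The original expression is kept here so the sketch mirrors the config as posted.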

elasticmachine commented 3 years ago

Pinging @elastic/es-ui (Team:Elasticsearch UI)

elasticmachine commented 2 months ago

Pinging @elastic/kibana-management (Team:Kibana Management)

elastic/kibana #99311