Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

JSON Extractor doesn't cut from field #1564

Open pgosselin82 opened 8 years ago

pgosselin82 commented 8 years ago

The JSON extractor doesn't cut the extracted content from the source field, even when the cut option is selected.

    {
      "condition_type": "regex",
      "condition_value": "(\\{.*\\})",
      "converters": [],
      "cursor_strategy": "cut",
      "extractor_config": {
        "flatten": false,
        "key_separator": ".",
        "list_separator": ", ",
        "kv_separator": "="
      },
      "extractor_type": "json",
      "order": 1,
      "source_field": "jsonData",
      "target_field": "jsonData",
      "title": "json-to-fields"
    }

haizaar commented 6 years ago

Hi guys,

I still see this issue with Graylog 2.4.5.

Here is the extractor config:

    {
      "title": "SD Resource from JSON",
      "extractor_type": "json",
      "converters": [],
      "order": 0,
      "cursor_strategy": "cut",
      "source_field": "resource",
      "target_field": "",
      "extractor_config": {
        "list_separator": ", ",
        "kv_separator": "=",
        "key_prefix": "resource_",
        "key_separator": "_",
        "replace_key_whitespace": false,
        "key_whitespace_replacement": "_"
      },
      "condition_type": "none",
      "condition_value": ""
    }

My understanding is that, since the JSON extractor processes the whole field, the "cut" strategy would leave the original field empty, so ES would skip it when indexing. That is the desired behavior in my situation: I receive JSON docs from Google StackDriver and am unfolding the inner JSON, which is stored as text.
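To make the expected unfolding concrete, here is a minimal Java sketch of how a nested document might be flattened into prefixed field names using the `key_prefix` ("resource_") and `key_separator` ("_") from the config above. This is not Graylog's implementation: the class `FlattenSketch`, its methods, and the sample StackDriver-like values are all hypothetical, and the input is assumed to be already parsed into nested maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Graylog code: flattens nested maps into
// prefixed, separator-joined field names.
public class FlattenSketch {

    public static Map<String, Object> flatten(Map<String, Object> nested,
                                              String keyPrefix,
                                              String keySeparator) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect(nested, keyPrefix, keySeparator, out);
        return out;
    }

    // Walks the tree; "path" is the field-name prefix accumulated so far.
    private static void collect(Map<String, Object> node, String path,
                                String keySeparator, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String key = path + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                collect(child, key + keySeparator, keySeparator, out);
            } else {
                out.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the parsed "resource" field.
        Map<String, Object> labels = new LinkedHashMap<>();
        labels.put("zone", "us-east1-b");
        Map<String, Object> resource = new LinkedHashMap<>();
        resource.put("type", "gce_instance");
        resource.put("labels", labels);

        // Prints {resource_type=gce_instance, resource_labels_zone=us-east1-b}
        System.out.println(flatten(resource, "resource_", "_"));
    }
}
```

With the "cut" strategy working as I expect, only the flattened `resource_*` fields would remain on the message and the original `resource` text would be gone.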

hkelley commented 5 years ago

Still present in 2.5.1+34194da, codename Trippy Trampoline

kmerz commented 5 years ago

I just took a look at the bug; here are my thoughts:

Since we can't accurately determine which parts of the field can be removed for each JSON key/value pair, the beginIndex is currently set to -1 in: https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog2/inputs/extractors/JsonExtractor.java#L100

The beginIndex is later checked in: https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog2/plugin/inputs/Extractor.java#L227 to avoid cutting when there is nothing to remove.

What we need here is to tell the code in Extractor.java to remove the entire field, since a JSON Extractor will work on the complete field and not only on a substring of the field.
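A rough Java sketch of the idea, to illustrate why a beginIndex of -1 makes "cut" a no-op and what a whole-field removal could look like. This is not the actual code in Extractor.java: the class `CutSketch`, the method names, and the `consumesWholeField` flag are assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Graylog code.
public class CutSketch {

    // Range-based cut: removes [beginIndex, endIndex) from the source.
    // An unknown range (e.g. -1) leaves the source untouched.
    public static String cutRange(String source, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex < beginIndex || endIndex > source.length()) {
            return source; // nothing we can safely remove
        }
        return source.substring(0, beginIndex) + source.substring(endIndex);
    }

    // Applies the "cut" strategy to one message field. When the extractor
    // consumes the complete field (the assumed consumesWholeField flag),
    // the field is removed instead of cut by index.
    public static void applyCut(Map<String, Object> fields, String sourceField,
                                int beginIndex, int endIndex,
                                boolean consumesWholeField) {
        Object original = fields.get(sourceField);
        if (!(original instanceof String)) {
            return;
        }
        if (consumesWholeField) {
            fields.remove(sourceField);
            return;
        }
        fields.put(sourceField, cutRange((String) original, beginIndex, endIndex));
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("jsonData", "{\"a\":1}");

        // Range unknown (-1): the field is left as-is, which matches the bug.
        applyCut(fields, "jsonData", -1, -1, false);
        System.out.println(fields.containsKey("jsonData")); // true

        // Whole-field removal: the field is dropped from the message.
        applyCut(fields, "jsonData", -1, -1, true);
        System.out.println(fields.containsKey("jsonData")); // false
    }
}
```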

hkelley commented 5 years ago

I was able to work around this by using the JSON extractor to expand and then using a pipeline rule to clean up the original field.

I would have switched all processing to the pipeline but it looks like the ability to extract all JSON fields and add them all as message fields is still in progress:

https://github.com/Graylog2/graylog-plugin-pipeline-processor/pull/228
https://community.graylog.org/t/parse-unknown-json-with-pipelines/3293/7