Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Extract field from json_extracted field not working #4229

Open piegus opened 7 years ago

piegus commented 7 years ago

I have 2 extractors.

1. extract json:

    {
      "title": "json_extract",
      "extractor_type": "json",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "flatten": false,
        "list_separator": ", ",
        "kv_separator": "=",
        "key_prefix": "xxx.",
        "key_separator": ".",
        "replace_key_whitespace": false,
        "key_whitespace_replacement": "_"
      },
      "condition_type": "none",
      "condition_value": ""
    },
2. second (lower in the order):

    {
      "title": "split xxx_datetime_date as store as timestamp_xxx",
      "extractor_type": "split_and_index",
      "converters": [
        {
          "type": "flexdate",
          "config": {
            "time_zone": "Poland"
          }
        }
      ],
      "order": 2,
      "cursor_strategy": "copy",
      "source_field": "xxx_datetime_date",
      "target_field": "timestamp_xxx",
      "extractor_config": {
        "index": 1,
        "split_by": "."
      },
      "condition_type": "string",
      "condition_value": "."
    }


part of the json extract:
{...
"datetime":{"date":"2017-10-10 09:39:01.607127","timezone_type":3,"timezone":"Europe/Warsaw"}
...}

The fields are extracted to:
xxx_datetime_date: 2017-10-10 09:39:01.607127
xxx_timezone_type: 3
xxx_timezone: Europe/Warsaw
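For anyone reproducing this locally, here is a rough Python approximation of what the JSON extractor's key flattening does with a `key_prefix` and `key_separator`. This is an illustration only, not Graylog's actual code, and the exact key naming may differ (the issue output above shows `xxx_timezone_type` rather than a full `xxx_datetime_timezone_type` path):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten a nested JSON object into underscore-joined keys,
    roughly mirroring the JSON extractor's key_prefix/key_separator behavior."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            fields.update(flatten(value, prefix=f"{name}_"))
        else:
            fields[name] = value
    return fields

message = ('{"datetime":{"date":"2017-10-10 09:39:01.607127",'
           '"timezone_type":3,"timezone":"Europe/Warsaw"}}')
print(flatten(json.loads(message), prefix="xxx_"))
```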

## Expected Behavior
The JSON extractor works fine. But when I try to run a second extractor on the field xxx_datetime_date (storing the result in timestamp_xxx), or even just copy the field to another field, it does not work.

## Current Behavior
A second extractor on the extracted JSON field (xxx_datetime_date) does not work.
No additional fields are added to the message.

Pipelines are also not working.

rule "Appserver Parsing - Timestamp"
when
  has_field("xxx_datetime_date")
then
  let new_timestamp = parse_date(to_string($message.xxx_datetime_date), "yyyy-MM-dd HH:mm:ss.SSS");
  set_field("xxx_pipline_timestamp", new_timestamp);
  // If the timestamp is correct, rename the field
end
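One detail worth ruling out locally (an assumption, not a confirmed root cause): the extracted value carries six fractional-second digits, while the rule's pattern `yyyy-MM-dd HH:mm:ss.SSS` expects three. As a sanity check, a minimal Python sketch shows the raw value parses cleanly when the format allows full microsecond precision:

```python
from datetime import datetime

# Value of xxx_datetime_date as shown in the issue: six fractional digits.
raw = "2017-10-10 09:39:01.607127"

# Python's %f accepts one to six fractional digits, so this parses cleanly;
# a pattern expecting exactly three fractional digits is one thing to test.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.isoformat())  # 2017-10-10T09:39:01.607127
```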



## Context
I cannot set the timestamp from messages that come in as full JSON.

* Graylog Version: 2.3.0
* Elasticsearch Version: 2.4.4 (per `elasticsearch-2.4.4.jar` on the process command line)
* MongoDB Version: v2.4.10
* Operating System: Ubuntu 16.10
* Browser version: Firefox 57 b5
joschi commented 7 years ago

@piegus What's the order of message processors in your Graylog setup?

You can find this information on the System / Configurations page in the "Message Processors Configuration" section.

piegus commented 7 years ago
# Processor Status
1 GeoIP Resolver active
2 Pipeline Processor active
3 Message Filter Chain active
piegus commented 7 years ago

Extracting fields with grok patterns works fine. But it seems that extracting fields from JSON is not working properly.

joschi commented 7 years ago

@piegus The "Pipeline Processor" can only access fields which have been created before.

So if you're using the JSON extractor (which runs as part of the "Message Filter Chain"), you cannot use any fields extracted by it in any previous stage (such as the "Pipeline Processor").
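The ordering point can be illustrated with a toy simulation (illustration only, not Graylog code): each processor only sees fields created by processors that ran earlier, so a pipeline rule guarded by `has_field` never fires if the extractor runs after it.

```python
# Toy simulation of why message processor order matters.

def message_filter_chain(msg):
    # Stands in for the JSON extractor, which creates xxx_datetime_date.
    msg["xxx_datetime_date"] = "2017-10-10 09:39:01.607127"
    return msg

def pipeline_processor(msg):
    # Stands in for the pipeline rule: it only fires if the field exists.
    if "xxx_datetime_date" in msg:
        msg["xxx_pipline_timestamp"] = msg["xxx_datetime_date"]
    return msg

for order in ([pipeline_processor, message_filter_chain],
              [message_filter_chain, pipeline_processor]):
    msg = {"message": "raw json here"}
    for processor in order:
        msg = processor(msg)
    # First order prints False (rule ran before extraction), second prints True.
    print([p.__name__ for p in order], "->", "xxx_pipline_timestamp" in msg)
```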

piegus commented 7 years ago

So I need to set message filter chain first?

I changed it to:

# Processor Status
1 Message Filter Chain active
2 Pipeline Processor active
3 GeoIP Resolver active
piegus commented 7 years ago

But if that is the case, why are the normal extractors not working?

The pipeline processor is only an alternative for me.

joschi commented 7 years ago

@piegus What would you expect the Split & Index extractor to return? Are you sure you don't want to use a Copy Input extractor instead?

piegus commented 7 years ago

As you can see, I also tried to copy the input from xxx_datetime_date to copy_timestamp, but that is also not working: the field does not appear.

{
  "extractors": [
    {
      "title": "access.log",
      "extractor_type": "grok",
      "converters": [],
      "order": 4,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{IPORHOST:http_host} - (?:%{WORD:auth}|-) \\[%{HTTPDATE:timestamp_string}\\] %{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion} \"%{NUMBER:response}\" (?:%{NUMBER:bytes}|-) \"(?:%{URI:referrer}|-)\" %{QS:agent} \"(?<ips>%{IP}(, %{IP})*|-)\""
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "json_extract",
      "extractor_type": "json",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "flatten": false,
        "list_separator": ", ",
        "kv_separator": "=",
        "key_prefix": "xxx.",
        "key_separator": ".",
        "replace_key_whitespace": false,
        "key_whitespace_replacement": "_"
      },
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "extract php_error_log",
      "extractor_type": "grok",
      "converters": [],
      "order": 3,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "\\[%{PHPLOGTIMESTAMP:timestamp_xxx}(?:\\s+%{PHPTZ:timezone}|)\\] %{GREEDYDATA:xxx_message}"
      },
      "condition_type": "string",
      "condition_value": "on line"
    },
    {
      "title": "xxx_timestamp to timesamp",
      "extractor_type": "copy_input",
      "converters": [
        {
          "type": "flexdate",
          "config": {
            "time_zone": "Poland"
          }
        }
      ],
      "order": 5,
      "cursor_strategy": "copy",
      "source_field": "timestamp_xxx",
      "target_field": "timestamp",
      "extractor_config": {},
      "condition_type": "none",
      "condition_value": ""
    },
    {
      "title": "split xxx_datetime_date as store as timestamp_xxx",
      "extractor_type": "split_and_index",
      "converters": [
        {
          "type": "flexdate",
          "config": {
            "time_zone": "Poland"
          }
        }
      ],
      "order": 2,
      "cursor_strategy": "copy",
      "source_field": "xxx_datetime_date",
      "target_field": "timestamp",
      "extractor_config": {
        "index": 1,
        "split_by": "."
      },
      "condition_type": "string",
      "condition_value": "."
    },
    {
      "title": "copy_timestamp",
      "extractor_type": "copy_input",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "xxx_datetime_date",
      "target_field": "copy_timestamp",
      "extractor_config": {},
      "condition_type": "none",
      "condition_value": ""
    }
  ],
  "version": "2.3.0"
}
joschi commented 7 years ago

@piegus Please post some example messages which should match the extractor conditions so we can try to reproduce the issue.

piegus commented 7 years ago
{
   "context" : [],
   "channel" : "app",
   "level_name" : "CRITICAL",
   "extra" : {
      "uid" : "cdc56db69438efdcd1904db5a35e6aac"
   },
   "message" : "Create waybill error: INPOST ERROR: Nieprawidłowy punkt odbiorczy: KRA302",
   "level" : 500,
   "datetime" : {
      "date" : "2017-10-10 10:58:25.168583",
      "timezone" : "Europe/Berlin",
      "timezone_type" : 3
   }
}
joschi commented 7 years ago

@piegus And after the extractors have been running?

piegus commented 7 years ago

(screenshot from 2017-10-10 11:22:46)

piegus commented 7 years ago

What do you mean by "after"?

piegus commented 7 years ago

I will add that I'm importing the messages via graylog-collector-sidecar.

piegus commented 7 years ago

Do you need any more information?

joschi commented 7 years ago

@piegus No, I guess the information given so far will suffice.

We'll triage the issue and schedule a bug fix in our next bug triage.

piegus commented 6 years ago

I want to add that I managed to extract the datetime from this by adding another regex extractor:

  {
      "title": "extract timestamp from json",
      "extractor_type": "regex",
      "converters": [
        {
          "type": "flexdate",
          "config": {
            "time_zone": "Poland"
          }
        }
      ],
      "order": 4,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "timestamp",
      "extractor_config": {
        "regex_value": "\"date\":\"([^\"]*)\""
      },
      "condition_type": "string",
      "condition_value": "{\"message\""
    }
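The workaround regex can be checked outside Graylog. A minimal Python sketch against a condensed sample message from this thread:

```python
import re

# The regex_value from the workaround extractor above.
pattern = re.compile(r'"date":"([^"]*)"')

# Condensed sample message from earlier in the thread.
message = ('{"message":"Create waybill error","datetime":'
           '{"date":"2017-10-10 10:58:25.168583",'
           '"timezone":"Europe/Berlin","timezone_type":3}}')

match = pattern.search(message)
if match:
    print(match.group(1))  # 2017-10-10 10:58:25.168583
```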
MysticRyuujin commented 4 years ago

I just came across this same issue on my setup: I have a JSON extractor, which is working just fine, followed by an extractor on one of the extracted fields, but it doesn't extract anything.

EDIT: I think it's because the field is already numeric...