Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

processing error after migrating to 4.2.0-3 #11495

Closed · ngvirus closed this issue 2 years ago

ngvirus commented 3 years ago

Graylog-server version: 4.2.0-3
OS: Oracle Linux 8
Java: 1.8.0-openjdk-headless-1.8.0.302.b08-0.el8_4.x86_64

I just upgraded to this version of Graylog, and in my processed events I see this error:

    gl2_processing_error: Value <2021-10-20 09:05:39,892> caused exception: Invalid format: "2021-10-20 09:05:39,892" is malformed at ",892".

I have a very basic GROK extractor that works fine:

    \[%{DATA:timestamp}\]\s\[%{DATA:loglevel}\]\s\[%{DATA:fieldname2}\]\s\[%{DATA:fieldname3}\]\s\[%{DATA:fieldname4}\]\s-\s(?<message>(.|\r|\n)*)$

In the server log I see this warning, which is about 850 lines long, repeating the same six lines over and over:

    2021-10-20T13:13:28.286Z WARN [ProcessBuffer] Unable to process event MessageEvent{raw=null, message=null, messages=null}, sequence 896092
    java.lang.StackOverflowError: null
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731) ~[?:1.8.0_302]
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4582) ~[?:1.8.0_302]
        at java.util.regex.Pattern$CharProperty.match(Pattern.java:3791) ~[?:1.8.0_302]
        at java.util.regex.Pattern$Branch.match(Pattern.java:4618) ~[?:1.8.0_302]
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672) ~[?:1.8.0_302]
        at java.util.regex.Pattern$Loop.match(Pattern.java:4799) ~[?:1.8.0_302]

This is what a message looks like:

    [2021-10-20 08:56:06,420] [INFO] [field1] [field2] [field3] - rest of the message

supahgreg commented 3 years ago

Any difference if you change (?<message>(.|\r|\n)*)$ to (?s)(?<message>.*)$ (Pattern.DOTALL)? I'm guessing the stack overflow is happening due to recursion associated w/ that inner capturing group.
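
A minimal sketch of the same idea in Graylog's pipeline-rule language (the pattern and target field here are illustrative, not from this extractor): the (?s) flag makes . match line terminators, so the deeply backtracking alternation (.|\r|\n)* is avoided entirely.

    rule "extract trailing message with DOTALL (sketch)"
    when
        has_field("message")
    then
        // (?s) = Pattern.DOTALL: '.' also matches \r and \n, so the regex
        // engine never recurses through the (.|\r|\n)* alternatives
        let m = regex(pattern: "(?s)-\\s(.*)$", value: to_string($message.message));
        set_field("trailing_message", m["0"]);
    end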

ngvirus commented 3 years ago

> Any difference if you change (?<message>(.|\r|\n)*)$ to (?s)(?<message>.*)$ (Pattern.DOTALL)? I'm guessing the stack overflow is happening due to recursion associated w/ that inner capturing group.

Hello, this helped with the exception, but the error still persists:

    gl2_processing_error: Value <2021-10-20 09:05:39,892> caused exception: Invalid format: "2021-10-20 09:05:39,892" is malformed at ",892".

supahgreg commented 3 years ago

Good to hear. The invalid format error is because the timestamp field is being set to a value that doesn't match yyyy-MM-dd HH:mm:ss.SSS (i.e. a comma instead of a period before the milliseconds; the value is also assumed to be UTC, in case yours isn't).
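
One way to deal with the comma is to parse it explicitly instead of relying on the default converter. A minimal pipeline-rule sketch (the field name temp_timestamp and the UTC timezone are assumptions, not something stated in this thread):

    rule "parse log4j-style timestamp (sketch)"
    when
        has_field("temp_timestamp")
    then
        // ",SSS" accepts the comma before the milliseconds,
        // e.g. "2021-10-20 09:05:39,892"
        let ts = parse_date(
            value:    to_string($message.temp_timestamp),
            pattern:  "yyyy-MM-dd HH:mm:ss,SSS",
            timezone: "UTC"
        );
        set_field("timestamp", ts);
        remove_field("temp_timestamp");
    end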

ngvirus commented 3 years ago

> Good to hear. The invalid format error is because the timestamp field is being set to a value that doesn't match yyyy-MM-dd HH:mm:ss.SSS (i.e. a comma instead of a period before the milliseconds; the value is also assumed to be UTC, in case yours isn't).

Okay, but before 4.2 this was working fine. Is this a change in 4.2?

kokosda commented 3 years ago

I can confirm this behaviour in the Regular Expression extractor after upgrading from 4.1 to 4.2 (screenshot). Prior to 4.2 it worked fine. And yes, we need to have a comma after the seconds, not a period.

kokosda commented 3 years ago

> I can confirm this behaviour in the Regular Expression extractor after upgrading from 4.1 to 4.2:

I approached this with two extractors:

1. The first (of type Replace with Regular Expression) replaces the comma in the timestamp with a period.
2. The second (of type Regular Expression) performs the date conversion and stores the result in timestamp.

The first one: (screenshot)

The second one: (screenshot)
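
Since the exact regexes live only in the screenshots, here is a rough pipeline-rule equivalent of the two-step idea (the pattern, replacement, and field name are reconstructions, not kokosda's actual configuration):

    rule "comma to period, then convert (sketch)"
    when
        has_field("temp_timestamp")
    then
        // step 1: rewrite "09:05:39,892" to "09:05:39.892"
        let fixed = regex_replace(
            pattern:     "(\\d{2}:\\d{2}:\\d{2}),(\\d{3})",
            value:       to_string($message.temp_timestamp),
            replacement: "$1.$2"
        );
        // step 2: parse the now period-separated value and store it in timestamp
        set_field("timestamp", parse_date(value: fixed, pattern: "yyyy-MM-dd HH:mm:ss.SSS"));
    end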

kerem-ak1 commented 3 years ago

Is there any update on this?

jenlehma commented 2 years ago

Hello all,

I stumbled across this issue after performing an upgrade to version 4.2.1, when my extractor with a custom date_format suddenly stopped working. I took the time to build a small example to reproduce the issue I was facing, so I can easily test it on different Graylog versions. I tested this on version 3.1.2 (works correctly), 4.1.7 (works correctly) and 4.2.1 (which fails).

Basic details: in our extractor we have to convert a custom date format into an internal representation to get the correct value into the timestamp field. The timestamp part looks like this:

    2021-11-15T10:00:21.234 +0100

As you can see, there is a "T" between the date and time parts, and the timezone has to be taken into account as well. In our case the milliseconds are separated by a "." (which is different from the OP's example, but we still use a custom format to parse the value).

After the update to version 4.2.1, the extractor writes the following into the gl2_processing_error field:

    Value <2021-11-15T10:00:21.234 +0100> caused exception: Invalid format: "2021-11-15T10:00:21.234 +0100" is malformed at "T10:00:21.234 +0100".

How to reproduce the issue:

Extractor in the UI: (screenshot: graylog_4.2_date_extractor_custom_format)

Message after sending it in via netcat: (screenshot: graylog_4.2_message_parsing_custom_format_failed)

Further tests to isolate the issue: if I change the message that I send into Graylog slightly (just replacing the "T" with a space) to:

    echo "2021-11-15 10:00:21.234 +0100" | nc -v localhost 5555

... the value in the gl2_processing_error field changes to:

    Value <2021-11-15 10:00:21.234 +0100> caused exception: Invalid format: "2021-11-15 10:00:21.234 +0100" is malformed at " +0100".

This tells me that the extractor is not using my custom date_format; it looks like it's using the default format instead.

Now, my last test: I change just the message I send in so that it matches the default format (by removing the timezone).

echo "2021-11-15 10:00:21.234"|nc -v localhost 5555

That way, the message is parsed correctly by the extractor. So I come to the conclusion that the custom date_format of the extractor is ignored.
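
For reference, a custom Joda-Time pattern matching such a value, wrapped in a pipeline-rule sketch (the rule itself and the field name temp_ts are illustrative; an extractor's date converter takes just the pattern string):

    rule "parse ISO-like timestamp with zone (sketch)"
    when
        has_field("temp_ts")
    then
        // "2021-11-15T10:00:21.234 +0100": literal 'T' between date and time,
        // then a space and an RFC-822 style zone offset ("Z" in Joda-Time)
        let ts = parse_date(
            value:   to_string($message.temp_ts),
            pattern: "yyyy-MM-dd'T'HH:mm:ss.SSS Z"
        );
        set_field("timestamp", ts);
    end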

I'm sorry if my issue is a little different from the initial post here, but we both use a custom date_format, and I suspect both issues are related, so I decided to append my details here. Feel free to move my post to a new issue (or just let me know, and I'll do so on my own).

Please let me know if further details are needed.

Thanks in advance and kind regards, Jens

mpfz0r commented 2 years ago

Thanks for the report. This is a regression in 4.2 and will be fixed with the next bugfix release.

In 4.2 we made some changes to ensure that a message has a valid timestamp throughout the processing chain. While this helps to expose processing errors that were hidden before that release, it also broke most date converters used in extractors.

ngvirus commented 2 years ago

> Thanks for the report. This is a regression in 4.2 and will be fixed with the next bugfix release.
>
> In 4.2 we made some changes to ensure that a message has a valid timestamp throughout the processing chain. While this helps to expose processing errors that were hidden before that release, it also broke most date converters used in extractors.

Thanks for the update!

merttokgozoglu commented 2 years ago

4.2.4 was released today and this issue still exists.

mpfz0r commented 2 years ago

@merttokgozoglu I'm gonna need more details to look into this. We definitely fixed the case for one extractor + date converter.

What does your extractor configuration look like? Which errors do you see? (Maybe also attach the server.log if it contains exceptions.)

jenlehma commented 2 years ago

> We definitely fixed the case for one extractor + date converter.

I just want to let you know that the fix for the custom date converter is working on my end (tested with 4.2.3 and 4.2.4), so thanks for that.

merttokgozoglu commented 2 years ago

Here is an example log line:

    machine01 apache: { "timestamp":"2021-12-17 11:25:15.001 +00:00", "remote_addr":"aaa.bbb.ccc.ddd", "body_bytes_sent":"1229", "request_time":"0", "response_status":"403", "request":"/wp-content/uploads/2020/02/xyz.pdf", "query":"", "request_method":"GET", "host":"example.com", "dst_port":"443", "src_port":"32192", "http_x_forwarded_for":"-", "http_referrer":"-", "http_user_agent":"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" },

My extractors run in this order:

1. Get JSON from syslog message: regex_value: apache:\s+(.*)
2. Extract JSON fields: flatten: (unset), list_separator: ",", kv_separator: "=", key_prefix: (unset), key_separator: (unset), replace_key_whitespace: (unset), key_whitespace_replacement: (unset)
3. Empty JSON field: replacement: "-", regex: .*
4. Reduced message to path: replacement: $1, regex: .*request": "(.*?)".*

With these extractors, Graylog 4.2.4+b643d2b behaves like this: (screenshot)

With Graylog 4.0.2+1987d10, it works just fine with no errors: (screenshot)

I also attached my altered content pack: altered.json

mpfz0r commented 2 years ago

@merttokgozoglu Thanks. This is actually not a bug. What people seem to have missed is that with 4.2 we introduced a feature that makes these kinds of processing errors more visible (see InvalidTimestampException at https://docs.graylog.org/docs/indexer-and-processing-failures).

Non-parseable dates on a message were always silently replaced with the current date. Since 4.2 we record these errors.

What needs to be done is to convert the timestamp field before assigning it to a message. Unfortunately JSON extractors don't support converters.

So in your case, I suggest setting up three extractors:

1. A replace-with-regex extractor that renames the timestamp field in the raw JSON to something temporary, e.g. temp_timestamp.
2. Your JSON extractor.
3. A copy-input extractor that transfers temp_timestamp over to timestamp, applying a date converter afterwards (a pipeline-rule sketch of the same conversion follows below).
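
A pipeline-rule sketch of the conversion in step 3, applied to the log line above (the field name temp_timestamp follows the suggestion above; ZZ parses a zone offset written with a colon, like +00:00):

    rule "convert apache json timestamp (sketch)"
    when
        has_field("temp_timestamp")
    then
        // "2021-12-17 11:25:15.001 +00:00" -> proper DateTime on the message
        let ts = parse_date(
            value:   to_string($message.temp_timestamp),
            pattern: "yyyy-MM-dd HH:mm:ss.SSS ZZ"
        );
        set_field("timestamp", ts);
        remove_field("temp_timestamp");
    end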

I will take a look at whether we can make this less complicated...

merttokgozoglu commented 2 years ago

Alright, I get it now. I did not add any extractors; instead I modified the log format at the source that sends the logs. I changed the "timestamp" field to "apache_timestamp", so it doesn't collide with the actual field that Graylog knows.

It logs smoothly now. Thank you.


flightlesstux commented 2 years ago

Hi,

I found a workaround for this timestamp mismatch problem. I set a temp_timestamp field, which is filled from the log by a Grok pattern.

Here is how my Grok pattern starts:

    \[%{TIMESTAMP_ISO8601:temp_timestamp}\]

And I added a pipeline rule for my logs with these lines:

  set_field("timestamp", to_date(concat(to_string(newFields.temp_timestamp), ".000")));
  remove_field("temp_timestamp");