Closed. ngvirus closed this issue 2 years ago.
Any difference if you change (?<message>(.|\r|\n)*)$
to (?s)(?<message>.*)$
(Pattern.DOTALL)? I'm guessing the stack overflow is happening due to recursion associated w/ that inner capturing group.
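The suggested rewrite can be sanity-checked outside Graylog. A minimal Java sketch (class name and sample input are illustrative): `(?s)` enables DOTALL so `.` also matches line terminators, avoiding the `(.|\r|\n)*` alternation, which Java's engine matches with one recursive call per repetition and which therefore overflows the stack on long messages.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // (?s) = Pattern.DOTALL: '.' matches \r and \n too, so the
    // recursive per-character alternation group is not needed.
    static final Pattern SAFE = Pattern.compile("(?s)(?<message>.*)$");

    public static String extract(String input) {
        Matcher m = SAFE.matcher(input);
        return m.matches() ? m.group("message") : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("line one\r\nline two"));
    }
}
```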
Hello, this helped with the exception, but now a new error persists:
gl2_processing_error: Value <2021-10-20 09:05:39,892> caused exception: Invalid format: "2021-10-20 09:05:39,892" is malformed at ",892".
Good to hear. The invalid format error is due to you trying to set the timestamp
field to something not matching yyyy-MM-dd HH:mm:ss.SSS
(i.e. comma instead of period before milliseconds; also assumed to be UTC, just in case yours isn't).
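To illustrate the format mismatch, a sketch using java.time (Graylog's own parsing is Joda-Time based, but the pattern letters behave the same for this format): the comma variant fails to parse, the period variant succeeds.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class TimestampFormatDemo {
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Returns true when the value parses with the expected pattern.
    public static boolean parses(String value) {
        try {
            LocalDateTime.parse(value, FMT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2021-10-20 09:05:39,892")); // comma does not match SSS separator
        System.out.println(parses("2021-10-20 09:05:39.892")); // period matches
    }
}
```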
Okay, but before 4.2 this was working fine. Is this a change in 4.2?
I confirm this behaviour in the Regular Expression extractor after upgrade from 4.1 to 4.2: Prior to 4.2, it worked fine. And, yes, we need to have a comma after seconds, not a period.
I approached this with 2 extractors:
1) The 1st one (of type Replace with Regular Expression) replaces the comma in the timestamp with a period.
2) The 2nd one (of type Regular Expression) performs the date conversion and stores the result in the timestamp field.
The 1st one:
The 2nd one:
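The comma-to-period replacement of the first extractor can be sketched in plain Java (the regex and class name are illustrative, not the exact extractor configuration):

```java
public class CommaFix {
    // Swap the comma before the milliseconds for a period,
    // leaving the rest of the timestamp intact.
    public static String normalize(String ts) {
        return ts.replaceAll("(\\d{2}:\\d{2}:\\d{2}),(\\d{3})", "$1.$2");
    }

    public static void main(String[] args) {
        System.out.println(normalize("2021-10-20 09:05:39,892"));
    }
}
```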
is there any update for this ?
Hello all,
I stumbled across this issue after performing an upgrade to version 4.2.1, when my extractor with a custom date_format suddenly stopped working. I took the time to build a small example to reproduce the issue I was facing, so I can easily test it on different Graylog versions. I tested this on version 3.1.2 (works correctly), 4.1.7 (works correctly) and 4.2.1 (which fails).
Basic details: in our extractor we had to convert a custom date format into an internal representation to get the correct value into the timestamp field. The timestamp part looks like this:
2021-11-15T10:00:21.234 +0100
As you can see, there is a "T" between the date and time parts, and additionally the timezone should be taken into account as well. In our case the milliseconds are separated by a "." (which is different from the OP's example, but we still use a custom format to parse the value).
After an update to version 4.2.1 the extractor writes the following into gl2_processing_error field:
Value <2021-11-15T10:00:21.234 +0100> caused exception: Invalid format: "2021-11-15T10:00:21.234 +0100" is malformed at "T10:00:21.234 +0100".
How to reproduce the issue: configure an extractor whose date converter uses the custom format
yyyy-MM-dd'T'HH:mm:ss.SSS Z
echo "2021-11-15T10:00:21.234 +0100"|nc -v localhost 5555
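For reference, the custom format does parse this sample outside Graylog. A sketch with java.time (Graylog's converters use Joda-Time, whose pattern letters are compatible for this format):

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class CustomFormatDemo {
    // Same pattern as the extractor's date converter: literal 'T',
    // millisecond fraction, and a +0100-style offset after a space.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS Z");

    public static OffsetDateTime parse(String value) {
        return OffsetDateTime.parse(value, FMT);
    }

    public static void main(String[] args) {
        System.out.println(parse("2021-11-15T10:00:21.234 +0100"));
    }
}
```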
Extractor in the UI:
Message after sending it in via netcat:
Further tests to isolate the issue: Now if I change the message that I send into Graylog slightly (just replacing the "T" with a space) to:
echo "2021-11-15 10:00:21.234 +0100"|nc -v localhost 5555
.. the value in gl2_processing_error field changes to:
Value <2021-11-15 10:00:21.234 +0100> caused exception: Invalid format: "2021-11-15 10:00:21.234 +0100" is malformed at " +0100".
This tells me that the extractor is not using my custom date_format; it looks like it's using the default format instead.
Now, my last test: I change just the sent-in message to match the default format (by removing the timezone).
echo "2021-11-15 10:00:21.234"|nc -v localhost 5555
That way, the message is parsed correctly by the extractor. So I come to the conclusion that the custom date_format of the extractor is ignored.
I'm sorry if my issue is a little different from the initial post here. But we both use a custom date_format and I suspect both issues are related, so I decided to append my details here. But feel free to move my post to a new issue (or just let me know, and I'll do so on my own).
Please let me know if further details are needed.
Thanks in advance and kind regards, Jens
Thanks for the report. This is a regression in 4.2 and will be fixed with the next bugfix release.
In 4.2 we made some changes to ensure that a message has a valid timestamp throughout the processing chain. While this helps to expose processing errors that were hidden before that release, it also broke most date converters used in extractors.
Thanks for the update!
4.2.4+ released today and this issue still exists.
@merttokgozoglu I'm gonna need more details to look into this. We definitely fixed the case for one extractor + date converter.
What is your extractor configuration? Which errors do you see? (Maybe also attach the server.log if it has exceptions.)
I just want to let you know the fix for custom-date-converter is working on my end (tested with 4.2.3 and 4.2.4) .. so thanks for that.
here is an example log line: machine01 apache: { "timestamp":"2021-12-17 11:25:15.001 +00:00", "remote_addr":"aaa.bbb.ccc.ddd", "body_bytes_sent":"1229", "request_time":"0", "response_status":"403", "request":"/wp-content/uploads/2020/02/xyz.pdf", "query":"", "request_method":"GET", "host":"example.com", "dst_port":"443", "src_port":"32192", "http_x_forwarded_for":"-", "http_referrer":"-", "http_user_agent":"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" },
my extractors work in this order:
Get JSON from syslog message: regex_value: apache:\s+(.*)
Extract JSON fields: flatten; list_separator: ","; kv_separator: "="; key_prefix, key_separator, replace_key_whitespace and key_whitespace_replacement left empty
Empty JSON field: replacement: - regex: .*
Reduced message to path: replacement: $1 regex: .*request": "(.*?)".*
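The first extractor's regex can be exercised in isolation (a Java sketch; the sample line is shortened and the class name is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonExtract {
    // First extractor from the list above: grab everything after "apache:".
    static final Pattern P = Pattern.compile("apache:\\s+(.*)");

    public static String jsonPart(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "machine01 apache: { \"timestamp\":\"2021-12-17 11:25:15.001 +00:00\" },";
        System.out.println(jsonPart(line));
    }
}
```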
With these extractors, Graylog 4.2.4+b643d2b behaves like this:
With Graylog 4.0.2+1987d10, it works just fine with no errors:
i also attached my altered content pack: altered.json
@merttokgozoglu Thanks. This is actually not a bug.
What people seem to have missed is that with 4.2 we introduced a feature that makes these kinds of processing errors more visible (see InvalidTimestampException at https://docs.graylog.org/docs/indexer-and-processing-failures ).
Non-parseable dates on a message were always silently replaced with the current date. Since 4.2 we record these errors.
What needs to be done is to convert the timestamp
field before assigning it to a message.
Unfortunately JSON extractors don't support converters.
So in your case, I suggest something like this: set up 3 extractors:
1) A "replace with regex" extractor to rename the timestamp field in the raw JSON to something temporary, e.g. temp_timestamp.
2) The existing JSON extractor.
3) A "copy input" extractor copying temp_timestamp over to timestamp, applying a date converter afterwards.
I will take a look if we can make this less complicated....
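The rename step of this workaround could be approximated like this (illustrative only; in Graylog this would be a "replace with regex" extractor configuration, not Java code):

```java
public class RenameTimestamp {
    // Rename the raw JSON's "timestamp" key to "temp_timestamp" before
    // JSON extraction, so it no longer collides with Graylog's own
    // timestamp field.
    public static String rename(String rawJson) {
        return rawJson.replaceFirst("\"timestamp\"\\s*:", "\"temp_timestamp\":");
    }

    public static void main(String[] args) {
        System.out.println(rename("{ \"timestamp\":\"2021-12-17 11:25:15.001 +00:00\" }"));
    }
}
```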
Alright, I get it now. I did not add any extractor; instead I modified the log format at the source that sends the log. I changed the "timestamp" field to "apache_timestamp", so it doesn't collide with the actual one that Graylog knows.
It logs smoothly now. Thank you.
Hi,
I found a workaround for this timestamp mismatch problem. I've set a temp_timestamp
field, which comes from the log via a Grok pattern.
My Grok pattern starts like this: \[%{TIMESTAMP_ISO8601:temp_timestamp}\]
And I put a pipeline rule for my logs. I've added these lines:
set_field("timestamp", to_date(concat(to_string(newFields.temp_timestamp), ".000")));
remove_field("temp_timestamp");
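The rule above can be mirrored in plain Java to see why the concat is needed (a sketch; the pattern is the default timestamp format mentioned earlier, and the value captured by the Grok pattern is assumed to lack milliseconds):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PipelinePad {
    // The captured timestamp has no millisecond part, so ".000" is
    // appended before parsing with the default yyyy-MM-dd HH:mm:ss.SSS
    // pattern, mirroring the concat(...) in the pipeline rule.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    public static LocalDateTime pad(String ts) {
        return LocalDateTime.parse(ts + ".000", FMT);
    }

    public static void main(String[] args) {
        System.out.println(pad("2021-10-20 08:56:06"));
    }
}
```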
Graylog-server version: 4.2.0-3
OS: Oracle Linux 8
Java: 1.8.0-openjdk-headless-1.8.0.302.b08-0.el8_4.x86_64
I just upgraded to this version of Graylog and in my processed events I see this error:
gl2_processing_error: Value <2021-10-20 09:05:39,892> caused exception: Invalid format: "2021-10-20 09:05:39,892" is malformed at ",892".
I have a very basic GROK extractor that works fine:
\[%{DATA:timestamp}\]\s\[%{DATA:loglevel}\]\s\[%{DATA:fieldname2}\]\s\[%{DATA:fieldname3}\]\s\[%{DATA:fieldname4}\]\s-\s(?<message>(.|\r|\n)*)$
In the server log I see this warning, which is about 850 lines long, repeating the same 6 lines over and over:
2021-10-20T13:13:28.286Z WARN [ProcessBuffer] Unable to process event MessageEvent{raw=null, message=null, messages=null}, sequence 896092
java.lang.StackOverflowError: null
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4731) ~[?:1.8.0_302]
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4582) ~[?:1.8.0_302]
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3791) ~[?:1.8.0_302]
at java.util.regex.Pattern$Branch.match(Pattern.java:4618) ~[?:1.8.0_302]
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4672) ~[?:1.8.0_302]
at java.util.regex.Pattern$Loop.match(Pattern.java:4799) ~[?:1.8.0_302]
This is what a message looks like:
[2021-10-20 08:56:06,420] [INFO] [field1] [field2] [field3] - rest of the message
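For completeness, a plain-Java approximation of this extractor with the `(?s)` fix from the first reply applied (the %{DATA} Grok fields are approximated with `[^\]]*` for illustration; names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokLikeDemo {
    // Five bracketed fields, then " - ", then the free-form message.
    // (?s)(?<message>.*) replaces the recursive (.|\r|\n)* group.
    static final Pattern P = Pattern.compile(
        "\\[(?<timestamp>[^\\]]*)\\]\\s\\[(?<loglevel>[^\\]]*)\\]"
        + "\\s\\[[^\\]]*\\]\\s\\[[^\\]]*\\]\\s\\[[^\\]]*\\]"
        + "\\s-\\s(?s)(?<message>.*)$");

    public static String field(String line, String name) {
        Matcher m = P.matcher(line);
        return m.matches() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String line = "[2021-10-20 08:56:06,420] [INFO] [field1] [field2] [field3] - rest of the message";
        System.out.println(field(line, "timestamp"));
        System.out.println(field(line, "message"));
    }
}
```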