pbrzica opened this issue 3 months ago
Hello, thanks for raising this; it looks like a memory leak.
re: Error matching stream rule <646b7aa40063d45f4807fe8a> <REGEX/^prod-logging> for stream Random Stream Name
Is the associated stream receiving messages from the RabbitMQ input?
Hi, just checked the logs.
The associated stream is receiving logs from RabbitMQ, but we start getting these errors on all of our streams, including ones using GELF TCP inputs (the above was just an example). I am guessing that as free memory shrinks, the errors appear more and more often until the heap finally runs out. If it helps, almost all of our streams (76 of them, excluding the system ones) use regex stream rules, mainly two or three rules in the style of:
source: ^prod-
or
channel: ^service$
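For reference, here is a minimal sketch of how anchored rules like the two above behave under java.util.regex (Graylog is a Java application); this assumes find()-style partial matching and is an illustration, not Graylog's actual stream-matcher code:

```java
import java.util.regex.Pattern;

public class StreamRuleRegexDemo {
    public static void main(String[] args) {
        // Hypothetical rules mirroring the ones above.
        // Precompiled once, as a long-running matcher would do.
        Pattern sourceRule  = Pattern.compile("^prod-");
        Pattern channelRule = Pattern.compile("^service$");

        // With find() semantics, "^prod-" matches any value that
        // starts with "prod-", while "^service$" requires an exact match.
        System.out.println(sourceRule.matcher("prod-logging-01").find());  // true
        System.out.println(channelRule.matcher("service").find());         // true
        System.out.println(channelRule.matcher("service-api").find());     // false
    }
}
```

Matching itself is cheap for anchored patterns like these, which is one reason a slowdown that worsens over time points at memory pressure rather than the regexes themselves.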
I can try setting up some more Graylog metrics if you think they'd be helpful (just let me know if you have any specific ones in mind).
I'll also try stopping/starting the inputs after some time to see whether that helps.
Just noticed #19629. I don't know whether anything we run is specifically affected, but we will update to 6.0.4 today and report back.
Reporting back: Graylog has been up for 11 days without any issues on version 6.0.4.
Thanks for the feedback, @pbrzica. Very much appreciated!
Since upgrading from major version 5 to 6, we have noticed a new issue. Every couple of days or so, Graylog first starts logging the following:
As time goes by, more and more of these log entries appear, and then everything starts crashing with Java heap space errors. Example:
We receive most of our logs from our RabbitMQ input, averaging around 3-4k messages per second per node. I've tried increasing the heap and decreasing the number of processors, but nothing seems to help.
I've also attached load and memory graphs (Graylog completely stops working at around 5:00; prior to that, load and memory usage are completely normal).
Expected Behavior
Graylog doesn't run out of heap space
Current Behavior
Graylog works fine for some time and then, at seemingly random intervals, starts crashing due to heap errors
Possible Solution
I've noticed that increasing the heap increases how long Graylog stays healthy, so is it possible there is a memory leak somewhere?
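To confirm a leak, it may help to capture a heap dump for analysis in Eclipse MAT or VisualVM. A sketch of the relevant JVM options, assuming the official Docker image's GRAYLOG_SERVER_JAVA_OPTS environment variable (paths and sizes are placeholders for illustration):

```shell
# Dump the heap automatically on the next OutOfMemoryError
# so the dominant retained objects can be inspected offline.
GRAYLOG_SERVER_JAVA_OPTS="-Xms4g -Xmx4g \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/lib/graylog/heap.hprof"

# Alternatively, take an on-demand dump of live objects from the
# running JVM (replace <pid> with the Graylog server process id):
# jmap -dump:live,format=b,file=/tmp/graylog-heap.hprof <pid>
```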
Steps to Reproduce (for bugs)
Context
Self-explanatory
Your Environment
Using the official Graylog docker image