kvborodin closed this issue 5 years ago
At the same time, Logstash has no issues consuming my 2 log files with the same regex. This makes me think the current v1.x of Fluentd is not production-ready at all. I'm going to switch to Logstash for all my projects, guys. You should remove "production-ready" from your official site. Thanks
Hmm... to debug this, we need your actual log for the investigation. Logstash uses a Java-based regexp engine and Ruby (CRuby) uses a different one, so that seems to cause the different behaviour.
Cool! I'll send it to you privately (repeatedly@gmail.com) once I get approval from my PM/TL.
What about the SIGCONT signal during the hang? Any thoughts on why it always works, but not in my situation?
This is a log content / regexp combination issue, or your regexp hits a bug in Ruby's regexp engine. Each regexp engine has its own algorithm, and with specific patterns this sometimes causes issues like this one. For example, Java's regexp has a similar problem: https://stackoverflow.com/questions/37484078/regexp-causes-to-hang-infinitely
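To illustrate the class of problem (a minimal sketch, not the reporter's actual pattern, which was sent privately): nested quantifiers like `(a+)+` can make a backtracking engine such as CRuby's Onigmo try exponentially many ways to split the input before failing, while an atomic group `(?>...)` forbids re-splitting and fails fast.

```ruby
# Catastrophic backtracking demo (hypothetical pattern, for illustration only).
# /(a+)+/ anchored against "aaa...a!" almost matches, so the engine retries
# every possible split of the run of "a"s -- with only ~30 chars this can
# already hang for minutes, which is exactly what a stuck worker looks like.
evil = /\A(a+)+\z/    # nested quantifier: exponential backtracking on failure
safe = /\A(?>a+)\z/   # atomic group: consume the "a"s once, never re-split

input = "a" * 100_000 + "!"

# The rewritten pattern rejects the input immediately:
p input.match?(safe)  # => false, returns right away
# Do NOT try `input.match?(evil)` on a string this long -- it will not return.
```

The practical point for log parsing: patterns with nested or overlapping quantifiers should be rewritten (atomic groups, possessive-style constructs, anchors) before being trusted on untrusted multi-format input.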
@repeatedly look, Logstash has GROK. GROK has a timeout on each regex matching operation, which can be tuned. Logstash also logs the unmatched/timed-out lines to a separate log file for later investigation and for updating the regex that fails during processing. Without this ability you can't develop log-parsing patterns, right? I mean, how can I understand which string from my multi-format log made Fluentd stuck?
I'm just suggesting an improvement: add timeouts & logging for those users who use regex patterns!
Also, since the problem is Fluentd hanging in infinite regex loops, the improvement I'm suggesting can solve all these problems with hung/stuck processing of input records!
I was talking about this feature: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-timeout_millis
Also, please take a look at these features:
They are so helpful!
Thanks!
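The behaviour being requested above could be sketched in plain Ruby roughly as follows. This is not Fluentd's API; `PATTERNS`, `parse_line`, and the one-second budget are all made up for illustration. Each pattern gets a bounded match attempt; on timeout the full line is logged and the next, simpler pattern is tried. (Note: whether `Timeout` can actually interrupt a running match depends on the Ruby version; Ruby 3.2's `Regexp.timeout` is the robust mechanism today, but it did not exist in the Ruby 2.4 shipped with td-agent 3.)

```ruby
require 'timeout'
require 'logger'

LOGGER = Logger.new($stderr)

# Hypothetical ordered pattern list: most specific first, catch-all last.
PATTERNS = [
  /\A(?<time>\S+) (?<level>[A-Z]+) (?<message>.*)\z/,
  /\A(?<message>.*)\z/  # fallback: trivially linear, always matches
]

# Try each pattern with a per-match time budget. A line that makes one
# regexp backtrack "forever" is logged in full and handed to the next
# pattern instead of freezing the whole worker.
def parse_line(line, budget: 1.0)
  PATTERNS.each do |re|
    begin
      m = Timeout.timeout(budget) { re.match(line) }
      return m.named_captures if m
    rescue Timeout::Error
      LOGGER.warn("regexp timed out on line: #{line.inspect}")
    end
  end
  nil
end

p parse_line("2019-07-01T00:00:00Z ERROR disk full")
# => {"time"=>"2019-07-01T00:00:00Z", "level"=>"ERROR", "message"=>"disk full"}
```

This mirrors grok's `timeout_millis` plus its failure logging: the slow pattern is skipped, the offending line is preserved verbatim for later regex debugging, and processing continues.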
grok is a collection of regexps, so I assume Logstash has similar regexp problems. Ruby's thread model is different from Java's, but I will check whether we can use similar handling for regexps.
Wrote a patch for this issue: https://github.com/fluent/fluentd/pull/2513
First of all, I'd like to thank all the developers who made this piece of software work for at least 23 hours in production without any issues. My issue happened after 24 hours.
Describe the bug
I'm using the 'multi_format' plugin inside a 'tail' source, which is made by Mr. @repeatedly
My "parse".."/parse" section inside the 'tail' plugin has the following structure; I have 10-15 different expressions to catch my multi-format logs:
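The reporter's actual patterns were only shared privately; purely as an illustration, a 'tail' source using the multi-format parser generally looks like the following (paths, tag, and the regexps here are hypothetical, not the reporter's configuration):

```
<source>
  @type tail
  path /var/log/app/app.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.log
  <parse>
    @type multi_format
    <pattern>
      format regexp
      expression /^(?<time>[^ ]+) (?<level>[A-Z]+) (?<message>.*)$/
    </pattern>
    <pattern>
      format json
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</source>
```

The patterns are tried in order per line, so one pathological `expression` can stall the whole pipeline before any fallback pattern is reached.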
Every day in production the fluentd process gets stuck/hangs and consumes 100% CPU. But it's not just about high CPU usage; fluentd also stops consuming/tailing my logs at the same time, which is horrible for my 200 production bare-metal servers, you know. I have only 2 log files at a time, just 2 files which I want this software to consume:
When Fluentd hangs, I am not able to get a report by sending the SIGCONT signal to the PID. strace shows me the same data whether I trace a well-working fluentd or a stuck one:
But pstack shows me interesting information every time, even when I stop/start fluentd:
Look at lines 0 and 1.
Which makes me think Fluentd has no ability to detect infinite regexp loops, something that is handled by https://rubular.com/, for example.
To Reproduce
I can privately send you my config and a log with the unknown log string that makes fluentd hang.
Expected behavior
Warn to the log file and continue processing logs. The warning must include the entire log string, like you already do with unmatched strings. There should also be a way to match these lines and apply another regexp (i.e. retry with a simpler one).
Your Environment
# rpm -qa | grep td-agent
td-agent-3.4.1-0.el7.x86_64
I also tested with Fluentd 1.4 and 1.5 (and other versions back to 1.0): same issue. I don't know why the old Ruby 2.4.6 is used in production when it's EOL:
# /opt/td-agent/embedded/bin/ruby -v
ruby 2.4.6p354 (2019-04-01 revision 67394) [x86_64-linux]
Operating system:
# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
Kernel version:
# uname -r
4.17.8-1.el7.elrepo.x86_64
Your Configuration
Your Error Log