elastic / logstash-forwarder

An experiment to cut logs in preparation for processing elsewhere. Replaced by Filebeat: https://github.com/elastic/beats/tree/master/filebeat

logstash-forwarder randomly stops sending logs #162

Closed: teu closed this issue 9 years ago

teu commented 10 years ago

Hi,

I have a problem where the forwarder randomly stops sending logs. The file is still being populated with logs, but the forwarder decides it's an old file and doesn't check it any more.

2014/02/08 16:49:46.750622 Loading client ssl certificate: /opt/lumberjack/lumberjack.crt and /opt/lumberjack/lumberjack.key
2014/02/08 16:49:46.750979 Starting harvester: -
2014/02/08 16:49:47.035138 Current file offset: 0
2014/02/08 16:49:46.871471 Skipping old file: /var/log/apache2/access.json
2014/02/08 16:49:47.990506 Setting trusted CA from file: /opt/lumberjack/lumberjack.crt
2014/02/08 16:49:48.012460 Connecting to 10.62.197.248:6781 (logstash.sherstondevelopments.co.uk)
2014/02/08 16:49:48.415488 Connected to 10.62.197.248
2014/02/09 16:49:53.700060 Stopping harvest of -; last change was %!d(float64=86406.632560624) seconds ago

Restarting it works, but that's a bit of a pain as these are AWS instances in a cluster. All the instances in that cluster stopped, though I'm not sure whether it was at the same time.

T0mK0 commented 10 years ago

It is not a bug. If the file is not changed for 24 hours, the system will stop harvesting it. Unfortunately, you need to change the code and recompile if you want to increase this 24-hour period (IMHO).

teu commented 10 years ago

The problem is that there were constant log writes throughout that period, without any such 24-hour pause.

T0mK0 commented 10 years ago

Hmm. I see the file name is '-', which means it is standard input? If that is the case, I would try using regular file(s) and see the results.

teu commented 10 years ago

OK, I removed the stdin entry from the config; it's not used anyway, so now it's just the regular file. Perhaps that was the problem.
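For anyone following along, the config with the stdin entry removed looks roughly like this (paths and SSL file locations are taken from the log output above; the hostname, the "type" field value and the timeout are placeholders, so treat this as a sketch rather than the exact file):

{
  "network": {
    "servers": [ "logstash.example.co.uk:6781" ],
    "ssl ca": "/opt/lumberjack/lumberjack.crt",
    "ssl certificate": "/opt/lumberjack/lumberjack.crt",
    "ssl key": "/opt/lumberjack/lumberjack.key",
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/var/log/apache2/access.json" ],
      "fields": { "type": "apache-access" }
    }
  ]
}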

teu commented 10 years ago

Hi again, I can now confirm that wasn't the problem: lumberjack stopped working again, even though there is only one file and it receives constant log writes (I know that because it's an AWS instance behind a load balancer that checks the instance every few seconds, which produces a log entry). So there is definitely a problem with lumberjack either just stopping, or with something around log rotation.

T0mK0 commented 10 years ago

Do you sync time with ntp? Is it the same inode number for the file in question?

teu commented 10 years ago

Hi, no I don't sync with ntp currently.

Same inode as what? Can you explain a bit?

T0mK0 commented 10 years ago
  1. About ntp: I think it is worth running it (maybe as a periodic ntpdate). Virtual servers are known to have time problems.
  2. Start the forwarder and ensure it is working. Note the inode of the file you harvest. When it stops harvesting that file, note the inode again and ensure it is the same inode number (see the sketch below for the commands).
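For example (assuming a Debian/Ubuntu-style box; the cron schedule and the pool.ntp.org server are illustrative, not from this thread):

# 1. periodic time sync, e.g. hourly via cron
0 * * * * /usr/sbin/ntpdate -u pool.ntp.org

# 2. note the inode and mtime of the harvested file now, and again once harvesting stops
ls -i /var/log/apache2/access.json
stat -c 'inode=%i mtime=%y' /var/log/apache2/access.json
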
teu commented 10 years ago

OK, thanks.

I'll sort out ntp once I get Puppet going on the instances (changing stuff by hand is a pain). I have taken note of the inodes; now I need to wait. It may be that the inodes change when log rotation takes place.

teu commented 10 years ago

OK, I restarted the other day. On one of the two instances I run (an exact copy of the other), the file stopped streaming after a couple of hours.

2014/02/18 17:07:14.479850 Loading client ssl certificate: /opt/lumberjack/lumberjack.crt and /opt/lumberjack/lumberjack.key
2014/02/18 17:07:14.626918 Skipping old file: /var/log/apache2/access.json
2014/02/18 17:07:15.857158 Setting trusted CA from file: /opt/lumberjack/lumberjack.crt
2014/02/18 17:07:15.887220 Connecting to 10.62..xxx.xxx:6781 (logstash.xxx.co.uk) 
2014/02/18 17:07:17.761750 Connected to 10.62.xxx.xxx

The log entries before that are from 2014/02/12, so the instance was restarted and, immediately after the restart, lumberjack basically didn't start up properly.

Maybe my upstart job is the problem (though I don't see why it would only fail sometimes)?:

#
# lumberjack upstart script
#
description "Lumberjack"

start on (virtual-filesystems and net-device-up)
stop on runlevel [!2345]

respawn
respawn limit 5 30

script
  exec /opt/logstash-forwarder/bin/logstash-forwarder -config /etc/lumberjack/lumberjack-apache-access.conf 2>> /var/log/lumberjack-access.log
end script

The inode is exactly the same as it was when it was working.

Earlier in the log file I was also getting some errors, but they didn't seem to stop the logging this time:

runtime: panic before malloc heap initialized
fatal error: runtime: cannot allocate heap metadata
billzhuang commented 10 years ago

I hit the same problem a lot of the time, so I had to write a cron job to restart the service every day.

scakkia commented 10 years ago

Same here with logstash-forwarder 0.3.1 on Debian amd64. I use it with a log file of about 5M log lines/day (2GB/day). logstash-forwarder silently stops sending logs, so the only solution is to restart it with a cron job.
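For what it's worth, the restart workaround is just a crontab entry along these lines (the upstart job name 'lumberjack' and the log path are assumptions; adjust them to your own init setup):

# restart the forwarder every night at midnight (upstart job name assumed to be 'lumberjack')
0 0 * * * /sbin/initctl restart lumberjack >> /var/log/lumberjack-restart.log 2>&1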

tpiha commented 9 years ago

I can confirm this problem. I'm monitoring files, and logstash-forwarder just stops sending events after a random amount of time. It works normally on one server that sends only a small amount of data, but on servers with larger amounts of data it stops sending after some time, without any message on either the logstash-forwarder or the logstash side.

This is a really big problem for us.

evalencia commented 9 years ago

I'm seeing the same issue: if the log isn't updated within 24 hours it stops sending events to logstash. This log is only written to at random times, but I still get this:

2015/02/06 03:13:20.740771 Stopping harvest of /var/log/nodejs/nodejs.log; last change was %!d(float64=86405.153875113) seconds ago

kikicarbonell commented 9 years ago

:+1:

phandinhloc commented 9 years ago

Well, we have the same problem. logstash-forwarder just stops sending logs until we restart it.

rakslice commented 9 years ago

"This is not a bug". Help me understand: What kind of log monitoring has a requirement for stopping monitoring the log file after a period of time?

jordansissel commented 9 years ago

Maybe there's been some misunderstanding. It should still be checking whether those files are modified later, even after it has closed them. The "close if file is idle" behaviour is an optimization to allow resource cleanup. If the file is modified later, it should be reopened. If not, this is a bug.

JDiPierro commented 9 years ago

I haven't encountered this bug in a while because I created a cron job to restart the forwarder every night at midnight, but I believe it wasn't re-opening the file when it was modified later. Someone currently encountering the bug should verify this.

driskell commented 9 years ago

Ideally, provide fresh logs if you have issues like this. I think in the OP's case there isn't enough information, and there's a mix-up between "Stopping harvest of..." and "Skipping old file...". The former happens 24h (configurable now, I think) after the last change to an open file. The latter happens for new files that haven't been touched for 24h, usually only at startup when there's a previously unknown old file; it can also appear if you move or copy in an old file. It just avoids unnecessary shipping, since shipping is expected to be in real time.
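If the dead time is indeed configurable in your version, it appears to be a per-entry setting in the files section of the config, along these lines (a hedged sketch only; the "72h" value and the "type" field are made up, and you should check your version's README for whether the "dead time" key is supported before relying on it):

{
  "files": [
    {
      "paths": [ "/var/log/apache2/access.json" ],
      "fields": { "type": "apache-access" },
      "dead time": "72h"
    }
  ]
}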

jordansissel commented 9 years ago

I think we need more information. Upon restart, what files are not reopened? Can you attach your logstash-forwarder config?

jordansissel commented 9 years ago

No updates since March. If this is still an issue, let's open a new ticket and we can work together on a fix :)