bgreenlee / pygtail

Pygtail reads log file lines that have not been read. It will even handle log files that have been rotated. Based on logcheck's logtail2 (http://logcheck.org)
GNU General Public License v2.0

Bug: `_update_offset_file` should use `filehandle()` and not `filename` #38

Closed bobtiernay-okta closed 2 years ago

bobtiernay-okta commented 7 years ago

_update_offset_file currently uses the following assignment for inode calculation:

inode = stat(self.filename).st_ino

However, this should be:

inode = fstat(self._filehandle().fileno()).st_ino

This covers the case in next where we update the offset file before reaching the end of a file:

        if self.paranoid:
            self._update_offset_file()
        elif self.every_n and self.every_n <= self._since_update:
            self._update_offset_file()

Without it, we may be reading from _rotated_logfile or a renamed file while incorrectly recording the inode of whatever file currently sits at self.filename.
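The difference between the two calls can be reproduced outside pygtail. The following is a minimal sketch, assuming a POSIX-like filesystem; the helper names and the `app.log` file name are hypothetical and not part of pygtail. It opens a file, rotates it by renaming, creates a new file at the same path, and then compares the inode seen through the path with the inode seen through the still-open handle.

```python
import os
import tempfile


def inode_by_path(path):
    # Inode of whatever file currently lives at this path.
    return os.stat(path).st_ino


def inode_by_handle(fh):
    # Inode of the file this handle is actually reading from.
    return os.fstat(fh.fileno()).st_ino


def demo_rotation():
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")  # hypothetical log file
        with open(log, "w") as f:
            f.write("old line\n")

        fh = open(log, "r")
        try:
            original = inode_by_handle(fh)
            # Simulate rotation: rename the old file, create a fresh one.
            os.rename(log, log + ".1")
            with open(log, "w") as f:
                f.write("new line\n")
            return original, inode_by_handle(fh), inode_by_path(log)
        finally:
            fh.close()


original, via_handle, via_path = demo_rotation()
print("handle still points at original file:", original == via_handle)
print("path now points at a different file:", via_path != via_handle)
```

After the rename, the open handle keeps pointing at the old file, while the path resolves to the new one. Recording the path's inode together with the handle's offset would therefore mix data from two different files.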

bobtiernay-okta commented 7 years ago

@bgreenlee Let me know if you agree with the change and I'll be happy to submit a PR. Cheers!

arekm commented 6 years ago

Hm, I just hit a problem where the *.offset file contains the new inode number but the old (large) offset. For example:

# ls -li /var/log/maillog*
83922970 -rw-r----- 1 root logs 6889343 04-06 07:20 /var/log/maillog
84159524 -rw-r--r-- 1 root root      19 04-06 07:20 /var/log/maillog.offset

# maillog: crtime is when file was created on xfs filesystem
# xfs_db -r -c "inode 83922970" -c "p v3.crtime" /dev/md1
v3.crtime.sec = Fri Apr  6 05:02:03 2018
v3.crtime.nsec = 150325550

# maillog.offset: crtime as above
# xfs_db -r -c "inode 84159524" -c "p v3.crtime" /dev/md1
v3.crtime.sec = Thu Apr  5 09:38:04 2018
v3.crtime.nsec = 281641177

# So maillog.offset was created on 5 Apr, and pygtail then processed maillog
# a few times (I'm running a script that uses pygtail from cron).

# At night logrotate rotated the maillog file (moving the old one into the archive/
# subdirectory, so pygtail cannot find and handle it). A new maillog file was created by syslog.

# The script that uses pygtail kept being started from cron (every 2 minutes).

# Yet the offset file contains the new inode number for maillog
# (the file created on 6 Apr), BUT the offset is from the
# old processing.

# cat /var/log/maillog.offset 
83922970
200558289

Now each new run of pygtail processes nothing, because pygtail cannot handle the case where the offset in *.offset is larger than the current file size (related to the new test case in https://github.com/bgreenlee/pygtail/pull/42).

The bad news is that I'm already running pygtail with your proposed change included, so either a wrong inode or a wrong offset is still being written to the offset file.
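A stale offset of the kind described above can be detected before seeking. The following is a minimal sketch, not pygtail's actual implementation. It assumes only the two-line offset file layout shown in the listing (inode, then offset); the function names are hypothetical. For simplicity it restarts from 0 on an inode mismatch, whereas pygtail would first look for a rotated file.

```python
import os
import tempfile


def read_offset_file(offset_path):
    # Offset file layout as shown above: first line inode, second line offset.
    with open(offset_path) as f:
        inode = int(f.readline().strip())
        offset = int(f.readline().strip())
    return inode, offset


def safe_start_offset(log_path, offset_path):
    # Hypothetical helper: choose a start position that is valid for log_path.
    if not os.path.exists(offset_path):
        return 0
    inode, offset = read_offset_file(offset_path)
    st = os.stat(log_path)
    if inode != st.st_ino:
        return 0  # offset belongs to a different file (simplified handling)
    if offset > st.st_size:
        return 0  # offset points past the end: stale or truncated
    return offset


# Small demonstration with temporary files.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "maillog")
    off = log + ".offset"
    with open(log, "w") as f:
        f.write("0123456789")  # 10-byte log file
    ino = os.stat(log).st_ino

    no_offset_file = safe_start_offset(log, off)

    with open(off, "w") as f:
        f.write("%d\n%d\n" % (ino, 200558289))  # matching inode, oversized offset
    stale = safe_start_offset(log, off)

    with open(off, "w") as f:
        f.write("%d\n%d\n" % (ino, 5))  # matching inode, valid offset
    valid = safe_start_offset(log, off)

    with open(off, "w") as f:
        f.write("%d\n%d\n" % (ino + 1, 5))  # inode of some other file
    other_file = safe_start_offset(log, off)

print(no_offset_file, stale, valid, other_file)
```

Restarting from 0 re-reads the whole new file, which may duplicate lines that were already processed. Whether that is preferable to silently reading nothing depends on the consumer.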

codergs commented 6 years ago

I am running a few Python scripts under supervisord that emit STDOUT/STDERR to separate log files with rotation (2 backups). I am using pygtail==0.7.0 to read newly written lines from these log files and send them to our internal metrics aggregator engine.

Recently, two of my deployments stopped seeing new lines. I checked the offset files and found the following:

1st deployment showed:

*-stdout.log.offset
2383
104857608

2nd deployment showed:

*-stdout.log.offset
2320
104857607

Any clue what might be wrong here? I restarted my instances, and things worked like before.
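One observation, offered as an assumption rather than a diagnosis: both stored offsets sit only a few bytes above 100 MiB. The thread does not state the configured rotation limit, so the 100 MiB figure below is a guess used purely for the arithmetic.

```python
MIB = 1024 * 1024
ASSUMED_LIMIT = 100 * MIB  # assumption: not stated anywhere in the thread

# Offsets copied from the two offset files shown above.
stored_offsets = {
    "deployment 1": 104857608,
    "deployment 2": 104857607,
}

# How far each stored offset lies beyond the assumed limit.
overshoot = {name: off - ASSUMED_LIMIT for name, off in stored_offsets.items()}
print(overshoot)
```

If the log files really are rotated at about that size, these offsets would correspond to the end of a full, pre-rotation file. That would match the stale-offset situation described earlier in the thread.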