driskell / log-courier

The Log Courier Suite is a set of lightweight tools created to ship and process log files speedily and securely, with low resource usage, to Elasticsearch or Logstash instances.
Other
419 stars 107 forks source link

log-courier does not release the rotated logs if they were rotated while logstash server was down/unreacheable #152

Closed gfa closed 9 years ago

gfa commented 9 years ago

ufff. what a title

here the problem i've found. sometimes my logstash-server gets unreachable, due network problems or logstash-server itself. if log files (on the client side) get rotated during that period. logcourier will not release the files when reconnect to logstash-server, making my /var/log full and ops people unhappy

it happened for some time now, i've upgraded to 1.6 and still happens (i have more logs, and info saved, just pasteing what i think is relevant)

log-couri 19522 root   24r   REG        8,1     121405 269140 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   25r   REG        8,1      77190 269183 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   26r   REG        8,1        145 269177 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   30r   REG        8,1     105268 269180 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   31r   REG        8,1      76667 269134 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   32r   REG        8,1        145 269129 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   34r   REG        8,1        145 269131 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   37r   REG        8,1      62714 269165 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   38r   REG        8,1        145 269132 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   40r   REG        8,1      70253 269168 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   41r   REG        8,1        145 269167 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   43r   REG        8,1      75414 269174 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   44r   REG        8,1        145 269138 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   46r   REG        8,1        145 269171 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   48r   REG        8,1     156689 269187 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   49r   REG        8,1        145 269186 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   51r   REG        8,1     153985 269199 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   52r   REG        8,1     163106 269189 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   53r   REG        8,1        145 269185 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   62r   REG        8,1     212873 269197 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   63r   REG        8,1     134918 269192 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   64r   REG        8,1        145 269188 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   67r   REG        8,1     185452 269195 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   68r   REG        8,1        145 269193 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   70r   REG        8,1        145 269163 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   72r   REG        8,1        145 269196 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   75r   REG        8,1     243385 269201 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   76r   REG        8,1        145 269200 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   79r   REG        8,1     260153 269203 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   80r   REG        8,1        145 269202 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   82r   REG        8,1     262409 269205 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   83r   REG        8,1        145 269204 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   91r   REG        8,1          0 268990 /var/log/openvswitch/ovs-ctl.log
log-couri 19522 root   92r   REG        8,1     247350 269207 /var/log/openvswitch/ovs-vswitchd.log.1 (deleted)
log-couri 19522 root   93r   REG        8,1        145 269178 /var/log/openvswitch/ovsdb-server.log.1 (deleted)
log-couri 19522 root   96r   REG        8,1      77809 269212 /var/log/openvswitch/ovs-vswitchd.log
log-couri 19522 root   97r   REG        8,1     247620 269210 /var/log/openvswitch/ovs-vswitchd.log.1
log-couri 19522 root   98r   REG        8,1        145 269198 /var/log/openvswitch/ovsdb-server.log.1
log-couri 19522 root  100r   REG        8,1         91 269208 /var/log/openvswitch/ovsdb-server.log

log-courier's log when the files were rotated

2015/04/14 00:29:48.781946 Launching harvester on rotated file: /var/log/openvswitch/ovs-ctl.log
2015/04/14 00:29:48.781998 Launching harvester on rotated file: /var/log/openvswitch/ovs-vswitchd.log
2015/04/14 00:29:48.782025 Launching harvester on rotated file: /var/log/openvswitch/ovsdb-server.log
2015/04/14 00:29:48.782310 Launching harvester on rotated file: /var/log/haproxy.log
2015/04/14 00:29:48.782579 Started harvester at position 0 (requested 0): /var/log/openvswitch/ovs-ctl.log
2015/04/14 00:29:48.782618 Started harvester at position 0 (requested 0): /var/log/openvswitch/ovs-vswitchd.log
2015/04/14 00:29:48.782701 Started harvester at position 0 (requested 0): /var/log/openvswitch/ovsdb-server.log
2015/04/14 00:29:48.782745 Started harvester at position 0 (requested 0): /var/log/haproxy.log
2015/04/14 00:29:55.095674 Transport init failed: TLS Handshake failure with logstash-server:60044: read tcp logstash-server:60044: connection reset by peer
2015/04/14 00:29:56.095854 Attempting to connect to logstash-server:60044

log-courier's log when it reconnected to logstash-server

2015/04/14 11:13:17.727349 Attempting to connect to logstash-server:60044
2015/04/14 11:13:18.823342 Connected to logstash-server:60044
2015/04/14 11:13:28.852332 Unexpected file truncation, seeking to beginning: /var/log/keystone/keystone.log

i have setup "dead time" and "persist directory" shouldn't deltas of what can't be send stored on "persist directory" ? that would be good

"general": {
        "admin listen address": "tcp:127.0.0.1:1234",
        "persist directory": "/data/logstash-client",
        "admin enabled": true },

"network": {
    "servers": [ "logstash-server:60044" ],
    "ssl certificate": "/etc/logstash/selfsigned.crt",
    "ssl ca": "/etc/logstash/selfsigned.crt",
    "ssl key": "/etc/logstash/selfsigned.key",
    "timeout": 30},
  "files": [
     { "dead time":"1h" },
      {"paths": ["/var/log/nova/nova-api.log",
         "/var/log/upstart/nova-api.log" ],
       "fields": {"type": "nova", "tags": [ "oslofmt", "nova-api" ]}},

[snip]

thanks, awesome software

driskell commented 9 years ago

Hello,

Your dead time configuration is in the wrong place I think.

[snip]
"files": [
     { "dead time":"1h" },
      {"paths": ["/var/log/nova/nova-api.log",
         "/var/log/upstart/nova-api.log" ],
       "fields": {"type": "nova", "tags": [ "oslofmt", "nova-api" ]}},
[snip]

The dead time should be within the configuration for a file. I will look into why this passed configuration validation as it should have failed. The block with a dead time contains no path entry (so it is invalid).

It should be:

[snip]
"files": [
      {"paths": ["/var/log/nova/nova-api.log",
         "/var/log/upstart/nova-api.log" ],
         "dead time":"1h",
       "fields": {"type": "nova", "tags": [ "oslofmt", "nova-api" ]}},
[snip]

The behaviour is as designed though. If we close the log on rotation, and haven't sent the logs contents to the Logstash server, the contents will be lost, and this would break the at-least-once delivery guarantee that Log Courier provides. Maybe there's a way to handle problematic situations like this where the existing behaviour can lead to disk filling up.

Suggestions are welcome - I'm thinking along the lines of a "delete hold max" or something that says, only keep a deleted file open for this long before giving up and reporting an ERROR to log file that logs were lost. For the majority of situations the transmission will have completed by the time rotation occurs so should not impact anyone else.

driskell commented 9 years ago

OK I confirm the bug where you can specify a files entry with no path! I'll get this fixed soon for 1.7 maintenance release.

driskell commented 9 years ago

I hate the invalid label, sorry :X gonna rename it to be less patronising :-1:

gfa commented 9 years ago

ouch! same issue again /me not parsing the docs appropriately :(

driskell commented 9 years ago

@gfa no worries! Doesn't help when the daft developer doesn't validate the config properly :) (that's me, doh!)