bpaquet / node-logstash

Simple logstash implementation in Node.js: file log collection, sent with ZeroMQ

File input plugin (without tail) seems to stop reading input files after some time #131

Closed jerome83136 closed 2 years ago

jerome83136 commented 8 years ago

Hello,

I am using the file input plugin without the "tail" mode, as you recommended.

After some time, my input files no longer seem to be read (I'm not sure, but it seems to happen after about one hour). I don't understand it, because these files are never stale: new lines are constantly written to them (Apache access_logs), yet node-logstash seems to "detach" from them after some time.

I have tried the tail mode, and with it my files are read fine even after they get truncated by the Apache log rotation. But I would prefer to use the input plugin without tail, because tail does not start reading my files from the beginning (even if I use start_index => 0).

I suspect node-logstash starts reading files from the beginning, but stops reading after the last line that existed at node-logstash startup.

Example: my log file is 10 lines long. I start node-logstash, which reads from line 1 to line 10 and then stops, even if a new line 11 gets created after node-logstash startup.

What do you think about this? Any idea how I can get it to work?

Here is a part of my config file:

input {
 file {
  #use_tail => false
  start_index => 0
  path => '/central_logs/input/prod/webservers/webserver?/apache/myapp1/access_domain?_log'
  add_field => { "application" => "myapp1" }
 }
 file {
  #use_tail => false
  start_index => 0
  path => '/central_logs/input/prod/webservers/webserver?/apache/myapp2/access_domain?_log'
  add_field => { "application" => "myapp2" }
 }
}

Please notice that I start node-logstash with this parameter: --db_file=/var/tmp/logs.node-logstash.myapps.dbfile

Before restarting node-logstash, I delete this file to ensure the input files are read from the beginning.

Thank you for your help

Jérôme

jerome83136 commented 8 years ago

Hi,

I suspect node-logstash starts reading files from the beginning, but stops reading after the last line that existed at node-logstash startup.

Example: my log file is 10 lines long. I start node-logstash, which reads from line 1 to line 10 and then stops, even if a new line 11 gets created after node-logstash startup.

--> verified

This is what happens: I started node-logstash (with the .dbfile deleted) and the last lines of logs were time-stamped 2016/06/23 09h50.

The output files written by node-logstash (file output plugin) are filled with the content of the input files from their beginning, and their last lines are time-stamped 2016/06/23 10h50.

I have another node-logstash instance, which reads the same input files and sends them to Elasticsearch; in that index I also have no logs after 2016/06/23 10h50.

Thanks for your help

Jérôme

bpaquet commented 8 years ago

I'm not able to reproduce any problem :(

My test config:

input {
 file {
  #use_tail => false
  start_index => 0
  path => 'toto.log'
 }
}

output {
 stdout {
 codec => json
 }
 file {
  path => out.log
 }
}

With or without start_index, with or without db_file, all seems to be OK. I ran the tests on Debian Linux 7, node 4.1.1.

I also did some tests with Apache2 log rotation (to be exact, with the logrotate config deployed by the Debian package): all seems to work as expected.
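For comparing setups, a truncate-style rotation like the one Jérôme describes can be simulated by hand. This is a hedged sketch, not the actual logrotate config: it mimics what a copytruncate-style rotation does to a live log file (copy aside, truncate in place, keep writing).

```shell
# Simulate a truncate-style log rotation on a scratch file.
log=$(mktemp)
printf 'line 1\nline 2\n' > "$log"

cp "$log" "$log.1"        # copy the current log aside, as copytruncate does
: > "$log"                # truncate the live file in place
printf 'line 3\n' >> "$log"

wc -l < "$log"            # the live file now holds only the post-rotation line
```

Pointing a file input at `$log` while running these steps shows whether the reader detects the shrink and resets its offset.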

Can you provide more details?

jerome83136 commented 8 years ago

Hello,

Thank you for your investigations.

I'm using CentOS 7.0 x86_64, node-logstash version 0.0.5.

I use this command line: /products/node-logstash/bin/node-logstash-agent --log_level=error --config_file=/conf/logstash/logstash.app-es.node.conf --log_file=/logs/logstash/logs.node-logstash.app-es.log --db_file=/var/tmp/logs.node-logstash.app-es.dbfile

input {
 file {
  #use_tail => true
  start_index => 0
  path => '/central_logs/input/prod/webservers/webserver1/apache/app/access_FH?_log'
  add_field => { "application" => "app" }
 }
 file {
  #use_tail => true
  start_index => 0
  path => '/central_logs/input/prod/webservers/webserver1/apache/app2/access_*MALE_log'
  add_field => { "application" => "app2" }
 }
 file {
  #use_tail => true
  start_index => 0
  path => '/central_logs/input/prod/webservers/webserver2/apache/app/access_FH?_log'
  add_field => { "application" => "app" }
 }
 file {
  #use_tail => true
  start_index => 0
  path => '/central_logs/input/prod/webservers/webserver2/apache/app2/access_*MALE_log'
  add_field => { "application" => "app2" }
 }
}
filter {
 regex {
  regex => /([0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3})\s(.*)\s(.*)\s\[.*\]\s\"([A-Z]*)\s\/(webshop\/.*)\sHTTP\/[0-9].[0-9]\"\s{1,2}([0-9]{3})\s([0-9]*|-)\s\|(.*)\|\s\{(.*)\}\s([0-9]*|-)\s(zp2web0[0-9]|-)\s\+(lt[m,f][a,b,c][0-9]*xz[0-9]*wty|-)\_\_[-,1,0]\+\s\(([A-Z0-9]*\.lt[m,f][a,b,c][0-9]*xz[0-9]*wty|-)\)\s\<(.*)\>/
  fields => [clientip, user_http, user_app, timestamp, method, request, http_code, http_lenght, referer, user_agent, http_time, webserver, jvm, jsessionid, ssl_version]
  numerical_fields => [http_code, http_lenght, http_time]
  date_format => ['dd/MMM/yyyy:HH:mm:ss ZZ']
 }

 #Getting GeoIP information for the event
 geoip {
  field => clientip
  cache_size => 1000
 }
}

output {
  elasticsearch {
   host => ladmlogs1
   port => 9101
   index_prefix => applicationstst
   bulk_limit => 100
   bulk_timeout => 100
  }
}

What happens: my logs are read from the beginning of the input files, and then node-logstash stops reading after some time. It seems node-logstash stops at the last line the input file contained when node-logstash was started.

What I expect: node-logstash should start reading from the beginning (because I delete the .dbfile), continue reading the entire file up to the log rotation, and then keep reading the file after it is truncated.

Thanks for your help

Jérôme