elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Logstash open_file OPEN_WARN_INTERVAL #10499

Closed KarnGusain closed 5 years ago

KarnGusain commented 5 years ago

Until recently I've been seeing a very peculiar warning in the Logstash log file: [filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'. I've searched the web for anything that might help me dig into it myself, but could not find any thread providing a solution, so I'm posting it here with complete details.

I had already opened a question on our forum but did not get a response.

Details:

- Version: 6.5.4
- Operating System: CentOS Linux release 7.4.1708 (Core)
- Config File (if you have sensitive info, please remove it):
- Sample Data:
- Steps to Reproduce:

LOG FILE: /var/log/logstash/logstash-plain.log

[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
[2019-02-18T04:04:07,946][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'

AND STATUS:

[root@sj-logstash ~]# systemctl status logstash -l
● logstash.service - logstash
   Loaded: loaded (/etc/systemd/system/logstash.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/logstash.service.d
           └─logstashlimit.conf
   Active: active (running) since Sun 2019-02-17 11:10:07 PST; 16h ago
 Main PID: 24558 (java)
   CGroup: /system.slice/logstash.service
           └─24558 /bin/java -Xms15g -Xmx15g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djruby.compile.invokedynamic=true -Djruby.jit.threshold=0 -XX:+HeapDumpOnOutOfMemoryError -Djava.security.egd=file:/dev/urandom -cp /usr/share/logstash/logstash-core/lib/jars/animal-sniffer-annotations-1.14.jar:/usr/share/logstash/logstash-core/lib/jars/commons-codec-1.11.jar:/usr/share/logstash/logstash-core/lib/jars/commons-compiler-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/error_prone_annotations-2.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/google-java-format-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/gradle-license-report-0.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/guava-22.0.jar:/usr/share/logstash/logstash-core/lib/jars/j2objc-annotations-1.1.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-annotations-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-core-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-databind-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/jackson-dataformat-cbor-2.9.5.jar:/usr/share/logstash/logstash-core/lib/jars/janino-3.0.8.jar:/usr/share/logstash/logstash-core/lib/jars/jruby-complete-9.1.13.0.jar:/usr/share/logstash/logstash-core/lib/jars/jsr305-1.3.9.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-api-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-core-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/log4j-slf4j-impl-2.9.1.jar:/usr/share/logstash/logstash-core/lib/jars/logstash-core.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.commands-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.contenttype-3.4.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.expressions-3.4.300.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.filesystem-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.jobs-3.5.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.resources-3.7.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.core.runtime-3.7.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.app-1.3.100.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.common-3.6.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.preferences-3.4.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.equinox.registry-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.jdt.core-3.10.0.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.osgi-3.7.1.jar:/usr/share/logstash/logstash-core/lib/jars/org.eclipse.text-3.5.101.jar:/usr/share/logstash/logstash-core/lib/jars/slf4j-api-1.7.25.jar org.logstash.Logstash --path.settings /etc/logstash

Feb 18 04:01:40 sj-logstash logstash[24558]: [2019-02-18T04:01:40,279][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
Feb 18 04:01:40 sj-logstash logstash[24558]: [2019-02-18T04:01:40,279][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
Feb 18 04:01:40 sj-logstash logstash[24558]: [2019-02-18T04:01:40,279][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'
Feb 18 04:01:40 sj-logstash logstash[24558]: [2019-02-18T04:01:40,280][WARN ][filewatch.tailmode.handlers.createinitial] open_file OPEN_WARN_INTERVAL is '300'

LOGSTASH OPEN FILE SETTING:

I created the file below to set the open file limit for Logstash:

[root@sj-logstash ~]# cat /etc/systemd/system/logstash.service.d/logstashlimit.conf
[Service]
LimitNOFILE=65535 
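
To double-check that the drop-in takes effect after a daemon-reload and restart, the limit on the running process (PID 24558 in the status output above) can be inspected along these lines:

systemctl daemon-reload
systemctl restart logstash
systemctl show logstash -p LimitNOFILE
grep "Max open files" /proc/24558/limits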

The file below already existed; I only adjusted the limit:

[root@sj-logstash ~]# cat /etc/systemd/system/logstash.service
[Unit]
Description=logstash

[Service]
Type=simple
User=logstash
Group=logstash
# Load env vars from /etc/default/ and /etc/sysconfig/ if they exist.
# Prefixing the path with '-' makes it try to load, but if the file doesn't
# exist, it continues onward.
EnvironmentFile=-/etc/default/logstash
EnvironmentFile=-/etc/sysconfig/logstash
ExecStart=/usr/share/logstash/bin/logstash "--path.settings" "/etc/logstash"
Restart=always
WorkingDirectory=/
Nice=19
LimitNOFILE=65535 

[Install]
WantedBy=multi-user.target

SYSTEM LEVEL File Descriptor settings:

[root@sj-logstash ~]# cat /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
vm.max_map_count = 262144

[root@sj-logstash ~]# cat /etc/security/limits.conf 

*               soft    nofile 1024
*               hard    nofile 65535 
root            soft    nofile 1024
root            hard    nofile 65535 
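
(Note: /etc/security/limits.conf is applied by pam_limits to login sessions; a service started by systemd, like this one, takes its file-descriptor limit from the LimitNOFILE directive in the unit file or drop-in, not from limits.conf.)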

LOGSTASH JVM Settings:

[root@elasticS0104 logstash]# cat /etc/logstash/jvm.options
-Xms7g
-Xmx7g
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djruby.compile.invokedynamic=true
-Djruby.jit.threshold=0
-XX:+HeapDumpOnOutOfMemoryError
-Djava.security.egd=file:/dev/urandom

LOGSTASH INPUT Configurations:

[root@sj-logstash ~]# cd  /etc/logstash/conf.d/
[root@sj-logstash conf.d]# ls -l
total 8
-rw-r--r-- 1 root root  917 Feb 17 00:46 syslog.conf
-rw-r--r-- 1 root root 1003 Feb  7 02:14 rmlogs.conf

Logstash config for syslog:

[root@sj-logstash conf.d]# cat dpc-syslog.conf
input {
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files => 65535 
    type => "dpc-syslog"
  }
}

filter {
  if [type] == "dpc-syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "@version", "host", "path", "messages" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  if [type] == "dpc-syslog" {
    elasticsearch {
      hosts => "elasticS0101:9200"
      manage_template => false
      index => "dpc-syslog-%{+YYYY.MM.dd}"
      document_type => "messages"
    }
  }
}

Logstash config for other custom logs:

[root@sj-logstash conf.d]# cat rmlogs.conf
input {
  file {
    path => [ "/data_elk/rmlogs/*.txt" ]
    start_position => beginning
    sincedb_path => "/dev/null"
    max_open_files => 65535 
    type => "rmlog"
  }
}

filter {
  if [type] == "rmlog" {
    grok {
      match => { "message" => "%{HOSTNAME:Hostname},%{DATE:Date},%{HOUR:dt_h}:%{MINUTE:dt_m},%{NUMBER:duration}-%{WORD:hm},%{USER:User},%{USER:User_1} %{NUMBER:Pid} %{NUMBER:float} %{NUMBER:float} %{NUMBER:Num_1} %{NUMBER:Num_2} %{DATA} (?:%{HOUR:dt_h1}:|)(?:%{MINUTE:dt_m1}|) (?:%{HOUR:dt_h2}:|)(?:%{MINUTE:dt_m2}|)%{GREEDYDATA:CMD},%{GREEDYDATA:PWD_PATH}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      remove_field => [ "path","minute3","minute2","host","hour2","hour3","Num_1","Num_2","message" ]
      remove_tag => ["_grokparsefailure"]
    }
  }
}

output {
  if [type] == "rmlog" {
    elasticsearch {
      hosts => "elasticS0101:9200"
      manage_template => false
      index => "dpc-rmlog-%{+YYYY.MM.dd}"
    }
  }
}

LOGSTASH logstash.yml file:

# cat /etc/logstash/logstash.yml | grep -v "#"
path.data: /var/lib/logstash
path.logs: /var/log/logstash

Would appreciate any help.

guyboertje commented 5 years ago

This warning is logged infrequently, IIRC.

It is saying that if the file input reaches the open-file limit (default 4095), you will get a warning that this limit has been reached every 300 seconds. The limit can be reached if your path glob discovers more than 4095 files (some users in the past have had millions), so we open only a fixed number of files for tailing. To give the other files a chance of being processed, we have close_older, which closes files (but does not abandon them) to make way for others. The limit-reached warning is important because you might otherwise never know that some files are not being processed, especially if you have a "dropbox" style mechanism in place.
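
For illustration only (the path and values here are placeholders, not recommendations), both settings live on the file input:

input {
  file {
    path => [ "/data_elk/SYSTEMS/*.log" ]
    max_open_files => 4095     # width of the window of files tailed at once (default 4095)
    close_older => "1 hour"    # an unchanged file is closed after this interval to make way for others (default "1 hour")
  }
}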

You can quieten just the filewatch.tailmode.handlers.createinitial logging sub-system by adding an entry to the log4j config file, e.g.

logger.createinitial.name = filewatch.tailmode.handlers.createinitial
logger.createinitial.level = error
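
On package installs this file is typically /etc/logstash/log4j2.properties, and Logstash needs a restart to pick the change up.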

The logging API allows for different levels of logging for different components in LS.

KarnGusain commented 5 years ago

Thanks a mile for your take on this @guyboertje. However, is there a way to increase the file input limit? Secondly, if it is reaching the open file limit, what would be an appropriate value to set for close_older, e.g. close_older => 100?

Also, will adding the suggested log4j entries do nothing more than suppress the warning?

guyboertje commented 5 years ago

Be aware that the warning is not saying you have reached the limit. It is saying that if you reach the limit, you'll get a warning every 300 seconds, rather than one flooding the logs.

max_open_files is the setting that sets the limit. Do you have more than 4095 discoverable files?

The log4j config I gave will suppress all logging at WARN, INFO, DEBUG and TRACE for that component only.

KarnGusain commented 5 years ago

@guyboertje, yes, I have more than 4095 discoverable files: I have 5k servers sending syslogs for Logstash to process, so every time a syslog file is updated it processes them all. On top of that I have another set of logs, rmlogs, which again may be anywhere from 100 to 10000 files.

guyboertje commented 5 years ago

Ahh. You do not need to increase the max_open_files limit in order to process more than 4095 files. Think of max_open_files as a 'sliding window' N files wide; the close_older setting controls when the window slides over by a few files.

Each file is handled by a state machine and can be in one of 'watched', 'active', 'closed', 'ignored' and some other more transient states ('delayed_delete', 'rotation_in_progress'). close_older moves a file from the 'active' state to the 'closed' state when the file size has not changed within the close_older interval. The exact effect of this depends on whether the discovered files are actually being written to (as in the classic 'tailing' scenario) or dropped in place with a fixed size by some script or rotation activity. If the current set of open files has some that are written to frequently and some that are infrequent, the file input favours the frequent ones and will close the infrequent ones. A file in the 'closed' state is still checked for changes; if a change is detected it is put back into the 'watched' state, meaning it becomes eligible for the 'active' state when there is capacity in the sliding window.
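
As a sketch of what that tuning might look like for drop-in-place files (values here are placeholders): a short close_older lets a fully read, unchanging file leave the window quickly, so the window slides across a large backlog:

input {
  file {
    path => [ "/data_elk/rmlogs/*.txt" ]
    max_open_files => 4095     # keep the default window width
    close_older => "1 minute"  # slide the window quickly past files that stop growing
  }
}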

If you have tens of thousands of frequently updated files that need monitoring, then you should consider using Filebeat across the 5000 servers. We should talk more about this in Discuss.
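
For a rough idea of the Filebeat side (hosts and paths here are placeholders), each server would ship with something like this filebeat.yml, pointed at a beats input (input { beats { port => 5044 } }) in the Logstash pipeline:

filebeat.inputs:
- type: log
  paths:
    - /var/log/messages
output.logstash:
  hosts: ["sj-logstash:5044"]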

KarnGusain commented 5 years ago

Thanks again @guyboertje, I already have a thread open there: Discuss. Let's talk more there.

guyboertje commented 5 years ago

Closing, because we successfully discussed this in Discuss.