elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash
Other
14.2k stars 3.5k forks source link

Why is Logstash so slow to start? #5507

Open jsvd opened 8 years ago

jsvd commented 8 years ago

Here are some profilling I did on master, where a simple pipeline takes 9 seconds to start executing on my macbook:

% bin/logstash -e "input { generator { count => 1 } } output { stdout { codec => dots} }"
bash script Thu Jun 16 11:07:41 WEST 2016
started setting up bundler at 2016-06-16 11:07:43 +0100
require bundler took: 0.359
patching bundler took: 0.56
bundler reset took 0.017
bundler setup took 3.763
done setting bundler 2016-06-16 11:07:48 +0100
runner#run 2016-06-16 11:07:50 +0100
agent#execute 2016-06-16 11:07:50 +0100
Pipeline main started
.Pipeline main has been shutdown

This means that, of the 9 seconds it takes to execute the pipeline, we spend:

~2 seconds going from BASH to JRuby ~5 seconds executing LogStash::Bundler.setup!({:without => [:build, :development]}): https://github.com/elastic/logstash/blob/master/lib/bootstrap/environment.rb#L64 ~2 seconds processing cli args, creating the agent, compiling the pipeline and other setup tasks

jsvd commented 8 years ago

Also, if you're seeing startup times in the magnitude of minutes, you could be hitting https://github.com/jruby/jruby/wiki/Improving-startup-time#ensure-your-system-has-adequate-entropy

ipconfiger commented 7 years ago

I used on a 2 core 2G mem VPS, i will table 7min to start πŸ˜‚πŸ˜‚πŸ˜‚πŸ˜‚πŸ˜‚πŸ˜‚πŸ˜‚

jsvd commented 7 years ago

@ipconfiger see my comment above, this seems to be a lack of entropy issue https://github.com/jruby/jruby/wiki/Improving-startup-time#ensure-your-system-has-adequate-entropy I use logstash often in amazon's free tier t2.micros and logstash definitely starts under 1 min

lajoie commented 7 years ago

I'm experiencing this as well: startup times are around 5-6 minutes. Here's what info I can provide.

logstash 5.1.1 w/o additional plugins w/ configuration provided below Java: OpenJDK 1.8.0_111 Host: CentOS 6.7, 1 vCPU, 512M ram, virtual machine

Following some of the recommendations in the previously mentioned wiki page I tried the following items:

For comparison purposes I downloaded logstash 2.4.1 and ran with the same config. Got the same results.

Then I ran the same config directly on my workstation and got a startup time of ~5s, which is more what I'm used to seeing.

Seeing the stark difference between the VM and my laptop I then tried logstash 5.1.1 on one of our AWS instances and saw a startup time of 10-15s.

So clearly there is some sort of environmental component to this. Not sure how to go further to figure out what it might be. But happy to run any tests from, or provide any data to, some one with a better understanding of logstash internals.

My configuration:

input {
  file {
    path => "/var/log/zookeeper/zookeeper.log"
    add_field => { "[@metadata][type]" => "zookeeper" }
    codec => multiline {
      pattern => "^\d\d\d\d"
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [@metadata][type] == "zookeeper" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s*%{WORD:level}\s*%{NOTSPACE:category}\s*%{GREEDYDATA:log_message}" }
    }
  }
}
ksiv commented 7 years ago

@jsvd thanks a lot ! Solved an issue like this by installing AND running haveged . Rise in entropia like 2r plus from 34-56 to 2000+ Increase in speed 5 min turned into 15 sec. http://passbe.com/2016/12/21/logstash-slow-start-up-times-and-exhausting-entropy.html

CowboyTim commented 7 years ago

That's IMO not a solution. There's no need to have so many entropy needed for an application to just start up,

untergeek commented 7 years ago

@CowboyTim, entropy exhaustion is a very real concern if you are doing anything with encryption. Logstash happens to spin up its encryption bits at initialization, and unfortunately this makes for a bad experience if you have insufficient entropy.

jsvd commented 7 years ago

I haven't tested yet logstash with jruby 1.7.27, which has introduced a backport of a fix around the SecureRandom code. I'm not sure if this can contribute to this problem.

[edit] wrong link, backport pr is https://github.com/jruby/jruby/pull/4149

CowboyTim commented 7 years ago

@untergeek it's indeed a real concern if there's encryption needed. But when it's not, the real concern is the time wasted trying to debug a problem which shouldn't be a problem - and the time wasted during application start for things it never needs. At the very least this can be just logged that the entropy is too low. This would even benefit applications that actually need entropy too.

untergeek commented 7 years ago

As @jsvd correctly points out, the real problem is upstream from Logstash in the JRuby code. We are migrating to JRuby 9000 soon, where the backported fix originated from.

fenchu commented 5 years ago

logstash 6.4.2: takes around 15sec to start on my i7workstation under wsl (which is considered a very slow ubuntu) but on our production servers in azure it takes:

PS C:\dist\logstash-6.4.2\testing> date; C:\dist\logstash-6.4.2\bin\logstash.bat -f .\zcat-proxylogs.grok
Fri Oct 12 12:36:28 CUT 2018
Sending Logstash logs to C:/dist/logstash-6.4.2/logs which is now configured via log4j2.properties
[2018-10-12T12:37:57,041][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-10-12T12:38:03,239][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.2"}
[2018-10-12T12:38:39,886][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-10-12T12:38:44,767][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x350d08f4 run>"}
[2018-10-12T12:38:45,421][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-10-12T12:38:51,575][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

~90 sec, is there a workaround for this, because it is scheduled to parse a lot of gz files, and powershell ends up reading in 15GB in the pipeline before logstash/jruby is ready.

fenchu commented 5 years ago

Also see the very slow stage from jruby ready: 37:57 to logstash started: 38:51 (56Sec), when it runs it is fast parsing around 5000+ lines a sec. Can I have this resident? it is actually faster to copy files across the net (around 20GB/day) and run logstash locally on the ELK-box.

guylando commented 5 years ago

Also seeing this slowness on windows, with a simple config it takes 90-120 seconds to start

jsvd commented 5 years ago

@guylando can you run with --log.level=debug and see the time to the first logging message? Which version of logstash are you running? and what are the specs on the windows machine?

guylando commented 5 years ago

@jsvd logstash version 6.4.2 windows server 2012, takes 90 seconds. I ran process monitor (sysinternals) and it seems that all this time it just loads more and more gems and java jar libraries.

guylando commented 5 years ago

I see one problem causing slowness: When it loads ruby gems, it looks for them in all subfolders of \vendor\bundle\jruby\2.3.0\gems\ and finds them only after more than 20 failed lookups. This happens for many gems, for example: clamp.rb, connection.rb, collection.rb, faraday.rb, mvar.rb, definition.rb

guylando commented 5 years ago

@jsvd Its suggested here to try jruby with --dev flag: https://github.com/jruby/jruby/wiki/Improving-startup-time#use-the---dev-flag Do you know where can I add a flag to the call to jruby in logstash?

jsvd commented 5 years ago

does the majority of the 90 seconds happen before the first log entry? The use of --dev will slow down logstash processing, and it's not clear that it will help in this situation. Did you start seeing this after an upgrade or was it always an issue?

guylando commented 5 years ago

90 seconds before first log. seeing this when installing a fresh system, not aware of other situations.

guylando commented 5 years ago

changing logstash memory from 3gb to 4gb didnt help.

guylando commented 5 years ago

@jsvd I want to add --dev flag to just test if it helps to find where the problem is. How do I add it?

guylando commented 5 years ago

logstash uses jruby version 9.1.13 which is more than 1 years old however seems there were major performance improvemenets for jruby on windows in newer versions, see last comments in: https://github.com/jruby/jruby/issues/5060

UPDATE: Tested the 9.1.17 version (had to remove the resolv.rb patch) and it didnt improve the startup time. Still 90 seconds unfortunately.

fozboz commented 5 years ago

I've noticed this since at least Logstash 6.3.0. It will always take around 40-60 seconds to start.

With version 6.4.2 on Ubuntu 16.04 running with --debug, the delay occurs before any logs appear. As soon as the first line appears, it starts very quickly and then performs fine.

fozboz commented 5 years ago

@guylando you can recreate --dev flag with these lines in your jvm.options file.

-Djruby.compile.invokedynamic=false
-Djruby.compile.mode=OFF
-XX:+TieredCompilation
-XX:TieredStopAtLevel=1

This improved start times for me considerably, but this is not recommended for long running threads as it will not optimise code before starting.

I am configuring Logstash with Chef and want to test pipelines before they are published using --config.test_and_exit, and having > 30 second start times testing 20 pipelines is not viable. I can at least now change JVM settings just for these tests and get the start time down to ~ 10 seconds.

guylando commented 5 years ago

@fozboz thanks in my case too, it works fine after the first line appears and from investigation with sysinternals procmon I saw that before the first line appears what happens is multiple libraries loading with many libraries being searched in wrong location slowing down the whole thing

celesteking commented 5 years ago

30 secs to wait for config syntax check is way too long, you need either to optimize it out, or think out a separate config parser and checker. Look at bison for that.

untergeek commented 5 years ago

That's a dependency that's not likely to be added. It's a JVM-based install, and everything that Logstash needs must be available within the JVM to make it available on any platform. Sure, bison would be great, if it were running on Linux, but Logstash has to run on Windows, Linux, MacOS, etc.

Future releases may be more tuned to have a JVM start, and then remain up, and then have pipelines added afterwardβ€”kind of how centralized pipeline management works now, but in a broader sense.

Or not. It depends on how the development team chooses to tackle it.

lubobill1990 commented 5 years ago

+1 it's hard to debug the pipelines when logstash is slow to start.

fenchu commented 5 years ago

logstash developers should consider a resident mode, so we can start up logstash and let it parse via sockets. We have stopped using logstash, we now use elastics _bulk api directly, with parsing and preparing in python, we manage 10K rows as sec. Even before logstash is operational we have inserted almost 900K rows.

M0rdecay commented 4 years ago

+1 to this. A long Logstash launch creates difficulties in debugging the pipeline and negatively affects production if the ELK servers are periodically restarted.

Logstash version - 7.7.1 in container.