Can you add the lumberjack
plugin to the list of the common inputs?
@purbon I like this approach! With logstash 2.0 coming up, are there any plans to build some basic benchmarking counters / timers into logstash itself to aid in this effort? I can imagine that some basic throughput numbers could be useful over a future Logstash API
a few thoughts:
@colinsurprenant all good points on the JIT.
It may also be good to lower -XX:CompileThreshold
to force JITing to happen a little sooner given that Logstash is usually a long-lived process.
@andrewvc I think that changed to jruby.jit.threshold, see https://github.com/jruby/jruby/wiki/PerformanceTuning#jit-runtime-properties. In any case, since we normally operate at high TPS, the default of 50 is hit right at the start and does not make much of a difference.
Either way, having a proper benchmarking system in place will give us a good environment to play with all these options, including the JVM GC settings, and see what performs better.
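For reference, here is a rough sketch of how these knobs could be passed on startup. Treat the LS_JAVA_OPTS variable, the config file name and the exact values as assumptions; the right way to forward JVM options depends on the startup script in use.

```sh
# Assumption: the Logstash startup script forwards LS_JAVA_OPTS to the JVM.
# -XX:CompileThreshold lowers the HotSpot JIT trigger for this long-lived process,
# -Djruby.jit.threshold lowers the JRuby JIT trigger (default: 50 calls).
export LS_JAVA_OPTS="-XX:CompileThreshold=1000 -Djruby.jit.threshold=25"
bin/logstash -f benchmark.conf
```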
@purbon nice write up!
Number of events can vary a lot too; here we'll benchmark against 1_000 events, 10_000 events, 100_000 events and 1_000_000 events.
Like @colinsurprenant mentioned, we should not try to control the flow of events to these ranges. However, I do think we should have a fixed, large data set -- say 10 million apache log lines -- and have it ingested into LS using common configurations to catch regressions. The danger in performance testing is having too many variables and trying to tune them all at once; that quickly becomes too ambitious. We need to restrict the number of variables at play here. Most of our performance issues are regressions because we don't have historic data captured and we don't know the baseline numbers. I would suggest we focus on this problem first. We are not trying to publish official benchmarks for any of the plugins or config combinations (yet).
I think the primary goal for this exercise should be to build an infrastructure that allows us to easily record performance (throughput) nightly, for static sets of configurations using static sets of input data. We can then start playing with the knobs. Of course, if we have 10M log lines it will take a while for LS to process them, so this will account for JVM warm-up, GC cycles etc.
As a next step, if we can extend this perf suite to run automatically per commit, that would be great.
@ph, lumberjack added. Makes sense! Do you have any others in mind?
@andrewvc This is actually one of the long-run ideas: once we provide an API for Logstash, exposing numbers like these will be really beneficial for everyone. If I remember correctly, there are some issues open for this under the roadmap tag.
@colinsurprenant First of all, thanks for your thoughts; let me go through them and try to explain myself a bit.
I should say this initial issue was not intended to go into a lot of detail; the goal is for it to be a meta issue planting raw ideas. Obviously I consider warm-up a very important phase of this benchmark, as it should be for any kind of benchmarking. Let me explain my approach to handling warm-up here; for a more detailed discussion I would say we move this to the specific issue I'm going to open for the benchmarking framework.
Our first benchmarking efforts, initiated by you, dealt with warm-up by folding it into the execution phase: in a nutshell, the user sets a long enough run time and the final numbers (top TPS, avg TPS, etc.) become more accurate the longer the execution goes. My intention is to make warm-up a first-class citizen of this benchmark by providing an interface that explicitly lets the user decide how to execute it. I like this approach because it makes this very important phase explicit. Needless to say, the former approach is also valid, but my idea right now is to do it like this.
Speaking of the number of events, here we also have two different valid approaches: we can make time the base variable, or we can see how Logstash performs while ingesting a given number of events. As you said, by using time as the base variable we aim to see the maximum throughput, still bound to the timeframe used, but the longer it runs the closer to the maximum we can expect to get. However, with the intention of providing numbers to end users, using the number of events as the variable lets users see how LS performs for the volume of events they expect. Eventually you will reach the maximum by increasing the number of events. Again, both approaches are fine and provide meaningful points of view for different actors.
I don't close the door on including the time-based approach in the benchmarking framework; it is a valid one that, together with this one, will provide a lot of information to its users.
I hope I explained myself properly; let me know if you have more concerns about the meta idea, and let's move detailed discussions to their own issues. Your contributions are always interesting!!
/cheers
@suyograo see https://github.com/elastic/logstash/issues/3499#issuecomment-115163476 for more thoughts on why I chose these numbers to run the benchmarks; let me know if it needs further explanation.
I would say both approaches are completely fine and provide meaningful information for different stakeholders; I would not discard providing numbers based on event counts so easily without a bit more discussion of the benefit this can bring.
See http://ldbcouncil.org/benchmarks/snb as a good example of how to run benchmarks; I really like the way they approached it. No need to explain that they are benchmarking a database and we a pipeline: what queries are for them, the configurations/plugins used are for us, and those play a really important role in LS performance.
@suyog, the idea is to have the synthetic load generator parametrized, so it can generate the number of events you want. However, to publish data I would stick to these event counts for now. As said before, this way people can relate: "oh, I have LS and I aim to ingest 1_000_000 events, let's see how long it will take." Makes sense?
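As a hedged sketch of what "parametrized" could mean in practice, assuming the stock generator input is used as the synthetic load source (the message content below is just a placeholder, not part of any agreed data set):

```
input {
  generator {
    count   => 1000000                      # 1_000, 10_000, 100_000, 1_000_000, ...
    message => "synthetic log line payload" # placeholder; real runs would vary size and shape
  }
}
output {
  stdout { codec => dots }                  # one dot per event keeps the output cost low
}
```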
Hi, let me go through your comments:
On Thu, Jun 25, 2015 at 3:45 PM Colin Surprenant notifications@github.com wrote:
- I still believe that thinking in terms of number of events is somewhat futile. "Nobody" who deals with streaming data thinks in fixed numbers of events, but in TPS.
TPS will be one of the variables reported after the test execution.
- to be relevant, this fixed set of events would have to be large enough, typically in the millions, so that the run time would be long enough to have any meaning.
With the option to run N events, you actually let the benchmark user stress LS as hard as they want. This feature also goes hand in hand with the ability to mimic real live logs, I mean things like messages of different shapes, poisoned messages here and there, and much more. This is something critical to any benchmark.
- you really don't need a 10M-line log file to do a benchmark, a 100- or 1000-line file that you replay continuously is just fine and way easier to carry around; it can also be part of the project. Obviously that also depends on the input plugin that needs to be benchmarked, in which case the input format will depend on the plugin.
I disagree; replaying 10 lines over and over again is also not the best approach here.
- for the warmup, I am not sure I understand the concept of first class citizen and parametrized warmup time - in any case, the warmup phase cannot be dissociated from the execution phase. It has to be the same continuous process, and you have to figure out, within that total run time, which part is the warmup and which part is the benchmarkable run time; then you decide how you want to play with the numbers to account for the slower warmup part.
What I'm talking about here is having a command line like:
benchmark --warm [description] -process logfile
This way you see how the system behaves with different warm-ups. Obviously you cannot run the warm-up, stop LS, and then process the files; that makes no sense.
I disagree; replaying 10 lines over and over again is also not the best approach here.
- I didn't say 10; depending on the benchmark you do, a log sample set can actually be totally valid at 10, 100, 1000 or 10000 lines. For example, I would be very surprised if a TPS benchmark showed any significant difference between a 10M-line syslog or apache log file versus a 1k sample that you replay. Anyway, my point here is that smaller, statistically valid sample sets are a lot easier to manage, and you can actually include them in the project.
All-in-all great initiative! The scope is very large so let's make sure we iterate and target the most useful metrics and start collecting these to see trends and immediate performance improvements/regressions.
Also, let's see what can be reused from https://github.com/elastic/logstash-integration-testing
About the "synthetic log generator" idea, I know @rashidkpc had a neat one which could actually generate "realistic" or "interesting" log "shapes" for Kibana. Could be useful?
Hey fellows, my Logstash agent (client) can only read about 350 records (about 85M) per second, which is not even close to what we need. Any good ideas about how to optimize?
Hi, would you mind opening another issue to hold this discussion? There you can share your configuration so we can see if there is anything special that might make your ingestion rate slow.
@purbon Hi, no, never mind. Thanks for sharing my concern. New issue: https://github.com/elastic/logstash/issues/3549
Preface
This issue is intended to wrap up the benchmarking and profiling efforts around the Logstash project and its ecosystem. This is a very important and relevant topic that will serve as common ground for engineers, decision makers, developers and anyone interested in knowing what they can achieve with Logstash.
The internet is nowadays full of benchmarks done without technical people in mind; unlike those, in this effort we will take special care to be open by making the necessary tooling and data available, so you are actually able to perform the same analysis as we do and reach your own conclusions. Your feedback is going to be very valuable.
Objective
There is a set of objectives to be achieved with this effort; I'm going to summarize them here (for more details please check the related issues when available):
Methodology
To build these benchmarks we aim to combine different use cases with the intention of matching real-life usage. The initial bullet points are:
Event sizes: 0.5 KB, 1 KB, 2 KB, 5 KB and 10 KB
But we should not forget data formats; these vary a lot and can also be a subject of performance analysis. From our experience the most common formats are:
And last but not least, we should consider the kinds of configurations people are running Logstash in; for this we aim to test several different ones (including machines, virtualization, JVM and JRuby versions, etc.).
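To make this kind of run concrete, here is a minimal baseline pipeline sketch using only stock plugins and options; the file path and the data set itself are hypothetical placeholders, not an agreed benchmark configuration.

```
input {
  file {
    path           => "/data/benchmark/apache_access_10m.log"  # hypothetical fixed data set
    start_position => "beginning"
    sincedb_path   => "/dev/null"                               # always replay from the top
  }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
output {
  stdout { codec => dots }   # cheap output; throughput is measured externally
}
```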
Tooling
To run this analysis we will need to build/integrate some tooling; here is the list of what we'll need:
Annex 1 (Profiling/Benchmarking grok)
It is well known that grok can be slow if not used properly, so we aim to also provide an annex benchmark on this filter to help people see how fast their grok expressions go.
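As an illustration of the kind of experiment this annex could cover (the patterns below are stock grok patterns, but the anchored-versus-unanchored comparison is only a suggestion, not an agreed benchmark):

```
filter {
  grok {
    # Unanchored pattern: on lines that do not match, the engine retries the
    # match at every offset before giving up, which is where grok gets slow.
    match => { "message" => "%{COMMONAPACHELOG}" }
    # Anchored variant to compare against; anchoring usually fails fast:
    # match => { "message" => "^%{COMMONAPACHELOG}$" }
    tag_on_failure => ["_grokparsefailure"]
  }
}
```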
Annex 2 (Most common plugins)
As seen, the most popular plugins are:
Annex 3 (Most commonly used HA setups)
We're going to keep updating this as the benchmarking initiative gets going, including related issues, numbers achieved, etc. Contributions are more than welcome; feel free to report your feedback, ideas, etc. here.
Happy benchmarking!!
Related issues: #3477