Performance Testing

benjchristensen commented 10 years ago

I would like to integrate performance testing as a first-class aspect of rxjava-core in https://github.com/Netflix/RxJava/tree/master/rxjava-core/src/perf

One option is Google Caliper: https://code.google.com/p/caliper/ Another is JMH: http://openjdk.java.net/projects/code-tools/jmh/

Of potential interest, Netty uses JMH: http://netty.io/wiki/microbench-module.html

I have placed some very simple, manual performance tests in the /src/perf folders for now but I'd like to establish the tooling and a few solid examples so we have a pattern to follow.

benjchristensen commented 10 years ago

/cc @abersnaze as you've been involved in these discussions and you're researching Google Caliper.

gvsmirnov commented 10 years ago

I would very much recommend using JMH, and not Caliper. The latter has lots and lots of issues, which are addressed in the former. Here's a great presentation about it.

benjchristensen commented 10 years ago

Thank you for weighing in and sharing that presentation, just read through it, very interesting. Can you point to anything about the issues with Caliper?

headinthebox commented 10 years ago

@gvsmirnov JMH looks technically pretty impressive, but it does not seem to integrate as nicely as Caliper into an IDE workflow. I could only find some very brief comments about IntelliJ integration on the Web; do you know more? Also, as @benjchristensen says, the presentation is super interesting but does not answer the question of why Caliper is not a good choice.

A side question about all this benchmarking stuff is how much it relates to performance in production. When running the benchmarks, you measure things in a very specific way, but in production the code runs in a completely different environment. It sometimes feels to me like measuring calories using a calorimeter (http://en.wikipedia.org/wiki/Calorimeter), which does not really correspond to the actual digestion of food. To state it more formally: is benchmarking monotonic? In other words, does Benchmark(A) < Benchmark(B) imply that InProduction(A) < InProduction(B)?

gvsmirnov commented 10 years ago

@benjchristensen Unfortunately, I don't know of any article or presentation that explicitly points out all the pitfalls of Caliper. But for most of the common problems (outlined in the presentation), Caliper has no built-in workaround (the last time I checked, at least). The most broken thing about Caliper is that it falls victim to loop unrolling. See here.

JMH is all about taking the trouble off our shoulders, especially the trouble we do not even suspect exists. Many things that are hard to implement in Caliper (like this and that and that) are easy to do in JMH.

@headinthebox Now, regarding IDE support, there is indeed next to none. But I personally hardly ever use an IDE for things like running tests or working with VCSs. Command-line utilities work fine for me. And for JMH, they are much better than your average CLI tool.

gvsmirnov commented 10 years ago

I have just started a mechanical-sympathy thread that discusses this subject. There will probably be a lot of info there in a couple of days.

benjchristensen commented 10 years ago

Thank you @gvsmirnov for the information. This is something I hope we'll make a first-class aspect of RxJava in the near future and your information will really help.

Are you interested in helping us bootstrap RxJava with JMH? The rxjava-core/src/perf/ code is wide open right now to set up correctly.

gvsmirnov commented 10 years ago

@benjchristensen I most definitely am. There are some spare time issues at the moment, though, so I don't think I will be able to contribute for a couple of weeks. Afterwards, I would be happy to.

benjchristensen commented 10 years ago

I understand that problem! Once you have some time I'd appreciate your help to get us started down the right path.

abersnaze commented 10 years ago

Some observations on the difference now that I've actually used both of them:

Caliper PROS

It measures object count and memory usage as well as time.
It makes clear that it is monitoring JIT and GC events during the timing.
Parameter annotations make it easier to test different configurations without having to write a method for each combination manually.

CONS

Warm-up is a bit of a black box. I've seen the warning that it has detected JIT activity during measurement often enough that it makes me think it isn't doing enough to warm up the code.
It uploads the results!

P.S. I'm not an expert in either benchmarking tool.

gvsmirnov commented 10 years ago

@benjchristensen Sorry it took me so long, but I'm finally back. I've thrown together a sample gradle project with JMH support here. Hoping to integrate it with RxJava real soon.

gvsmirnov commented 10 years ago

Oh, finally! I have sent a pull request (https://github.com/Netflix/RxJava/pull/963) with the updated JMH benchmarking. It features changes both to the gradle setup, and to the benchmark itself.

The gradle setup is explained in this blog post.

The benchmark is changed in such a way that prevents most of the caveats (like DCE) from happening, while also ensuring that more accurate results are attained. Please consult these samples to gain deeper insight into how benchmarking should be done with JMH.
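For readers unfamiliar with the DCE (dead-code elimination) caveat, here is a minimal hand-rolled sketch of the idea. If a loop's results are never used, the JIT is free to remove the work, so the results have to feed something observable. JMH takes care of this for you (for instance by consuming values returned from benchmark methods or passed to a Blackhole). The class DceSketch and the method consumeAll are hypothetical names used only for illustration. This is plain Java, not JMH and not the code from the pull request, and it is not a valid way to measure anything.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical illustration of avoiding dead-code elimination; not a JMH replacement. */
public final class DceSketch {
    // A volatile write cannot be dropped by the JIT; it stands in for what a
    // benchmark harness does when it consumes a result.
    private static volatile long sink;

    /** Applies op to 0..size-1 and folds every result into a checksum that is published. */
    public static long consumeAll(int size, IntUnaryOperator op) {
        long checksum = 0;
        for (int i = 0; i < size; i++) {
            checksum += op.applyAsInt(i); // every result feeds the checksum, so none is dead
        }
        sink = checksum; // publish, so the loop as a whole has an observable effect
        return checksum;
    }

    public static void main(String[] args) {
        // Sum of 1..1024; printing it is a second way of keeping the result alive.
        System.out.println("checksum=" + consumeAll(1024, x -> x + 1));
    }
}
```

Even with the sink in place, a loop like this is still exposed to the other pitfalls discussed in this thread (warm-up, loop unrolling), which is the argument for leaving the measurement itself to JMH.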

Here are the results that I got on my Haswell 2.6 GHz 16 GB RAM laptop with Java 8:

Benchmark                                  (size)   Mode   Samples         Mean   Mean error    Units
r.o.ObservableBenchmark.measureBaseline         1   avgt        10        0.003        0.000    us/op
r.o.ObservableBenchmark.measureBaseline      1024   avgt        10        2.764        0.051    us/op
r.o.ObservableBenchmark.measureBaseline   1048576   avgt        10     3104.088       49.586    us/op
r.o.ObservableBenchmark.measureMap              1   avgt        10        0.100        0.003    us/op
r.o.ObservableBenchmark.measureMap           1024   avgt        10        5.036        0.059    us/op
r.o.ObservableBenchmark.measureMap        1048576   avgt        10     6693.271      277.604    us/op

What we see here is that, for the two larger sizes, doing nothing through RxJava introduces about a 2x overhead in latency compared to simply doing nothing. Pretty acceptable if you ask me.
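The overhead figure can be checked directly from the table by dividing the measureMap mean by the measureBaseline mean for each size. A small sketch of that arithmetic follows; the class name OverheadRatio is made up for illustration, and the mean error column is ignored here.

```java
import java.util.Locale;

/** Hypothetical helper: ratio of measureMap to measureBaseline, using the means from the table above. */
public final class OverheadRatio {
    /** Ratio of a measured mean to the baseline mean (both in us/op). */
    public static double ratio(double measured, double baseline) {
        return measured / baseline;
    }

    public static void main(String[] args) {
        int[] sizes = {1, 1024, 1048576};
        double[] baseline = {0.003, 2.764, 3104.088}; // measureBaseline means, us/op
        double[] map = {0.100, 5.036, 6693.271};      // measureMap means, us/op
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf(Locale.ROOT, "size=%d map/baseline=%.2f%n",
                    sizes[i], ratio(map[i], baseline[i]));
        }
    }
}
```

This gives roughly 1.8x at size 1024 and 2.2x at size 1048576. At size 1 the ratio is about 33x, but both numbers there are a fraction of a microsecond.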

benjchristensen commented 10 years ago

This is great @gvsmirnov Thank you!

Is there a way to maintain historical snapshots over time for getting performance diffs?

gvsmirnov commented 10 years ago

@benjchristensen you're very welcome.

Uh. I'm not exactly sure if there is an established practice with that. You can easily get JMH to output its results in csv, scsv or json. Should not be a long way from there.

What I'm doing is: before merging anything to master, run the benchmarks on master and on the branch. Works fine for me.
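A rough sketch of what comparing a master run against a branch run could look like once the results are exported. Everything here is an assumption for illustration: the class BenchmarkDiff is hypothetical, the input is a simplified "benchmark,size,score" layout rather than JMH's actual CSV output, and the scores in main are placeholder values, not measurements.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical comparison of two benchmark runs; the input layout is simplified, not JMH's real CSV. */
public final class BenchmarkDiff {
    /** Parses lines of the assumed form "benchmark,size,score" into a map keyed by "benchmark[size]". */
    public static Map<String, Double> parse(String csv) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String line : csv.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = trimmed.split(",");
            scores.put(cols[0] + "[" + cols[1] + "]", Double.parseDouble(cols[2]));
        }
        return scores;
    }

    /** Percent change of branch relative to master; positive means slower when the score is time per op. */
    public static double percentChange(double master, double branch) {
        return (branch - master) / master * 100.0;
    }

    public static void main(String[] args) {
        // Placeholder scores, invented for the example.
        Map<String, Double> master = parse("measureMap,1024,5.000\nmeasureMap,1048576,6000.000\n");
        Map<String, Double> branch = parse("measureMap,1024,5.500\nmeasureMap,1048576,5700.000\n");
        for (Map.Entry<String, Double> e : master.entrySet()) {
            Double after = branch.get(e.getKey());
            if (after == null) {
                continue; // benchmark not present on the branch
            }
            System.out.printf(Locale.ROOT, "%s: %+.1f%%%n",
                    e.getKey(), percentChange(e.getValue(), after));
        }
    }
}
```

A real comparison would also need to take the reported error into account, since a difference smaller than the error of either run says very little.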

benjchristensen commented 10 years ago

We have JMH integrated and being used so closing this. Thank you @gvsmirnov for your help on this!

ReactiveX / RxJava

Performance Testing #776