TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Update benchmarks on wiki #1006

Closed by yebai 2 years ago

yebai commented 4 years ago

The benchmark numbers on the wiki are seriously out of date and probably misleading about Turing's performance. It would be better to update them using the current releases.

https://github.com/TuringLang/Turing.jl/wiki

yebai commented 4 years ago

@xukai92 are these models available somewhere? Perhaps we can add them to https://github.com/TuringLang/Turing.jl/tree/master/benchmarks
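
For reference, one of the classic models in those benchmarks, a Gaussian with unknown mean and variance, might look roughly like this in Turing (a minimal sketch, not the exact benchmark code):

```julia
using Turing

# Minimal sketch of a Gaussian with unknown mean and variance
# (illustrative only; the actual benchmark model may differ).
@model function gdemo(x)
    s² ~ InverseGamma(2, 3)          # unknown variance
    m ~ Normal(0, sqrt(s²))          # unknown mean
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s²))   # observations
    end
end

chain = sample(gdemo(randn(100)), NUTS(), 1_000)
```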

xukai92 commented 4 years ago

It seems that they are available in an old branch here: https://github.com/TuringLang/TuringExamples/tree/old-models/old-models.

For the benchmark suite, can we add the Stan version as well?

yebai commented 4 years ago

> For the benchmark suite, can we add the Stan version as well?

I think so; GitHub Actions is quite generous with build time compared to Travis, so we can run these benchmarks altogether and then produce a table on the fly.
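
As a rough sketch of what the Turing side of such a job could do (BenchmarkTools-based; `time_model` and the settings are made up, and the Stan column would come from timing the corresponding CmdStan model the same way):

```julia
using BenchmarkTools, Statistics, Turing

# Hypothetical helper: time sampling from one Turing model and
# report mean ± std of the wall time in seconds.
function time_model(model; n_samples = 2_000)
    b = @benchmark sample($model, NUTS(), $n_samples) samples = 5 evals = 1
    return mean(b.times) / 1e9, std(b.times) / 1e9
end

# Emit one row of a Markdown table on the fly, e.g. inside a CI job.
m, s = time_model(gdemo(randn(100)))   # `gdemo` as sketched above
println("| Gaussian Unknown | $(round(m, digits = 3)) ± $(round(s, digits = 3)) |")
```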

xukai92 commented 4 years ago

Sounds good. I will take a look after finishing the remaining issues for AABI in AHMC.

xukai92 commented 4 years ago

Will be fixed via https://github.com/TuringLang/TuringExamples/pull/22

xukai92 commented 4 years ago

Here is a new table we can use:

| Model                            | Stan             | Turing             |
|----------------------------------|------------------|--------------------|
| Gaussian with Unknown Parameters | 0.342 +/- 0.015  | 2.211 +/- 0.061    |
| Hierarchical Poisson             | 0.134 +/- 0.068  | 0.325 +/- 0.013    |
| High Dimensional Gaussian        | 11.609 +/- 0.306 | 9.766 +/- 0.222    |
| Semi-supervised HMM              | 5.033 +/- 0.058  | 463.213 +/- 26.045 |
| LDA                              | 43.888 +/- 0.504 | 378.762 +/- 7.91   |
| Logistic Regression              | 56.15 +/- 2.274  | 3.942 +/- 1.331    |
| Naive Bayes                      | 13.677 +/- 0.142 | 6.848 +/- 0.144    |
| Stochastic Volatility            | 0.918 +/- 0.014  | 75.026 +/- 30.579  |

What's the best place to host it? Not sure if we still want it on the wiki page.

yebai commented 4 years ago

Let's make a nice table and put it on the front page of turing.ml, with a link to the script to reproduce all the numbers.

xukai92 commented 4 years ago

Slightly improved the table. Any other changes to make?

yebai commented 4 years ago

cc @trappmartin and @cpfiffer, who might have ideas/suggestions regarding how to format and publish this benchmarking result on the front page.

yebai commented 4 years ago

Here is an example, Julia's benchmarks page: https://julialang.org/benchmarks/

xukai92 commented 4 years ago

Thanks for the pointer. I can make the visualization. I will also improve the table a bit more; I've got an idea.

xukai92 commented 4 years ago

It's a bit hard to make the Markdown table look nice, since whitespace is ignored. Plain text actually looks nice:

Model                            Turing             Stan
10,000D Gaussian         9.766 ±  0.222  11.609 ±  0.306
Gaussian Unknown         2.211 ±  0.061   0.342 ±  0.015
Hierarchical Poisson     0.325 ±  0.013   0.134 ±  0.068
LDA                    378.762 ±  7.910  43.888 ±  0.504
Logistic Regression      3.942 ±  1.331   56.15 ±  2.274
Naive Bayes              6.848 ±  0.144  13.677 ±  0.142
Semi-Supervised HMM    463.213 ± 26.045   5.033 ±  0.058
Stochastic Volatility   75.026 ± 30.579   0.918 ±  0.014
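
If we ever want to generate this table programmatically instead of aligning it by hand, something like PrettyTables.jl could do it (a sketch; the arrays below just mirror a few rows from the results above):

```julia
using PrettyTables

# A few rows from the results above, only to illustrate the layout.
models = ["10,000D Gaussian", "Gaussian Unknown", "Semi-Supervised HMM"]
turing = ["9.766 ± 0.222", "2.211 ± 0.061", "463.213 ± 26.045"]
stan   = ["11.609 ± 0.306", "0.342 ± 0.015", "5.033 ± 0.058"]

# The default text backend keeps the columns aligned; passing
# `tf = tf_markdown` would emit a Markdown table instead.
pretty_table([models turing stan]; header = ["Model", "Turing", "Stan"])
```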


cpfiffer commented 4 years ago

Turing should probably be the first column, and we should order the rows by how well Turing performs relative to Stan.
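
One way to get that ordering (a sketch reusing the arrays from the snippet above, plus hypothetical vectors of mean times):

```julia
# Hypothetical mean times matching the rows above.
turing_mean = [9.766, 2.211, 463.213]
stan_mean   = [11.609, 0.342, 5.033]

# Sort rows by Turing's time relative to Stan's, best ratio first,
# and print with Turing as the first result column.
order = sortperm(turing_mean ./ stan_mean)
pretty_table([models[order] turing[order] stan[order]];
             header = ["Model", "Turing", "Stan"])
```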

xukai92 commented 4 years ago

Also made a plot:

[Figure: benchmark results per model, Turing vs. Stan]

The y-axis is in log scale.

devmotion commented 4 years ago

Maybe you could add error bars showing the standard deviation to the plot?
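
For example, with Plots.jl (a sketch; only a few rows are shown and the data layout is assumed):

```julia
using Plots

# A few rows of the results above: mean and std of the reported run times.
models      = ["Gaussian Unknown", "Hierarchical Poisson", "LDA"]
turing_mean = [2.211, 0.325, 378.762]; turing_std = [0.061, 0.013, 7.910]
stan_mean   = [0.342, 0.134, 43.888];  stan_std   = [0.015, 0.068, 0.504]

# Log-scale y-axis as in the plot above, with standard-deviation error bars.
scatter(models, turing_mean; yerror = turing_std, label = "Turing",
        yscale = :log10, ylabel = "run time (log scale)", markersize = 6)
scatter!(models, stan_mean; yerror = stan_std, label = "Stan", markersize = 6)
```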

trappmartin commented 4 years ago

Would it be possible to add some benchmarks that evaluate how Stan and Turing perform as the number of observations increases? Basically a line plot with the number of observations on the x-axis.

xukai92 commented 4 years ago

> Maybe you could add error bars showing the standard deviation to the plot?

Sure

> Would it be possible to add some benchmarks that evaluate how Stan and Turing perform as the number of observations increases? Basically a line plot with the number of observations on the x-axis.

Sure. But let's first improve the models on which we are slow; otherwise it's hard to benchmark them (the inference time is too long).
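
A scaling benchmark along those lines could be as simple as the sketch below (the `coinflip` model is made up just to illustrate the loop; the real version would iterate over the benchmark models and also time the Stan implementations):

```julia
using BenchmarkTools, Statistics, Turing

# Toy model, only to illustrate the scaling loop.
@model function coinflip(y)
    p ~ Beta(1, 1)
    y .~ Bernoulli(p)
end

ns = [10, 100, 1_000, 10_000]
seconds = map(ns) do n
    data = rand(Bool, n)
    b = @benchmark sample($(coinflip(data)), NUTS(), 1_000) samples = 3 evals = 1
    mean(b.times) / 1e9   # mean wall time for n observations
end
# Plotting `seconds` against `ns` gives the line plot over the number of observations.
```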

xukai92 commented 4 years ago

I've copied and pasted the current table and figure to the wiki.