johnnychen94 opened this issue 2 years ago
We just need someone willing and able to get all the programs running on a single benchmark system.
Could this instead be set up using GitHub actions for continuous benchmarking?
See https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ for a discussion. It might be a good option, as performance on a GitHub Actions VM is pretty consistent.
If you run all benchmarks with a single action, then you would guarantee the same VM each time, and could measure the performance ratios.
Plus, you could also see the performance comparison across a variety of versions of each language.
(This would exclude proprietary languages such as mathematica)
A bunch of the benchmarked programs are proprietary, such as MATLAB and Mathematica.
It's only those two right? I think it's very reasonable to exclude proprietary software in a reproducible benchmark. e.g., https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html excludes any proprietary languages. Yes you would lose a couple of data points, but you would have an always up-to-date benchmark, which I see as significantly more important.
In my opinion, the most important comparisons are against C, Rust, and Fortran (+ maybe numpy), since users of those packages are the ones who would look up speed comparisons - not so much Mathematica users. As long as those are included, we are good.
As an alternative option, it seems there are some free versions of MATLAB and Mathematica which are available for GitHub Actions:
the most important comparisons are against C, Rust, and Fortran
I disagree. First, it's going to be boring: any language that doesn't get in your way will be pretty fast. Second, the point of Julia is to be good at two things: easier to use than C and faster to run than Python. That's a point the benchmarks have to make, which between their source code and raw numbers, they do.
easier to use than C and faster to run than Python.
I think that's what benchmarks against fast languages show, no?
Anyways, this is a second order effect. The more important point is updating out-of-date benchmarks. Basically I am saying that I don't think mathematica/matlab should be roadblocks to getting updated results against C/Rust/Fortran/Python.
See #51 which drafts a GitHub workflow for running the suite
My feeling is that @MilesCranmer is right: having up-to-date benchmarks against just the open languages is better than having them held up, going back multiple Julia versions, in order to include a couple of proprietary languages.
I really enjoyed doing the benchmarks back for Julia 1.0, but I haven't been able to keep it up, due to the investment of time to update each language environment (many with their own peculiar set-up and build system), and also COVID, which has kept me working at home without access to campus-locked proprietary licenses. I was hoping to return this past fall, but Delta and Omicron have kept me from that.
So I'm supportive of your effort to do this via GitHub workflow.
Thanks @johnfgibson.
So, with the loss of my sanity, I finally got the workflow running in #51 - it correctly generates the various CSV files. This was in spite of most languages being easy to set up, since there are already GitHub Actions available which stack up on the same VM. I therefore greatly empathize with @johnfgibson, knowing that you had to set these up manually each time...
The workflow runs for the following languages:
The following benchmarks are not part of the current workflow, for the reasons given below:
I think these excluded languages are lower priority, so I would vote for simply displaying the up-to-date benchmarks with the other languages. Then if/when the broken ones are fixed we can turn them back on. Thoughts?
Here are the actual updated benchmarks, copied from the workflow's output. Could these automatically update the website after #51 is merged?
Seems like parse_integers got a massive improvement compared to the currently displayed results, which is awesome. matrix_multiply also seems to have put Julia clearly in the lead now:
c,iteration_pi_sum,8.028984
c,matrix_multiply,43.012142
c,matrix_statistics,5.007982
c,parse_integers,0.19634
c,print_to_file,20.508051
c,recursion_fibonacci,0.025188
c,recursion_quicksort,0.422955
c,userfunc_mandelbrot,0.08167
fortran,iteration_pi_sum,8.028663
fortran,matrix_multiply,57.163952
fortran,matrix_statistics,8.046266
fortran,parse_integers,0.753935
fortran,print_to_file,113.916496
fortran,recursion_fibonacci,4.4e-5
fortran,recursion_quicksort,0.483927
fortran,userfunc_mandelbrot,7.8e-5
java,iteration_pi_sum,16.370829
java,iteration_sinc_sum,0.049201
java,matrix_multiply,788.083768
java,matrix_statistics,30.276736
java,parse_integers,0.274402
java,print_to_file,99.797282
java,recursion_fibonacci,0.0424
java,recursion_quicksort,1.006608
java,userfunc_mandelbrot,0.136501
javascript,iteration_pi_sum,10.5
javascript,matrix_multiply,2900.0
javascript,matrix_statistics,46.9
javascript,parse_integers,0.64
javascript,print_to_file,118.0
javascript,recursion_fibonacci,0.109
javascript,recursion_quicksort,1.61
javascript,userfunc_mandelbrot,0.149
julia,iteration_pi_sum,8.028063
julia,matrix_multiply,33.387676
julia,matrix_statistics,8.219065
julia,parse_integers,0.137201
julia,print_to_file,18.368588
julia,recursion_fibonacci,0.0482
julia,recursion_quicksort,0.469904
julia,userfunc_mandelbrot,0.0796
python,iteration_pi_sum,630.6591033935547
python,matrix_multiply,49.559593200683594
python,matrix_statistics,51.499128341674805
python,parse_integers,1.6732215881347656
python,print_to_file,54.22806739807129
python,recursion_fibonacci,2.522706985473633
python,recursion_quicksort,11.09170913696289
python,userfunc_mandelbrot,6.908893585205078
r,iteration_pi_sum,236.0
r,matrix_multiply,116.0
r,matrix_statistics,78.0
r,parse_integers,4.0
r,print_to_file,1325.0
r,recursion_fibonacci,10.0
r,recursion_quicksort,22.0
r,userfunc_mandelbrot,20.0
rust,iteration_pi_sum,8.029562
rust,matrix_multiply,46.196557
rust,matrix_statistics,6.52925
rust,parse_integers,0.186271
rust,print_to_file,11.194186
rust,recursion_fibonacci,0.046293
rust,recursion_quicksort,0.428904
rust,userfunc_mandelbrot,0.080522
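For anyone who wants to check claims like the matrix_multiply one against these raw numbers, here is a minimal Julia sketch (assuming the data above is saved as benchmarks.csv with no header row) that normalizes every timing by the corresponding C time:

using CSV, DataFrames

# Read the raw timings pasted above (assumed saved as "benchmarks.csv", no header row).
df = CSV.read("benchmarks.csv", DataFrame; header = ["language", "benchmark", "time"])

# Join each row against the C baseline for the same benchmark and compute the ratio.
c = select(df[df.language .== "c", :], :benchmark, :time => :ctime)
df = innerjoin(df, c, on = :benchmark)
df.ratio = df.time ./ df.ctime

# Example: every language's matrix_multiply time relative to C, fastest first.
show(sort(df[df.benchmark .== "matrix_multiply", [:language, :ratio]], :ratio))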
How do we update the benchmarks webpage?
@MilesCranmer, thanks for the amazing work on the CI!
Lua, Go (install fine, but the benchmarks are out-of-date with current syntax)
I have taken @sbinet's #27 and added a commit to enable the go benchmark in #55
I'll try to work on getting Lua working if I get some time later. Edit: got Lua working as well.
Also, do you know of a way to get the system hardware specifications from the CI machine? While it doesn't necessarily matter for the comparison itself, knowing the actual hardware might help interpret the numbers between CI runs.
Nice work!
I don't know how to get the hardware specs for a particular workflow. According to the docs, Linux runners always use a 2-core CPU, 7 GB of RAM, and 14 GB of SSD disk space (in a virtual machine), but they don't specify whether the CPU model changes.
According to the article here - https://labs.quansight.org/blog/2021/08/github-actions-benchmarks/ - the times are noisy, so timings should only be interpreted relative to C rather than as absolute times.
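One low-effort option (not something the current workflow does, just a sketch) would be to have the job print the runner's hardware from Julia itself, so each CSV can be matched to the machine that produced it:

using InteractiveUtils  # provides versioninfo

versioninfo(verbose = true)  # Julia/OS/CPU summary for the runner
println("CPU model:   ", Sys.cpu_info()[1].model)
println("CPU threads: ", length(Sys.cpu_info()))
println("Total RAM:   ", round(Sys.total_memory() / 2^30; digits = 1), " GiB")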
I see, no worries. Thanks for digging that info up.
How do we update the benchmarks webpage?
The code for the benchmark webpage is located here: https://github.com/JuliaLang/www.julialang.org/blob/main/benchmarks.md
The code used to create the graph is located here, and other assets are located here: https://github.com/JuliaLang/www.julialang.org/tree/main/_assets/benchmarks
I went ahead and updated the plotting code to work with newer package versions and Julia v1.7.2; see https://github.com/JuliaLang/www.julialang.org/pull/1648
And using the benchmark data output from the CI plotted the following graph:
A couple of notes: the data used was from the following CI run (#57): https://github.com/JuliaLang/Microbenchmarks/runs/5531819551?check_suite_focus=true
For some Fortran benchmarks (see #58), the values were interpolated: the old Fortran/C ratio was multiplied by the newer C time in the CI benchmarks.csv file. The old ratio was computed from the data (located here) used to create the current plot on the benchmarks webpage.
Similarly, the Matlab/Mathematica values were interpolated based on their ratios in the same older benchmark data. I decided to exclude Octave, since including it would just expand the chart and make it a bit harder to read.
Since Go and Lua are not included in the CI at the moment, the Go values were interpolated based on the CI run in #55.
Edit: Here is the actual interpolated CSV file I used to plot with: interp_benchmarks.csv
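To make the interpolation described above concrete, here is a rough Julia sketch of the ratio-based fill-in (the file names old_benchmarks.csv and benchmarks.csv are placeholders for the old website data and the new CI output, and the language list is just illustrative):

using CSV, DataFrames

cols = ["language", "benchmark", "time"]
old = CSV.read("old_benchmarks.csv", DataFrame; header = cols)  # data behind the current website plot
ci  = CSV.read("benchmarks.csv", DataFrame; header = cols)      # fresh CI output

# Ratio of each old timing to the old C timing for the same benchmark.
old_c = select(old[old.language .== "c", :], :benchmark, :time => :old_ctime)
old = innerjoin(old, old_c, on = :benchmark)
old.ratio = old.time ./ old.old_ctime

# Interpolated timing = old language/C ratio * new C timing from the CI run.
ci_c = select(ci[ci.language .== "c", :], :benchmark, :time => :ci_ctime)
interp = innerjoin(select(old, :language, :benchmark, :ratio), ci_c, on = :benchmark)
interp.time = interp.ratio .* interp.ci_ctime

# Keep only the languages missing from the CI run and append them before plotting.
absent = ["matlab", "mathematica"]
interp = interp[in.(interp.language, Ref(absent)), [:language, :benchmark, :time]]
CSV.write("interp_benchmarks.csv", vcat(ci, interp); writeheader = false)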
Here is some code to plot the benchmarks with PlotlyJS instead of Gadfly. It allows for interactivity, such as automatic sorting of languages based on the selected benchmarks.
# Producing the Julia Microbenchmarks plot
using CSV
using DataFrames
using PlotlyJS
using StatsBase

benchmarks =
    CSV.read("interp_benchmarks.csv", DataFrame; header = ["language", "benchmark", "time"])

# Capitalize and decorate language names from datafile
dict = Dict(
    "c" => "C",
    "fortran" => "Fortran",
    "go" => "Go",
    "java" => "Java",
    "javascript" => "JavaScript",
    "julia" => "Julia",
    "lua" => "LuaJIT",
    "mathematica" => "Mathematica",
    "matlab" => "Matlab",
    "octave" => "Octave",
    "python" => "Python",
    "r" => "R",
    "rust" => "Rust",
);
benchmarks[!, :language] = [dict[lang] for lang in benchmarks[!, :language]]

# Normalize benchmark times by C times
ctime = benchmarks[benchmarks[!, :language] .== "C", :]
benchmarks = innerjoin(benchmarks, ctime, on = :benchmark, makeunique = true)
select!(benchmarks, Not(:language_1))
rename!(benchmarks, :time_1 => :ctime)
benchmarks[!, :normtime] = benchmarks[!, :time] ./ benchmarks[!, :ctime];

plot(
    benchmarks,
    x = :language,
    y = :normtime,
    color = :benchmark,
    mode = "markers",
    Layout(
        xaxis_type = "category",
        xaxis_categoryorder = "mean ascending",
        yaxis_type = "log",
        xaxis_title = "",
        yaxis_title = "",
    ),
)
Plotly doesn't support sorting by geometric mean; see https://plotly.com/julia/reference/layout/xaxis/#layout-xaxis-categoryarray and the feature request. This makes the ordering a bit rough for log scales, since the sorting is based on the arithmetic mean.
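One possible workaround, sketched below on top of the script above (StatsBase is already imported there), would be to compute the geometric-mean ordering ourselves and hand Plotly an explicit category array:

# Order languages by the geometric mean of their C-normalized times,
# then pass the explicit ordering instead of "mean ascending".
order = sort(combine(groupby(benchmarks, :language), :normtime => geomean => :gm), :gm)

plot(
    benchmarks,
    x = :language,
    y = :normtime,
    color = :benchmark,
    mode = "markers",
    Layout(
        xaxis_type = "category",
        xaxis_categoryorder = "array",
        xaxis_categoryarray = order.language,
        yaxis_type = "log",
        xaxis_title = "",
        yaxis_title = "",
    ),
)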
I've been thinking about it, and I believe the plotting code should probably reside in this repo instead of the Julia website codebase. Only the final benchmark SVG file should be pushed to the website repo.
As for the website benchmark page, though, it might be pretty cool to have an embedded Plotly instance for the benchmark graph, similar to what the Plotly docs do. This would allow users to see/sort languages based on whichever benchmark they are most interested in. Some extra, nonessential interactivity.
Just throwing some ideas.
From https://github.com/JuliaLang/Microbenchmarks/pull/62#issuecomment-1098006247
Would it be too crazy to just pull the performance timings right out of the Github Actions?
So basically, on every commit, take the benchmarks.csv output from the CI and commit it to the repo? That should be doable; however, I'm not completely sold on the idea of having an "update timings" commit for every other commit, to be honest. I think manually downloading the benchmarks.csv file from the latest commit, whenever we need to update the timings/graph/table, is probably the best method for now.
Maybe that is the easiest way to actually run the benchmarks.
For sure, Github Actions has been a boon.
The big issue would be that we can't get numbers for commercial software.
Yeah... How I'm currently handling this (to get the graph shown here) is to interpolate the timings for the languages we don't have data for, based on their ratios in the last known timing data. I'm not sure whether publishing that kind of interpolated data on the JuliaLang website is honest (even with appropriate disclaimers), but I do think our graph should contain data for those languages, since no other benchmarks include them. (I'm personally okay with this myself, though; interpolated data is better than no data.) There are options for CI as discussed here, and if it comes down to it, I am still a student and have licenses for these commercial languages, so I can try to run the tests myself on local hardware once I fix up tooling PRs such as this one.
We can just update the plot on the Julia website as well. It is really old.
While it is old, the information in the new graph is very similar to the previous graph. Rust and Julia both overtake Lua, but that's the only significant trend change besides overall improvements in individual benchmarks. Let's try to 1) use interpolated data or 2) get the commercial software working (in CI or locally).
I'm totally fine with making PRs to the JuliaLang website using option 1) as a stopgap until we get updated data via 2).
I think it's fine if we do a single manual update to the csv/svg on the website, before automating benchmark updates (which might take a while longer to set up).
Interpolate the timings for the languages we don't have data for, based on their ratios in the last known timing data.
For now, it's probably best to leave those languages out by simply not plotting their points. My subjective view is that showing updated but narrower benchmarks is (probably) more useful to users than showing out-of-date but broader benchmarks. Thoughts?
We could state: "Languages X, Y, and Z are not included in the latest benchmarks due to licensing issues, but you may view historical benchmarks comparing these languages to an older version of Julia by going to https://web.archive.org/web/20200807141540/https://julialang.org/benchmarks/"
What do you think?
The other approach would be to provide the out-of-date benchmarks for them. I think either would be acceptable.
My subjective view is that showing updated but narrower benchmarks is (probably) more useful to users than showing out-of-date but broader benchmarks.
I disagree with that. After doing the interpolation, the only change (trend-wise) is Rust/Julia vs. Lua; I don't think that is enough to justify dropping many languages. Remember, the interpolation also includes Fortran, not just the closed-source languages. As a user, I don't want to click another link to find the data I want.
The other approach would be to provide the out-of-date benchmarks for them. I think either would be acceptable.
Agreed, which is what we are currently doing.
Basically, what I'm trying to get at is that we should not update if we're only going to do it partially. If we update, we should do it properly.
Right now effort should be prioritized on #29, #58, & #64.
Wait, by interpolation, you just mean copying the data point from the old graph, right? I think if it's just that (keeping the performance ratios in the plot), it's perfectly fine so long as this is described in the text.
I was more thinking about excluding Mathematica/MATLAB from the new plot if their entire benchmark is out of date (though even then, I don't think it's a big deal to copy the old benchmarks). But for specific benchmarks where there is an issue (like the Fortran compilation issue), not updating and instead interpolating sounds pretty reasonable to me.
I was more thinking about excluding Mathematica/MATLAB from the new plot
I would still prefer not to do this, since comparisons with these languages are rarely seen elsewhere. It helps new Julia users coming from those closed ecosystems see the light ;) I'd be okay with interpolating these, as getting CI for them seems a bit out of our hands.
But for specific benchmarks where there is an issue (like the Fortran compilation issue), not updating and instead interpolating sounds pretty reasonable to me.
The reason I don't like this is that we are essentially taking a shortcut in displaying the data, especially since getting this to work is in our hands (unlike the Mathematica/Matlab issue). The onus is on us to fix our benchmarks, instead of sweeping the actual issue under the rug and using older results.
In any case for a decision like this I would like for @StefanKarpinski and @ViralBShah to provide the final say.
SGTM!
I suppose I agree: having the Mathematica/MATLAB results is really useful for users from those domains of scientific computing. I think as long as the exclusions/interpolations are all described in the text, we are fine.
I guess the question is: what is the purpose of these benchmarks? Is it a quantitative comparison table for attracting new users, or is it a scientific dataset of performance across languages? If it is the former, inclusion of these proprietary languages (even if the numbers are old) is really important to help demonstrate Julia's advantage against all other languages. If it is the latter, then having up-to-date and accurate numbers is most important, even if it means excluding some languages. In reality it's probably a combination, in which case this question is difficult to answer...
Pinging this as I just noticed the benchmarks page is still showing Julia 1.0.0. Can we put this up soon? I'm linking to the benchmarks in a paper (coming out in two days) as evidence of Julia being as fast as C++/Rust; it would be great if the measurements were up-to-date :rocket:
Someone needs to set up all these environments and run the benchmarks. I don't have any of the proprietary software licenses, for example.
Maybe we could just have a second panel of benchmarks on https://julialang.org/benchmarks/?
It has been over 3 years since the last full-scale benchmark, so I don't have high hopes that anybody will get around to doing it soon. But it would be great to display Julia 1.8 benchmarks for all to see, at least somewhere we can link to.
We do have large GitHub Actions runners available in this org, which will help whenever we set that sort of thing up.
@MilesCranmer Would it help if you had commit access to the MicroBenchmarks and this repo so that you can directly edit to your liking?
EDIT: Sent you invite.
Is there a way to see these results: https://github.com/JuliaLang/Microbenchmarks/actions/runs/5567263800/workflow#L103? Did I understand correctly that a CSV file would have been generated but has since been deleted (because the logs have expired)?
Yes, I believe the logs get deleted, but perhaps we can run it again.
Since we have made Julia 1.6 the new LTS version, it might make sense to update the benchmark results in https://julialang.org/benchmarks/.