A small but important update: the `mandel` function was modified to compare the complex value's squared norm with 4, rather than the value's norm with 2. This seems to be a fair optimization, since the other implementations are doing the same thing. The performance of this particular task is now on par with the Julia implementation.
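For illustration, here is a minimal, self-contained sketch of the change being described; the actual benchmark code differs (it uses `num::Complex`, among other things), so treat the names and bounds here as assumptions:

```rust
// Hypothetical sketch of a mandel inner loop; not the PR's exact code.
// The escape test compares the squared norm with 4.0 instead of the norm
// with 2.0, which is mathematically equivalent but avoids a square root
// on every iteration.
fn mandel(cr: f64, ci: f64) -> u32 {
    let (mut zr, mut zi) = (0.0_f64, 0.0_f64);
    for i in 0..80 {
        // squared norm zr*zr + zi*zi compared against 4.0 == 2.0 * 2.0
        if zr * zr + zi * zi > 4.0 {
            return i;
        }
        let tmp = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = tmp;
    }
    80
}

fn main() {
    println!("{}", mandel(-0.5, 0.75));
}
```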
The necessary preparation steps for the benchmark machine:
- Install Rustup
- Add the toolchain mentioned in the `rust-toolchain` file: `rustup toolchain add nightly-2018-04-16`
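(For reference, a `rust-toolchain` file is just a plain-text file naming the pinned toolchain; assuming it matches the command above, it would contain the single line `nightly-2018-04-16`.)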
How about `rustup toolchain add stable` instead? That would be more consistent with the Julia versioning. I've been running the benchmarks and posting the data only when Julia updates its stable version: 0.6.0, 0.6.1, 0.6.2, etc.
> How about `rustup toolchain add stable` instead?
As previously mentioned here, the project relies on the `test` crate for the `black_box` function, which is currently only available in a nightly Rust compiler. Stabilization of this function is unlikely to happen soon; it is an ongoing concern without a trivial solution.
The only way to make this work with a stable toolchain is to replace `black_box` with something that also prevents undesirable optimizations (namely the invariant code motion seen in `fib` and a few others). From the experiments I made some months ago, this seemed to be possible with volatile reads and writes. I can look into this again soon.
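A minimal sketch of the kind of volatile-based substitute suggested here, assuming a volatile read is enough to keep the optimizer from discarding the value; this is an experiment sketch, not code from the PR:

```rust
// Hypothetical stand-in for test::black_box on a stable toolchain.
// A volatile read tells the optimizer it cannot elide this access,
// which is usually enough to keep the benchmarked computation alive.
fn black_box<T>(value: T) -> T {
    unsafe {
        let result = std::ptr::read_volatile(&value);
        // read_volatile made a bitwise copy; forget the original to avoid a double drop.
        std::mem::forget(value);
        result
    }
}

fn main() {
    // Usage: force the multiplication to actually be performed.
    let x = black_box(21u64) * 2;
    println!("{}", x);
}
```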
> I've been running the benchmarks and posting the data only when Julia updates its stable version: 0.6.0, 0.6.1, 0.6.2, etc.
Would the other microbenchmarks also be updated in the process? Rust has a regular 6-week release cycle, which means that pointing this project to the `stable` toolchain will also lead to iteratively updated versions of the compiler.
I think we should go ahead and merge this soon without needing to debate all the fine details of how we get and update the rust compiler, otherwise I fear this may languish and never get merged. Of course, we can open a separate issue about how the rust compiler should be acquired and run.
Sounds good to me. This and #2 look good for merging. I'll use the Rust nightly for now and aim to include Rust data in the julialang.org table and plot when julia-0.7.0 drops.
@johnfgibson, I've given you write access to this repo, so feel free to merge PRs as they seem ready. Let me know if there are any permission issues and we'll get them sorted out.
The Rust code runs fine and produces sensible results. I don't understand Rust well enough to vouch for the code's correctness & consistency with other languages. I see @stevengj reviewed to some extent above. I'll plan to merge in a day or two if I don't hear any objections.
Rust results, with julia-0.7.0-DEV.5096 (a 7-day-old master):
| cputime | lang |
|---|---|
| 1.000 | C |
| 1.080 | LuaJIT |
| 1.089 | Rust |
| 1.186 | Julia |
| 1.450 | Go |
| 1.592 | Fortran |
| 2.929 | Java |
| 4.133 | JavaScript |
| 9.964 | Matlab |
| 14.072 | Mathematica |
| 16.794 | Python |
| 69.365 | R |
| 515.363 | Octave |
I haven't run the benchmarks with every language available, but they seem to be roughly in line with my local run:
| perf | lang |
|---|---|
| 1.000 | C |
| 1.060 | Rust |
| 1.180 | Julia |
Feel free to ask about any other details of the code. I noticed that the readme also contains a list of languages, so maybe I could update that as well? The PR may also need to be updated depending on the decisions made in #10.
@johnfgibson, just to be sure: will the issue with the Mathematica label, seen in your comment above, be present in the final rendered image?
@waldyrious: No, that Mathematica label problem arose in an Inkscape SVG -> PNG conversion needed to post the image to this thread. The SVG that'll appear on julialang.org is just fine.
Is there a reason that you use vectors instead of arrays on the stack? All the `Vec`s in `directblas::randmatstat` could be turned into fixed-size arrays.
@Vurich, if I'm understanding correctly (it's quite possible that I'm not, since I'm not a Rust programmer), that would be comparable to using StaticArrays in Julia. That's a legitimate optimization, but in Julia it goes beyond the basics. In Rust, how advanced of a change would that be?
@StefanKarpinski I would actually consider using `Vec` to be an antipattern when you know the length of an array is constant. Using fixed-size arrays expresses intent much better. I think it would just be a case of turning `let n = 5` into `const N: usize = 5` and `let foo = vec![val; something]` into `let foo = [val; something]`.
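A rough illustration of the suggested change; the names and values below are made up rather than taken from the benchmark code:

```rust
fn main() {
    // Heap-allocated: the length is a runtime value, invisible to the type system.
    let n = 5;
    let dynamic = vec![0.0_f64; n * n];

    // Stack-allocated: the length is part of the type itself.
    const N: usize = 5;
    let fixed = [0.0_f64; N * N];

    // Both coerce to a &[f64] slice of the same length.
    assert_eq!(dynamic.len(), fixed.len());
}
```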
I went for dynamic allocations because the C version is doing the same thing. At first glance, the Julia benchmark isn't using StaticArrays either, right? I agree that the avoidance of heap allocations is a plus, but I would have to ensure that this is a fair optimization.
In practice, changing this should be trivial: fixed-size arrays deref to a slice just the same, and ndarray views can be created from arbitrary slices. However, since Rust doesn't have const generics yet, I would have to either hardcode the outputs of `nrand` to the intended dimensions or just construct the arrays inline.
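A hedged sketch of the point about views, assuming the `ndarray` crate referred to above; the shape and values are illustrative:

```rust
use ndarray::ArrayView2;

fn main() {
    const N: usize = 5;
    // A fixed-size array on the stack...
    let data = [1.0_f64; N * N];
    // ...derefs to a slice, which ndarray can view as an N x N matrix
    // without any heap allocation.
    let view = ArrayView2::from_shape((N, N), &data).expect("shape mismatch");
    println!("diagonal sum = {}", view.diag().sum());
}
```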
Hmm, I'd assumed the main point of `randmatstat` was to test the speed of `tr((P'*P)^4)` on large matrices. But it's n x n for n=5. Fixed-size arrays seem natural for that. Even so, the microbenchmarks aim to test identical algorithms in many languages. I'm inclined to think doing `randmatstat` with static arrays where possible (C, Fortran, Julia, Rust, ...) and heap arrays where not (Python, Matlab, Java, ...?) would not be in the spirit of identical algorithms.
Unless the benchmark code is the algorithm and the array implementation is just infrastructure...
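For readers unfamiliar with the kernel under discussion, a rough sketch of the `tr((P'*P)^4)` computation using `ndarray`; a deterministic matrix stands in for the random `P`, and the real `randmatstat` also accumulates statistics over many trials:

```rust
use ndarray::Array2;

fn main() {
    const N: usize = 5;
    // Illustrative stand-in for the random N x N matrix P.
    let p = Array2::from_shape_fn((N, N), |(i, j)| (i + 2 * j) as f64 / 10.0);
    // Q = P' * P
    let q = p.t().dot(&p);
    // Q^4 by repeated squaring.
    let q2 = q.dot(&q);
    let q4 = q2.dot(&q2);
    // Trace of Q^4.
    let tr: f64 = q4.diag().sum();
    println!("tr((P'*P)^4) = {}", tr);
}
```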
The point of the way these were written is that in library code you don't generally know that you're allocating a specific array size: you have to make your library code generic across different sizes and do dynamic allocations. Sure, in this specific case you can do static allocations, but you can also just replace the entire benchmark computation with a constant, so drawing some lines about what is or is not "allowed" is in the nature of benchmarking. The line is currently drawn at "everyone does dynamic memory allocation of non-fixed-size arrays". To a significant extent, this is to give the Pythons and Rs in the comparison an easier time, since that's all they can do and they're already getting clobbered; it seems unsporting not to at least give them a fighting chance.
:tada:
Just to let you know that I experimented with turning heap allocations into static arrays in `randmatstat` (branch). This resulted in a minor speed-up factor of around 1.02 on my desktop (in raw values, from 6.10 to 5.97). Still, I agree that keeping the heap allocations is a fair decision to make.
Good to hear—I kind of figured that would be the case. Dynamic allocation with a decently implemented allocator shouldn't be a bottleneck here. If we switched Julia to use StaticArrays, on the other hand, we might see a significant speedup since the package provides specialized fully unrolled matmul implementations for fixed matrix sizes. The speedup would be from code specialization, however, rather than avoiding dynamic memory allocation.
> If we switched Julia to use StaticArrays, on the other hand, we might see a significant speedup since the package provides specialized fully unrolled matmul implementations for fixed matrix sizes. The speedup would be from code specialization, however, rather than avoiding dynamic memory allocation.
In a way, that is what the Rust community intends to achieve with the const generics proposal. :) The folks working on embedded systems are particularly concerned about this, since they usually cannot rely on heap allocations.
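As an aside, a small sketch of the kind of code the const generics proposal enables, written with the syntax that was eventually stabilized well after this discussion:

```rust
// A function generic over the array length N: no heap allocation,
// and the size is checked at compile time.
fn trace<const N: usize>(m: &[[f64; N]; N]) -> f64 {
    (0..N).map(|i| m[i][i]).sum()
}

fn main() {
    let m = [[1.0, 2.0], [3.0, 4.0]];
    println!("{}", trace(&m)); // prints 5
}
```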
This PR adds a Rust implementation of the benchmark. This contribution was originally in JuliaLang/julia#22433, but the merge was postponed until the microbenchmarks were migrated here. It should be ready to go now. :+1:
The necessary preparation steps for the benchmark machine: `rustup toolchain add nightly-2018-04-16`

Then: `make benchmark/rust.csv`