A small but important update: the `mandel` function was modified to compare the complex value's squared norm with 4, rather than the value's norm with 2. This seems to be a fair optimization, since the other implementations are doing the same thing. The performance of this particular task is now on par with the Julia implementation.
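For illustration, here is a minimal, self-contained sketch of the change being described; the actual benchmark code differs (it uses `num::Complex`, among other things), so treat the names and bounds here as assumptions:

```rust
// Hypothetical sketch of a mandel inner loop; not the PR's exact code.
// The escape test compares the squared norm with 4.0 instead of the norm
// with 2.0, which is mathematically equivalent but avoids a square root
// on every iteration.
fn mandel(cr: f64, ci: f64) -> u32 {
    let (mut zr, mut zi) = (0.0_f64, 0.0_f64);
    for i in 0..80 {
        // squared norm zr*zr + zi*zi compared against 4.0 == 2.0 * 2.0
        if zr * zr + zi * zi > 4.0 {
            return i;
        }
        let tmp = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = tmp;
    }
    80
}

fn main() {
    println!("{}", mandel(-0.5, 0.75));
}
```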
The necessary preparation steps for the benchmark machine:
- Install Rustup
- Add the toolchain mentioned in the `rust-toolchain` file: `rustup toolchain add nightly-2018-04-16`
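(For reference, a `rust-toolchain` file is just a plain-text file naming the pinned toolchain; assuming it matches the command above, it would contain the single line `nightly-2018-04-16`.)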
How about `rustup toolchain add stable` instead? That would be more consistent with the Julia versioning. I've been running the benchmarks and posting the data only when Julia updates its stable version: 0.6.0, 0.6.1, 0.6.2, etc.
> How about `rustup toolchain add stable` instead?
As previously mentioned here, the project relies on the `test` crate for the `black_box` function, which is currently only available in a nightly Rust compiler. Stabilization of this function is unlikely to happen soon; it is an ongoing concern without a trivial solution.
The only way to make this work with a stable toolchain is to replace `black_box` with something that also prevents undesirable optimizations (namely the invariant code motion seen in `fib` and a few others). From the experiments I made some months ago, this seemed to be possible with volatile reads and writes. I can look into this again soon.
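A minimal sketch of the kind of volatile-based substitute suggested here, assuming a volatile read is enough to keep the optimizer from discarding the value; this is an experiment sketch, not code from the PR:

```rust
// Hypothetical stand-in for test::black_box on a stable toolchain.
// A volatile read tells the optimizer it cannot elide this access,
// which is usually enough to keep the benchmarked computation alive.
fn black_box<T>(value: T) -> T {
    unsafe {
        let result = std::ptr::read_volatile(&value);
        // read_volatile made a bitwise copy; forget the original to avoid a double drop.
        std::mem::forget(value);
        result
    }
}

fn main() {
    // Usage: force the multiplication to actually be performed.
    let x = black_box(21u64) * 2;
    println!("{}", x);
}
```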
> I've been running the benchmarks and posting the data only when Julia updates its stable version: 0.6.0, 0.6.1, 0.6.2, etc.
Would the other microbenchmarks also be updated in the process? Rust has a regular 6-week release cycle, which means that pointing this project to the `stable` toolchain will also lead to iteratively updated versions of the compiler.
I think we should go ahead and merge this soon without needing to debate all the fine details of how we get and update the rust compiler, otherwise I fear this may languish and never get merged. Of course, we can open a separate issue about how the rust compiler should be acquired and run.
Sounds good to me. This and #2 look good for merging. I'll use the Rust nightly for now and aim to include Rust data in the julialang.org table and plot when julia-0.7.0 drops.
@johnfgibson, I've given you write access to this repo, so feel free to merge PRs as they seem ready. Let me know if there are any permission issues and we'll get them sorted out.
The Rust code runs fine and produces sensible results. I don't understand Rust well enough to vouch for the code's correctness & consistency with other languages. I see @stevengj reviewed to some extent above. I'll plan to merge in a day or two if I don't hear any objections.
Rust results, with julia-0.7.0-DEV.5096 (a 7-day-old master):
| cputime | lang |
|---|---|
| 1.000 | C |
| 1.080 | LuaJIT |
| 1.089 | Rust |
| 1.186 | Julia |
| 1.450 | Go |
| 1.592 | Fortran |
| 2.929 | Java |
| 4.133 | JavaScript |
| 9.964 | Matlab |
| 14.072 | Mathematica |
| 16.794 | Python |
| 69.365 | R |
| 515.363 | Octave |
I haven't run the benchmarks with every language available, but they seem to be roughly in line with my local run:
| perf | lang |
|---|---|
| 1.000 | C |
| 1.060 | Rust |
| 1.180 | Julia |
Feel free to ask about any other details of the code. I noticed that the readme also contains a list of languages, so maybe I could update that as well? The PR may also need to be updated depending on the decisions made in #10.
@johnfgibson, just to be sure: will the issue with the Mathematica label, seen in your comment above, be present in the final rendered image?
@waldyrious: No, that Mathematica label problem arose in an Inkscape SVG -> PNG conversion needed to post the image to this thread. The SVG that'll appear on julialang.org is just fine.
Is there a reason that you use vectors instead of arrays on the stack? All the `Vec`s in `directblas::randmatstat` could be turned into fixed-size arrays.
@Vurich, if I'm understanding correctly (it's quite possible that I'm not, since I'm not a Rust programmer), that would be comparable to using StaticArrays in Julia. That's a legitimate optimization, but in Julia it goes beyond the basics. In Rust, how advanced of a change would that be?
@StefanKarpinski I would actually consider using `Vec` to be an antipattern when you know the length of an array is constant. Using fixed-size arrays expresses intent much better. I think it would just be a case of turning `let n = 5` into `const N: usize = 5` and `let foo = vec![val; something]` into `let foo = [val; something]`.
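A rough illustration of the suggested change; the names and values below are made up rather than taken from the benchmark code:

```rust
fn main() {
    // Heap-allocated: the length is a runtime value, invisible to the type system.
    let n = 5;
    let dynamic = vec![0.0_f64; n * n];

    // Stack-allocated: the length is part of the type itself.
    const N: usize = 5;
    let fixed = [0.0_f64; N * N];

    // Both coerce to a &[f64] slice of the same length.
    assert_eq!(dynamic.len(), fixed.len());
}
```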
I went for dynamic allocations because the C version is doing the same thing. At first glance, the Julia benchmark isn't using StaticArrays either, right? I agree that the avoidance of heap allocations is a plus, but I would have to ensure that this is a fair optimization.
In practice, changing this should be trivial: fixed-size arrays deref to a slice just the same, and ndarray views can be created from arbitrary slices. However, since Rust doesn't have const generics yet, I would have to either hardcode the outputs of `nrand` to the intended dimensions or just construct the arrays inline.
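A hedged sketch of the point about views, assuming the `ndarray` crate referred to above; the shape and values are illustrative:

```rust
use ndarray::ArrayView2;

fn main() {
    const N: usize = 5;
    // A fixed-size array on the stack...
    let data = [1.0_f64; N * N];
    // ...derefs to a slice, which ndarray can view as an N x N matrix
    // without any heap allocation.
    let view = ArrayView2::from_shape((N, N), &data).expect("shape mismatch");
    println!("diagonal sum = {}", view.diag().sum());
}
```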
Hmm, I'd assumed the main point of `randmatstat` was to test the speed of `tr((P'*P)^4)` on large matrices. But it's n x n for n=5. Fixed-size arrays seem natural for that. Even so, the microbenchmarks aim to test identical algorithms in many languages. I'm inclined to think doing `randmatstat` with static arrays where possible (C, Fortran, Julia, Rust, ...) and heap arrays where not (Python, Matlab, Java, ...?) would not be in the spirit of identical algorithms.
Unless the benchmark code is the algorithm and the array implementation is just infrastructure...
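For readers unfamiliar with the kernel under discussion, a rough sketch of the `tr((P'*P)^4)` computation using `ndarray`; a deterministic matrix stands in for the random `P`, and the real `randmatstat` also accumulates statistics over many trials:

```rust
use ndarray::Array2;

fn main() {
    const N: usize = 5;
    // Illustrative stand-in for the random N x N matrix P.
    let p = Array2::from_shape_fn((N, N), |(i, j)| (i + 2 * j) as f64 / 10.0);
    // Q = P' * P
    let q = p.t().dot(&p);
    // Q^4 by repeated squaring.
    let q2 = q.dot(&q);
    let q4 = q2.dot(&q2);
    // Trace of Q^4.
    let tr: f64 = q4.diag().sum();
    println!("tr((P'*P)^4) = {}", tr);
}
```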
The point of the way these were written is that in library code you don't generally know that you're allocating a specific array size: you have to make your library code generic across different sizes and do dynamic allocations. Sure, in this specific case you can do static allocations, but you can also just replace the entire benchmark computation with a constant, so drawing some lines about what is or is not "allowed" is in the nature of benchmarking. The line is currently drawn at "everyone does dynamic memory allocation of non-fixed-size arrays". To a significant extent, this is to give the Pythons and Rs in the comparison an easier time, since that's all they can do and they're already getting clobbered; it seems unsporting not to at least give them a fighting chance.
:tada:
Just to let you know that I experimented with turning heap allocations into static arrays in `randmatstat` (branch). This resulted in a minor speed-up factor of around 1.02 on my desktop (in raw values, from 6.10 to 5.97). Still, I agree that keeping the heap allocations is a fair decision to make.
Good to hear—I kind of figured that would be the case. Dynamic allocation with a decently implemented allocator shouldn't be a bottleneck here. If we switched Julia to use StaticArrays, on the other hand, we might see a significant speedup since the package provides specialized fully unrolled matmul implementations for fixed matrix sizes. The speedup would be from code specialization, however, rather than avoiding dynamic memory allocation.
> If we switched Julia to use StaticArrays, on the other hand, we might see a significant speedup since the package provides specialized fully unrolled matmul implementations for fixed matrix sizes. The speedup would be from code specialization, however, rather than avoiding dynamic memory allocation.
In a way, that is what the Rust community intends to achieve with the const generics proposal. :) The folks working on embedded systems are particularly concerned about this, since they usually cannot rely on heap allocations.
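As an aside, a small sketch of the kind of code the const generics proposal enables, written with the syntax that was eventually stabilized well after this discussion:

```rust
// A function generic over the array length N: no heap allocation,
// and the size is checked at compile time.
fn trace<const N: usize>(m: &[[f64; N]; N]) -> f64 {
    (0..N).map(|i| m[i][i]).sum()
}

fn main() {
    let m = [[1.0, 2.0], [3.0, 4.0]];
    println!("{}", trace(&m)); // prints 5
}
```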
This PR adds a Rust implementation of the benchmark. This contribution was originally in JuliaLang/julia#22433, but the merge was postponed until the microbenchmarks were migrated here. It should be ready to go now. :+1:
The necessary preparation steps for the benchmark machine: `rustup toolchain add nightly-2018-04-16`

Then: `make benchmark/rust.csv`