athas / raytracers

Performance comparison of parallel ray tracing in functional programming languages

F# measurement methodology #16

Closed NinoFloris closed 4 years ago

NinoFloris commented 4 years ago

Could you give some info on how the test is run? F# is the only JIT-compiled implementation in the list, and the 10ms for BVH construction really seems too high. Because CoreCLR compiles code with a JIT, benchmarking it properly requires multiple runs in the same process so that the timings can stabilize.

Alternatively, I could PR a CoreRT build (native compilation).
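A minimal sketch of what "multiple runs in the same process" could look like, using only `System.Diagnostics.Stopwatch`. The names `timeInProcess` and `placeholderWork` are hypothetical and are not taken from the repository; the workload is only a stand-in for BVH construction or rendering, and the warm-up and run counts are arbitrary.

```fsharp
open System.Diagnostics

/// Runs `work` `warmup` times untimed, then `runs` times timed,
/// and returns the elapsed seconds of each timed run.
let timeInProcess (warmup: int) (runs: int) (work: unit -> 'a) : float list =
    // Untimed warm-up iterations give the JIT a chance to compile
    // (and, with tiered compilation, re-optimize) the hot paths.
    for _ in 1 .. warmup do
        work () |> ignore
    [ for _ in 1 .. runs do
        let sw = Stopwatch.StartNew()
        work () |> ignore
        sw.Stop()
        yield sw.Elapsed.TotalSeconds ]

// Hypothetical stand-in workload (an assumption for illustration);
// the real benchmark would call the scene construction or render here.
let placeholderWork () =
    let mutable acc = 0.0
    for i in 1 .. 1000000 do
        acc <- acc + sqrt (float i)
    acc

let timings = timeInProcess 5 10 placeholderWork
printfn "mean of %d timed runs: %.6fs" timings.Length (List.average timings)
```

Because every timed run happens in the same process as the warm-up runs, JIT compilation cost is mostly paid before the stopwatch starts.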

athas commented 4 years ago

I just run the commands in the Makefile a few times. I'd be perfectly happy to accept better timing.

Back when I used Mono, I got a profiling report that showed only something like 1% of the time was spent in the JIT, which made me stop worrying.

gsomix commented 4 years ago

@NinoFloris I think we can use the awesome BenchmarkDotNet library for better timing. However, it may increase benchmarking time and complicate the project (we would need to move the ray tracing part into a library).

@athas What do you think?
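For reference, a BenchmarkDotNet benchmark class for this could look roughly like the sketch below. It assumes the BenchmarkDotNet NuGet package is referenced; `renderOriginal` and `renderModified` are hypothetical placeholders, not functions from the repository, and stand in for two versions of the code being compared.

```fsharp
open BenchmarkDotNet.Attributes

// Hypothetical stand-ins for the two implementations under comparison.
// In a real setup these would call into the ray tracing library.
let renderOriginal () =
    Array.init 10000 (fun i -> sqrt (float i)) |> Array.sum

let renderModified () =
    let mutable acc = 0.0
    for i in 0 .. 9999 do
        acc <- acc + sqrt (float i)
    acc

// MemoryDiagnoser adds GC collection counts and allocated bytes to the
// report; Baseline = true makes the other methods report a ratio
// relative to this one.
[<MemoryDiagnoser>]
type RayTracerBenchmarks() =
    [<Benchmark(Baseline = true)>]
    member this.Original() = renderOriginal ()

    [<Benchmark>]
    member this.Modified() = renderModified ()
```

Such a class would typically be launched from the program's entry point with `BenchmarkDotNet.Running.BenchmarkRunner.Run<RayTracerBenchmarks>()` in a Release build. BenchmarkDotNet then takes care of warm-up iterations and of running each benchmark repeatedly before reporting statistics.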

NinoFloris commented 4 years ago

Testing Philip's version on my laptop gives the following numbers for running the rgbbox scene.

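| Method   | Mean     | Error    | StdDev   | Ratio | Gen 0    | Gen 1   | Gen 2 | Allocated  |
|----------|----------|----------|----------|-------|----------|---------|-------|------------|
| Original | 655.5 us | 12.48 us | 12.82 us | 1.00  | 408.2031 | 91.7969 | -     | 1333.64 KB |
| Modified | 265.9 us | 5.07 us  | 6.59 us  | 0.40  | 189.9414 | 33.6914 | -     | 558.06 KB  |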

Running it a few times and picking the best result will always be incorrect for a JIT; the code needs to be run multiple times in-process :)

I also compiled a native executable, which gave me similar but even faster times: 0.0023555s for BVH construction.

I can lean either way: BDN (BenchmarkDotNet) or a native executable.

Native looked like this for rgbbox

Using scene 'rgbbox' (-s to switch).
Scene BVH construction in 0.0023555s.
Rendering in 0.0587126s.

gsomix commented 4 years ago

@NinoFloris Hey, are you using the netcore3.1 version with Tiered JIT enabled?

NinoFloris commented 4 years ago

Yep, BDN will do the right thing, first results were from BDN. Native results are from the stopwatch harness @athas created initially.

NinoFloris commented 4 years ago

@gsomix see https://github.com/cartermp/trace

NinoFloris commented 4 years ago

@athas was there a reason you downgraded netcoreapp3.1 to netcoreapp2.1? Both are LTS, though tiered jitting is enabled by default in 3.1 so I expect you'd see better timings in 2.1 if you don't let it warm up :)

athas commented 4 years ago

I'm running this on NixOS, and the dotnet-sdk package in Nixpkgs is 2.2.4.

NinoFloris commented 4 years ago

3.1 should be available: https://github.com/NixOS/nixpkgs/issues/73193#issuecomment-579825819

athas commented 4 years ago

Oh, I see. It's just under a nondefault name. I'll try running with 3.1.

I'm inclined towards simplicity in general, but if the only way to measure F# properly is by using a big library, then I guess that's how it has to be. I do want to point out the following sentence in the README: "The benchmarking technique is mostly crude, so assume only large relative differences are meaningful." That's partly because I didn't want this project to be about fiddling with timing. The runtimes are long enough that we can get good enough results just by running a few times and taking an average (doing this in-process is fine; some of the implementations already do that).

athas commented 4 years ago

Running it with 3.1 is significantly slower (about a factor of two). Should I make it run the functions in a loop instead?

gsomix commented 4 years ago

@athas Yes, because of Tiered JIT.

NinoFloris commented 4 years ago

@athas I'll create a PR with instructions to build JIT-compiled and native artifacts on top of your loop changes once you have them.

> few times and taking an average

Take the last instead ;)
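The difference between the two summaries can be sketched as follows. This is only an illustration under assumptions: `timeEachRun`, `summarise`, and `placeholderWork` are hypothetical names, and the workload is a stand-in for the render function. With tiered JIT, the earliest runs tend to execute less optimized code, so they pull the mean up, while the last run is more likely to reflect the steady state.

```fsharp
open System.Diagnostics

/// Times `runs` back-to-back calls of `work` in one process and returns
/// every per-run duration in seconds, oldest first.
let timeEachRun (runs: int) (work: unit -> 'a) : float[] =
    Array.init runs (fun _ ->
        let sw = Stopwatch.StartNew()
        work () |> ignore
        sw.Stop()
        sw.Elapsed.TotalSeconds)

/// Summarises per-run timings as (mean of all runs, last run).
let summarise (timings: float[]) : float * float =
    Array.average timings, Array.last timings

// Hypothetical stand-in workload (an assumption for illustration).
let placeholderWork () =
    let mutable acc = 0.0
    for i in 1 .. 1000000 do
        acc <- acc + sqrt (float i)
    acc

let mean, last = summarise (timeEachRun 10 placeholderWork)
printfn "mean: %.6fs, last: %.6fs" mean last
```

Reporting both numbers makes it easy to see whether warm-up cost is still distorting the average.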

athas commented 4 years ago

I pushed changes that run the functions multiple times. I have not updated the README table, but this is the result:

dotnet run -c release -- -f rgbbox_1000.ppm -s rgbbox -n 1000 -m 1000
Using scene 'rgbbox' (-s to switch).
Timing over average of 10 runs (-r to change).
Scene BVH construction in 0.001361s.
Rendering in 0.879994s.
Writing image to rgbbox_1000.ppm.
dotnet run -c release -- -f irreg_1000.ppm -s irreg -n 1000 -m 1000
Using scene 'irreg' (-s to switch).
Timing over average of 10 runs (-r to change).
Scene BVH construction in 0.007581s.
Rendering in 0.484474s.
Writing image to irreg_1000.ppm.

ForNeVeR commented 4 years ago

FYI, since we have Scala now, too: it will have to use the same complex testing methodology, since its main language implementation is also JIT-compiled. One possible explanation for the current Scala results is that its JIT has many more tiers than the .NET one.

athas commented 4 years ago

The F# timing should be good now and #21 fixed it for Scala.

NinoFloris commented 4 years ago

Yesterday there were some perf issues with the blob feed hosting the native compilers, which is why I didn't create the PR. https://github.com/dotnet/sdk/issues/11283

Still interested?

athas commented 4 years ago

Sure.