NinoFloris closed this issue 4 years ago.
I just run the commands in the Makefile a few times. I'd be perfectly happy to accept a better timing method.
Back when I used Mono, I got a profiling report that showed only something like 1% of the time was spent in the JIT, which made me stop worrying.
@NinoFloris I think we could use the excellent BenchmarkDotNet library for more accurate timing. However, it may increase benchmarking time and complicate the project (we would need to move the raytracing part into a library).
@athas What do you think?
Testing Philip's version on my laptop gives the following numbers for running the rgbbox scene.
Method | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|
Original | 655.5 us | 12.48 us | 12.82 us | 1.00 | 408.2031 | 91.7969 | - | 1333.64 KB |
Modified | 265.9 us | 5.07 us | 6.59 us | 0.40 | 189.9414 | 33.6914 | - | 558.06 KB |
Running it a few times and picking the best result will always be misleading for a JIT; it needs to be run multiple times in-process :)
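To make the in-process point concrete, here is a minimal sketch of the pattern, written in Python rather than the project's F# (CPython itself has no JIT, so this only shows the structure). The helper name `time_in_process` and its `warmup`/`runs` parameters are hypothetical. The warm-up calls are executed but not timed, which in a JIT-compiled runtime such as CoreCLR gives the compiler a chance to optimize the hot code before measurement starts.

```python
import time
from typing import Callable, List


def time_in_process(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> List[float]:
    """Call fn `warmup` times untimed, then return the duration of each of `runs` timed calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Warm-up calls: results and timings are discarded. In a JIT runtime these
    # let the compiler optimize the hot code before we start measuring.
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical stand-in workload for "render the rgbbox scene".
    def work():
        return sum(i * i for i in range(100_000))

    ts = time_in_process(work, warmup=2, runs=5)
    print(f"mean {sum(ts) / len(ts):.6f}s, last {ts[-1]:.6f}s")
```

All runs happen in one process, so the timed calls see whatever optimized code the warm-up produced; restarting the program for each run would throw that away.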
I also compiled a native executable, which gave similar but even faster times: 0.0023555s.
I could go either way: BDN (BenchmarkDotNet) or a native exe.
The native build looked like this for rgbbox:
Using scene 'rgbbox' (-s to switch).
Scene BVH construction in 0.0023555s.
Rendering in 0.0587126s.
@NinoFloris Hey, are you using the netcoreapp3.1 version with tiered JIT enabled?
Yep, BDN does the right thing; the first results were from BDN. The native results are from the stopwatch harness @athas originally created.
@gsomix see https://github.com/cartermp/trace
@athas was there a reason you downgraded netcoreapp3.1 to netcoreapp2.1? Both are LTS, but tiered JIT compilation is enabled by default in 3.1, so I'd expect you to see better timings on 2.1 if you don't let it warm up :)
I'm running this on NixOS, and the dotnet-sdk package in Nixpkgs is 2.2.4.
3.1 should be available: https://github.com/NixOS/nixpkgs/issues/73193#issuecomment-579825819
Oh, I see. It's just under a non-default name. I'll try running with 3.1.
I'm inclined towards simplicity in general, but if the only way to measure F# properly is with a big library, then I guess that's how it has to be. I do want to point out this sentence in the README: "The benchmarking technique is mostly crude, so assume only large relative differences are meaningful." That's partly because I didn't want this project to be about fiddling with timing. The runtimes are long enough that we can get good enough results just by running a few times and taking an average (in-process is fine; some of the implementations already do that).
Running it with 3.1 is significantly slower (about a factor of two). Should I make it run the functions in a loop instead?
@athas Yes, because of Tiered JIT.
@athas Once you have the loop changes in, I'll create a PR on top of them with instructions for building JIT-compiled and native artifacts.
> few times and taking an average
Take the last instead ;)
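The reasoning behind "take the last" can be sketched as follows (Python, with a hypothetical helper `summarize`). With tiered JIT, the first in-process runs tend to be slower, so a plain mean over all runs is pulled upward by them, while the last run, or a mean over only the later runs, is closer to steady-state performance. The timings below are made-up numbers for illustration, not measurements from this project.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(timings: Sequence[float], skip: int = 0) -> Dict[str, float]:
    """Return the mean of all runs, the mean after skipping the first `skip` runs, and the last run."""
    if skip >= len(timings):
        raise ValueError("need at least one run after skipping")
    return {
        "mean_all": mean(timings),            # includes slow early (warm-up) runs
        "mean_steady": mean(timings[skip:]),  # only the later, warmed-up runs
        "last": timings[-1],                  # single most-warmed-up run
    }


# Made-up timings in seconds: the first run includes JIT compilation cost.
example = [0.20, 0.09, 0.08, 0.08]
print(summarize(example, skip=1))
```

Averaging the later runs keeps some protection against a single noisy measurement, whereas taking only the last run is simpler but more sensitive to noise.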
I pushed changes that run the functions multiple times. I haven't updated the README table yet, but here are the results:
dotnet run -c release -- -f rgbbox_1000.ppm -s rgbbox -n 1000 -m 1000
Using scene 'rgbbox' (-s to switch).
Timing over average of 10 runs (-r to change).
Scene BVH construction in 0.001361s.
Rendering in 0.879994s.
Writing image to rgbbox_1000.ppm.
dotnet run -c release -- -f irreg_1000.ppm -s irreg -n 1000 -m 1000
Using scene 'irreg' (-s to switch).
Timing over average of 10 runs (-r to change).
Scene BVH construction in 0.007581s.
Rendering in 0.484474s.
Writing image to irreg_1000.ppm.
FYI, since we now have Scala too: it will need the same more involved testing methodology, because its main implementation is also JIT-compiled. One possible explanation for the current Scala results is that its JIT has many more tiers than the .NET one.
The F# timing should be fine now, and #21 fixed it for Scala.
Yesterday there were some performance issues with the blob feed that hosts the native compilers, which is why I didn't create the PR: https://github.com/dotnet/sdk/issues/11283
Still interested?
Sure.
Could you share some details on how the test is run? F# is the only JIT-compiled implementation in the list, and 10 ms for BVH construction seems far too high. Since CoreCLR uses a JIT, benchmarking it properly requires multiple runs in the same process so the timings can stabilize.
Alternatively, I could submit a PR with a CoreRT build (native compilation).