UserAB1236872 opened this issue 10 years ago
// The benchmark shape in question: the timer is toggled around the random
// setup on every iteration. (The wrapper func and r, a runtime-seeded
// *rand.Rand, are filled in for context; Vec4 and Sub come from the package.)
func BenchmarkVec4Sub(b *testing.B) {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		v1 := Vec4{r.Float32(), r.Float32(), r.Float32(), r.Float32()}
		v2 := Vec4{r.Float32(), r.Float32(), r.Float32(), r.Float32()}
		b.StartTimer()
		v1.Sub(v2)
	}
}
Yeah, I'm not surprised that doesn't work as intended. StopTimer() and StartTimer() are likely too coarse-grained for this, and meant to be used before/after the tight b.N loop, not inside it.
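For reference, a minimal self-contained sketch of that coarse-grained pattern (every name here is illustrative, not from the package in question):

package bench

import "testing"

var sink float32 // global sink so the compiler can't drop the call

// buildFixture stands in for expensive one-time setup.
func buildFixture() []float32 {
	s := make([]float32, 1<<16)
	for i := range s {
		s[i] = float32(i)
	}
	return s
}

// sum stands in for the function under test.
func sum(s []float32) float32 {
	var t float32
	for _, v := range s {
		t += v
	}
	return t
}

func BenchmarkSum(b *testing.B) {
	b.StopTimer()          // pause timing for the one-time setup...
	data := buildFixture() // ...before the tight loop, not inside it
	b.StartTimer()
	for i := 0; i < b.N; i++ {
		sink = sum(data)
	}
}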
Nice find and good news though!
I took out the random data from one benchmark function and the reported time drastically improved (~40 ns/op faster), despite the random-data generation sitting behind StopTimer calls.
I put the random data in to prevent the compiler from optimizing on constant data and biasing the results (since these are really simple functions). However, I made the following simple change to VecSub:
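A plausible reconstruction of that change, as a sketch: it assumes the random inputs were hoisted out of the timed loop, that Sub returns a Vec4, and that a package-level sink is used so the call isn't optimized away (assumed imports: math/rand, testing, time).

var sink Vec4 // sink so the compiler can't drop the Sub call

func BenchmarkVecSub(b *testing.B) {
	// Random at run time, so no constant folding -- but generated once,
	// outside the measured loop, rather than per iteration.
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	v1 := Vec4{r.Float32(), r.Float32(), r.Float32(), r.Float32()}
	v2 := Vec4{r.Float32(), r.Float32(), r.Float32(), r.Float32()}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = v1.Sub(v2)
	}
}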
The compiler shouldn't be able to do any constant folding there, since the data depends on the seed at run time. That got ~12.6 ns/op, which is about as good as a SIMD package I benchmarked (~11.9 ns/op). This is down from previous benchmarks of the form quoted at the top of this issue, which were clocking ~50.9 ns/op -- and also took an age to run. Further, introducing random data in the same per-iteration form as our current benchmarks into the SIMD package's benchmarks catapulted it up to the ~40-50 ns/op range, implying there's almost certainly no magical Go compiler trickery biasing our benchmarks to an unfairly low number.
This is actually very good news. I'm filing this because I don't have time for the next little bit, in case someone else wants to take it on. I think all the Matrix, Vector, and Quaternion benchmarks use random data now. It's not a horrifically difficult thing to change, just tedious.
On the other hand, I'm not sure why b.StopTimer is being so finicky; it may just be meant for benchmarks at ns/op levels that aren't so low (~40 ns/op of overhead is nothing compared to something like a graph search).
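If anyone wants to quantify that, a quick diagnostic sketch (not from the original suite): an otherwise-empty loop that only toggles the timer should report roughly the overhead the toggling itself adds to each measured iteration.

package bench

import "testing"

// An empty loop that only toggles the timer: its ns/op is roughly the
// per-iteration overhead that Stop/StartTimer added to the benchmarks
// above. Expect it to take a while in wall-clock time, since most of
// the toggle work happens while the timer is off.
func BenchmarkTimerToggle(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		b.StartTimer()
	}
}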