theia-ajax opened 9 years ago
See https://github.com/PistonDevelopers/graphics/issues/5
The extra precision does matter for a larger invariant space of `I = M^-1 * M`.
Modern CPUs use the same hardware registers for computing `f64` and `f32`, which leads to `f32` sometimes being slower.
It would be interesting if you have a use case where `f32` is faster, and also interesting if there are use cases where `f64` is faster. To test this you need to run the game loop in benchmark mode; see http://blog.piston.rs/2015/05/09/benchmark-mode/
All scalars are tied to the `Scalar` type alias, so you can recompile the library with `f32` if you need it.
The extra precision is only necessary when game worlds are quite large, which I imagine is not the general case but would be important for some games. Perhaps the math lib could be broken out into a single-precision and a double-precision version, much like the graphics backends are broken out into their own modules?
While it's true that the cycle counts for single- and double-precision operations are the same, the biggest bottleneck is always going to be the cache, and as such I'd be willing to bet money that in any benchmark of actual game code f32 will be faster than f64, if only because you'll use less cache space.
I tried setting up my own fork that used f32 but I'm a cargo noob and had trouble actually getting the cargo libs to use it over their own dependencies.
64-bit floating point values are overkill for the average game. Might it be an option to add a feature "double_precision" that can be set in cargo and defines Scalar as 64-bit, while otherwise it's 32-bit?
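A minimal sketch of what such a feature gate could look like. `double_precision` is the feature name suggested above, not an existing Piston flag, and this is a standalone illustration rather than the library's actual code:

```rust
// Hypothetical sketch of a Cargo-feature-gated Scalar. In Cargo.toml you
// would declare:
//
//     [features]
//     double_precision = []
//
// and build with `cargo build --features double_precision` to opt in.

#[cfg(feature = "double_precision")]
pub type Scalar = f64;

#[cfg(not(feature = "double_precision"))]
pub type Scalar = f32;

fn main() {
    // Without the feature enabled, Scalar is f32 (4 bytes).
    println!("size_of::<Scalar>() = {}", std::mem::size_of::<Scalar>());
}
```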
There is a saying: "shut up and calculate."

If you are more than 50% sure `f32` is faster, then you should expect more than 50% of benchmarks to show `f32` being faster. People have done benchmarks with other applications showing that `f64` is faster in some cases, so with no prior I would guess 70%, rather than starting from the assumption that `f32` is always faster.

If the difference is in nanoseconds and the driver overhead is on the order of a tenth of a microsecond, then the performance gain of using `f32` is about 1%. That leaves us with 0.7 × 1% = 0.7% expected performance improvement.

Assuming no prior knowledge of benchmarks, I prefer `f64` because it has better numeric stability.
In any reasonable benchmark I'm 99% certain f32 will be faster, simply because it puts less pressure on the cache. Huge speed gains are possible if the vecmath lib uses SIMD, since you can effectively process twice as many f32s as f64s.
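The cache argument can be made concrete with a back-of-envelope sketch: the same number of scalars occupies half the bytes in f32, so twice as many fit per cache line. The 64-byte line size is an assumption here (it is the common size on x86):

```rust
// Back-of-envelope sketch of the cache argument: the same buffer takes half
// the bytes in f32, so twice as many values fit per 64-byte cache line.
fn main() {
    const N: usize = 1 << 16; // e.g. 65536 vertex coordinates
    println!("{} f32 values: {} KiB", N, N * std::mem::size_of::<f32>() / 1024);
    println!("{} f64 values: {} KiB", N, N * std::mem::size_of::<f64>() / 1024);
    println!("f32 per 64-byte line: {}", 64 / std::mem::size_of::<f32>());
    println!("f64 per 64-byte line: {}", 64 / std::mem::size_of::<f64>());
}
```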
I can think of only 4 games off the top of my head which are limited by the precision of floats (Minecraft, StarCitizen, KSP, and Space Engineers), and in the case of Space Engineers all the physics calculations are broken into discrete chunks such that single-precision math still works within those spaces; only the rendering is done with 64-bit.

It's also worth noting that while there are limitations imposed by using float, both Minecraft and KSP still use float instead of double and just deal with the consequences of that creatively.

Regardless, it would be cool if Piston let you grab single- or double-precision versions of the math lib, with every lib using Scalar, so that it would be easy to switch between the two.
@tedajax The vecmath lib is generic over `f32` and `f64`, and every lib should use `Scalar`, so you should be able to switch between them; if not, please open up an issue. LLVM does autovectorization, so there might not be that much speed gain.
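The style of genericity described can be sketched like this. `dot3` is a hypothetical function for illustration, not the real vecmath API; the point is that one generic definition monomorphizes to both `f32` and `f64`:

```rust
use std::ops::{Add, Mul};

// Hypothetical dot3 (not the real vecmath API): one generic definition
// serves both f32 and f64 via trait bounds.
fn dot3<T>(a: [T; 3], b: [T; 3]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    let mut acc = T::default(); // 0.0 for both f32 and f64
    for i in 0..3 {
        acc = acc + a[i] * b[i];
    }
    acc
}

fn main() {
    let s32: f32 = dot3([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]);
    let s64: f64 = dot3([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]);
    println!("f32: {}, f64: {}", s32, s64);
}
```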
Hmm... I am wondering if we can use a default generic. Opening https://github.com/PistonDevelopers/graphics/issues/968
I had tried simply forking graphics, setting Scalar to f32, and setting up my project to reference my fork instead, but there were issues. In several other Piston libs it seems people are not using Scalar and are instead using f64 directly, so that's probably an issue to be addressed elsewhere.
I'm just wondering what the reasoning behind using f64 instead of f32 for all the math stuff is. In most cases the extra precision is not necessary and this really hurts cache friendliness.