MADEAPPS / newton-dynamics

Newton Dynamics is an integrated solution for real time simulation of physics environments.
http://www.newtondynamics.com

Questions about status of dgNewtonSse #215

Open JayFoxRox opened 4 years ago

JayFoxRox commented 4 years ago

For running Newton on the original Xbox (Pentium 3, 733MHz, MMX and SSE; specifically no SSE2 or higher) I wondered about the status of dgNewtonSse. That is still mentioned here:

https://github.com/MADEAPPS/newton-dynamics/blob/fd2c31db491cda38612649809c5f4341f7f7393a/CMakeLists.txt#L18

and here:

https://github.com/MADEAPPS/newton-dynamics/blob/fd2c31db491cda38612649809c5f4341f7f7393a/sdk/CMakeLists.txt#L41-L43

(and potentially elsewhere)

However, the actual folder newton-dynamics/sdk/dgNewtonSse is nowhere to be found. To find out when / why it was deleted, I checked the git history.

The first hint I found was in https://github.com/MADEAPPS/newton-dynamics/commit/4290f251adb6d03ed2d3d61746075bd1d647a4e6 which mentions them as "unfinished" which implies they are still planned / wanted.

I then kept looking to see whether they had been finished in the past (before becoming unfinished through bitrot), and found the last revision with dgNewtonSse: https://github.com/MADEAPPS/newton-dynamics/tree/ce423a44e3d0e6b075d84195aec29081ce9acb66/sdk/dgNewtonSse

After that, it was renamed / moved to dgNewtonGL: https://github.com/MADEAPPS/newton-dynamics/tree/159b8b469b3f8cea55c69a8dfd93ceb1e76a68af/sdk/dgNewtonGL

Tracking the changes became tedious, so I started looking at the state of the current implementations for AVX and SSE4.2.

So my questions are:

JulioJerez commented 4 years ago

There is no dgNewtonSse plugin. The parallel solver is the default and uses only the most basic SSE and SSE2 instructions. That solver is part of the engine, so it works whether the engine is linked statically or as a DLL. All the other plugins use more advanced versions of SSE, with instructions like gather/scatter, muladd (fused multiply-add), and some others, but they can only be loaded as DLLs.

The code template in the DLL solvers may look identical, but the driver functions are very different, and that is what makes them incompatible. For example, a CPU that does not support AVX2 will not load the AVX2 plugin, but that same CPU may load the AVX plugin. The AVX2 plugin, in theory, executes twice as many flops because of the muladd instruction; it also supports gather, which simplifies some operations. In practice AVX2 is only about 5 to 10% faster.
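The fallback order described above can be sketched as a small selection function. This is only an illustration under stated assumptions: `CpuFeatures`, `selectSolver`, and the returned names are hypothetical and are not part of the Newton API, and a real program would fill the flags from CPUID rather than by hand.

```cpp
#include <cassert>
#include <string>

// Hypothetical feature flags; a real program would query CPUID to fill these.
struct CpuFeatures {
    bool avx = false;
    bool avx2 = false;
};

// Returns an illustrative name for the most capable solver the CPU can run.
// Plugins are only considered when they can be loaded (DLL build); otherwise
// the built-in default solver is used.
inline std::string selectSolver(const CpuFeatures& cpu, bool pluginsLoadable) {
    // On real hardware AVX2 support implies AVX support.
    assert(!cpu.avx2 || cpu.avx);
    if (pluginsLoadable) {
        if (cpu.avx2) return "avx2-plugin";
        if (cpu.avx) return "avx-plugin";
    }
    return "default";
}
```

With both flags left at `false` (as on a Pentium 3), `selectSolver` falls through to `"default"`, which matches the advice given later in this thread.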

In general the plugins are faster because they use the SIMD vectors as if they were GPU compute units. For example, the AVX2 solver solves 16 joints per iteration, where the default solver solves one per iteration. This requires some overhead to transpose the data from array of structures to structure of arrays, so the true benefit comes when thousands of joints are resolved and the cost of transposing is amortized by the gain of the solver.
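To make the transposition cost concrete, here is a minimal sketch of an array-of-structures to structure-of-arrays conversion. The field names (`jacobian`, `invMass`, `rhs`), the types, and the batch width `kLanes = 8` are assumptions chosen for illustration; they do not reflect the engine's actual data layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-joint data stored as an array of structures (AoS).
struct JointRow {
    float jacobian;
    float invMass;
    float rhs;
};

constexpr std::size_t kLanes = 8;  // assumed batch width: 8 floats (256 bits)

// The same fields as a structure of arrays (SoA), one batch per kLanes joints.
struct JointBatch {
    std::array<float, kLanes> jacobian{};
    std::array<float, kLanes> invMass{};
    std::array<float, kLanes> rhs{};
};

// Copies every joint into its lane; unused lanes in the last batch stay zero.
inline std::vector<JointBatch> transposeToSoA(const std::vector<JointRow>& rows) {
    std::vector<JointBatch> batches((rows.size() + kLanes - 1) / kLanes);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        JointBatch& batch = batches[i / kLanes];
        const std::size_t lane = i % kLanes;
        batch.jacobian[lane] = rows[i].jacobian;
        batch.invMass[lane] = rows[i].invMass;
        batch.rhs[lane] = rows[i].rhs;
    }
    assert(batches.size() * kLanes >= rows.size());
    return batches;
}
```

Every joint is touched once before the solver does any useful work, which is the overhead that only pays off when there are enough joints to keep the wide batches full.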

Auto-vectorization does not translate to big gains because the engine predates these compiler optimizations. But since the engine supports scalar math, you can just define the preprocessor USE SCALAR option and let the compiler do all the optimization. The times I have tried it, I have never seen it do a better job than the hand-made optimizations.
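As a rough illustration of that kind of switch, the sketch below compiles either a hand-written SSE path or a plain scalar loop that the compiler is free to vectorize. The macro `EXAMPLE_USE_SCALAR` and the function `dot` are hypothetical names invented for this example; they are not the engine's actual preprocessor option or code.

```cpp
#include <cassert>
#include <cstddef>

// EXAMPLE_USE_SCALAR is a made-up macro for this sketch. When it is defined,
// or when the compiler does not target SSE, the scalar path is compiled.
#if !defined(EXAMPLE_USE_SCALAR) && defined(__SSE__)
#include <xmmintrin.h>
#define EXAMPLE_HAS_SSE 1
#else
#define EXAMPLE_HAS_SSE 0
#endif

// Dot product of two float arrays; n must be a multiple of 4.
inline float dot(const float* a, const float* b, std::size_t n) {
    assert(n % 4 == 0);
#if EXAMPLE_HAS_SSE
    // Hand-written path: four products per iteration using SSE1 only.
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#else
    // Scalar path: left to the compiler's auto-vectorizer.
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
#endif
}
```

Building once with `-DEXAMPLE_USE_SCALAR` and once without gives two binaries whose timings can be compared directly on the target CPU.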

JulioJerez commented 4 years ago

If you are using an older Intel CPU (Pentium 3, 733MHz, MMX and SSE), the SSE mode is not really very good, because internally it is still a 64-bit bus. So SSE is kind of a placeholder on that CPU; it was not until the Intel Core Duo that SSE became a real factor in float throughput. In your case the default solver is your best option, because you avoid the extra work of the plugin. Remember, the plugins are faster because they capitalize on one or more features of the instruction set, be that 8-way float vectors in AVX, muladd, a 256-bit-wide internal bus, and so on. Your CPU does not support any of them, so the overhead of the plugin would be a waste that cannot be recovered by special instructions.