MADEAPPS / newton-dynamics

Newton Dynamics is an integrated solution for real time simulation of physics environments.
http://www.newtondynamics.com

Allow building without AVX. #243

Closed iSLC closed 3 years ago

iSLC commented 3 years ago

At the moment, compilation fails if AVX is not enabled. First, because the files that use AVX are not excluded from the list of files to compile. Second, because there is no macro guard that stops the rest of the code from using them when AVX is not enabled.

This change excludes files containing Avx in their name from the list of files to compile (fixes the compiler failure). It also adds a macro so that code from those files is not used once they are excluded (fixes the linker failure that inevitably follows the first change).
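A minimal sketch of the macro-guard half of the change, assuming a hypothetical macro named `EXAMPLE_ENABLE_AVX2_SOLVER` that the build script defines only when the AVX2 sources are compiled. The macro, the enum, and the function names are illustrative and are not taken from the Newton Dynamics sources.

```cpp
#include <cassert>

// Hypothetical macro: a build script would define it only when the
// AVX2 source files are part of the build.
// #define EXAMPLE_ENABLE_AVX2_SOLVER

enum class SolverKind { Scalar, SseSoa, Avx2Soa };

#ifdef EXAMPLE_ENABLE_AVX2_SOLVER
// Declared here, defined in a file that is only compiled when AVX2 is enabled.
// Without the guard, any call to it would fail at link time.
void SolveJointsAvx2();
#endif

// Pick a solver, falling back to the scalar one when the AVX2 code was not built.
inline SolverKind SelectSolver(SolverKind requested)
{
#ifdef EXAMPLE_ENABLE_AVX2_SOLVER
    return requested;
#else
    // The AVX2 translation unit was excluded, so never refer to its symbols.
    return (requested == SolverKind::Avx2Soa) ? SolverKind::Scalar : requested;
#endif
}
```

With the macro left undefined, a request for `SolverKind::Avx2Soa` resolves to `SolverKind::Scalar`, and no symbol from the excluded files is referenced.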

JulioJerez commented 3 years ago

merged: this was a good one thanks.

Any reason why you would not want AVX2 on a GCC system? The AVX2 solver is much faster than all the other solvers because it uses the 8-way SIMD lanes as 8 independent cores. AVX2 provides gather and scatter instructions that allow a register to be used as if it were a GPU multiprocessor. They are slow in AVX2, but they are still faster than the equivalent version written in C.

There is an overhead for transposing the data each tick, but this overhead is linear, so it is no worse than the cost of the iterations that calculate the joint forces. But with that overhead and the emulated gather and scatter, the SSE SoA version of the solver can only solve 4 joints per call. That makes the SSE SoA solver only marginally faster than the scalar solver, so it is there as a reference to start from when making new solvers.
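The per-tick transpose mentioned above can be sketched in portable C++ as an array-of-structures to structure-of-arrays conversion. This is only an illustration: `JointRow`, `JointBatchSoa`, and `TransposeToSoa` are hypothetical names, the two fields are placeholders, and the real solver data layout is not shown in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;  // 4 for an SSE-style batch, 8 for AVX2

// Hypothetical per-joint data in array-of-structures form.
struct JointRow {
    float jacobian;
    float rhs;
};

// One batch in structure-of-arrays form: each field holds kLanes joints.
struct JointBatchSoa {
    std::array<float, kLanes> jacobian{};
    std::array<float, kLanes> rhs{};
};

// Linear-time transpose: every joint is copied exactly once.
// Unused lanes in the last batch stay zero.
std::vector<JointBatchSoa> TransposeToSoa(const std::vector<JointRow>& rows)
{
    const std::size_t batchCount = (rows.size() + kLanes - 1) / kLanes;
    std::vector<JointBatchSoa> batches(batchCount);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        JointBatchSoa& batch = batches[i / kLanes];
        batch.jacobian[i % kLanes] = rows[i].jacobian;
        batch.rhs[i % kLanes] = rows[i].rhs;
    }
    return batches;
}
```

Each batch can then be processed by one SIMD call, which is where the "joints per call" figure comes from.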

For the AVX2 solver the number of joints per call is 8, so when solving a large number of joints it is much faster than the others, almost twice as fast.

So if the engine has to solve, say, 2000 joints, the scalar solver will make 2000 * number of iterations calls. The SSE SoA solver will make 2000 / 4 = 500 * number of iterations calls (but the transpose overhead is significant). The AVX2 solver will make 2000 / 8 = 250 * number of iterations calls (now we see a substantial performance gain).
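The arithmetic above can be written as a small helper. `SolverCalls` is a hypothetical name, and rounding a partial last batch up to a full call is an assumption, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Number of solver calls for a given joint count, SIMD width, and
// iteration count. A partially filled last batch still costs one call,
// hence the rounding up. `lanes` must be greater than zero.
constexpr std::size_t SolverCalls(std::size_t joints,
                                  std::size_t lanes,
                                  std::size_t iterations)
{
    return ((joints + lanes - 1) / lanes) * iterations;
}
```

For 2000 joints and a single iteration this gives 2000 calls with 1 lane, 500 with 4 lanes, and 250 with 8 lanes, matching the figures in the comment.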

I can only imagine what AVX-512 would do, since it has even more powerful swizzle and gather instructions.

Anyway, thanks for this patch.

iSLC commented 3 years ago

merged: this was a good one thanks.

any reason why you would not want avx2 on a GCC system?

There is absolutely no reason. I just happened to compile with the default options, which leave AVX2 OFF, and the build failed. I thought I should address that because, for example, you wouldn't have AVX on ARM or some other platforms, and it makes sense to be able to build in generic mode.

JulioJerez commented 3 years ago

Ah, good point.

On Wed, Aug 4, 2021, 1:23 PM Sandu Liviu Catalin @.***> wrote:

merged: this was a good one thanks.

any reason why you would not want avx2 on a GCC system?

There is absolutely no reason. I just happened to compile with the default options, which leave AVX2 OFF, and the build failed. I thought I should address that because, for example, you wouldn't have AVX on ARM or some other platforms, and it makes sense to be able to build in generic mode.