favreau / bullet

Automatically exported from code.google.com/p/bullet

Is it worth updating the SSE code to AVX? #474

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Sandy Bridge has been released.

Would it make sense to update the SSE code to AVX,
mainly in GJK and the constraint solver?

Original issue reported on code.google.com by liangma...@gmail.com on 13 Jan 2011 at 3:52

GoogleCodeExporter commented 9 years ago
Yes, it would be great to have some AVX optimizations.

Stan Melax did some nice work on AVX-optimized cloth:

See http://software.intel.com/en-us/articles/avx-cloth and
http://software.intel.com/en-us/articles/intel-graphics-developers-guides/

Perhaps it inspires someone to contribute?

Original comment by erwin.coumans on 13 Jan 2011 at 9:56

GoogleCodeExporter commented 9 years ago
Issue 404 has been merged into this issue.

Original comment by erwin.coumans on 13 Jan 2011 at 9:56

GoogleCodeExporter commented 9 years ago
The SSE version of
btSequentialImpulseConstraintSolver::resolveSingleConstraintRowGenericSIMD
could be improved with AVX in the following ways:
1. Replace set1 with broadcast (AVX only).
2. Replace the dot product, currently built from 4 shuffles, with vdpps (AVX only).
3. Merge the deltaVel1Dotn and deltaVel2Dotn calculations into one 256-bit operation.
4. Merge the linearComponentA and linearComponentB calculations into one 256-bit operation.
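
The points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Bullet solver code: the helpers `two_dots`, `scale8` and `check_demo` and their data are hypothetical, vectors are assumed to be 4 floats with an unused w component, and the intrinsic path is compiled only when the compiler defines `__AVX__` (for example with `-mavx`); otherwise a scalar fallback with the same results is used.

```cpp
#include <cassert>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: computes dot3(a1, n1) and dot3(a2, n2) together.
// Each pointer refers to 4 floats (x, y, z, w); w is ignored.
static void two_dots(const float* a1, const float* n1,
                     const float* a2, const float* n2,
                     float* out1, float* out2)
{
#if defined(__AVX__)
    // Pack both 128-bit operands into the low and high lanes of one register.
    __m256 a = _mm256_insertf128_ps(
        _mm256_castps128_ps256(_mm_loadu_ps(a1)), _mm_loadu_ps(a2), 1);
    __m256 n = _mm256_insertf128_ps(
        _mm256_castps128_ps256(_mm_loadu_ps(n1)), _mm_loadu_ps(n2), 1);
    // vdpps works per 128-bit lane. Mask 0x71: multiply x, y, z (high
    // nibble 0x7) and write the sum to element 0 of each lane (low nibble 0x1).
    __m256 d = _mm256_dp_ps(a, n, 0x71);
    *out1 = _mm_cvtss_f32(_mm256_castps256_ps128(d));
    *out2 = _mm_cvtss_f32(_mm256_extractf128_ps(d, 1));
#else
    // Scalar fallback so the sketch builds without AVX support.
    *out1 = a1[0] * n1[0] + a1[1] * n1[1] + a1[2] * n1[2];
    *out2 = a2[0] * n2[0] + a2[1] * n2[1] + a2[2] * n2[2];
#endif
}

// Hypothetical helper: scales 8 floats (two packed 4-float vectors) by one
// scalar, using a broadcast load instead of set1.
static void scale8(const float* v, const float* s, float* out)
{
#if defined(__AVX__)
    __m256 k = _mm256_broadcast_ss(s);  // one load fills all 8 lanes
    _mm256_storeu_ps(out, _mm256_mul_ps(_mm256_loadu_ps(v), k));
#else
    for (int i = 0; i < 8; ++i) out[i] = v[i] * (*s);
#endif
}

// Small self-check with values that are exact in float arithmetic.
static void check_demo()
{
    const float a1[4] = {1, 2, 3, 0}, n1[4] = {4, 5, 6, 0};
    const float a2[4] = {1, 0, -1, 0}, n2[4] = {2, 3, 4, 0};
    float d1 = 0.0f, d2 = 0.0f;
    two_dots(a1, n1, a2, n2, &d1, &d2);
    assert(d1 == 32.0f);
    assert(d2 == -2.0f);

    const float v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const float half = 0.5f;
    float out[8];
    scale8(v, &half, out);
    assert(out[0] == 0.5f && out[7] == 4.0f);
}
```

Calling `check_demo()` from a test program exercises both helpers. Whether the merged 256-bit form is actually faster than the existing SSE code would have to be measured on real hardware.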

Original comment by liangma...@gmail.com on 13 Jan 2011 at 11:23

GoogleCodeExporter commented 9 years ago
Attached is the Intel AVX cloth demo recompiled using emulated AVX, so it
should run on systems that don't have AVX. The 4-wide (128-bit) SSE version
runs pretty well.

Original comment by erwin.coumans on 15 Jan 2011 at 9:19

GoogleCodeExporter commented 9 years ago
Running your non-AVX version, I get a constant 60 fps on a Phenom X6 @ 3.64 GHz.

Seems like a nice speed to me! : ]

Original comment by tpant...@gmail.com on 16 Jan 2011 at 12:00

GoogleCodeExporter commented 9 years ago
Attached is Stan Melax's AVX cloth source code with my modified AVX emulation
header file (avxintrin_mini.h), derived from Intel's original header file:
http://software.intel.com/en-us/articles/avx-emulation-header-file/

Note that the implementation of __emu_mm256_unpackhi_ps/__emu_mm256_unpacklo_ps
is brute force; it could be optimized using SSE.
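
As a rough sketch of that optimization: the 256-bit unpack instructions operate independently on each 128-bit lane, so an emulated version can apply the 128-bit SSE unpack to each half. The `Emu256` struct, the function names and the `HAVE_SSE` macro below are hypothetical and are not taken from avxintrin_mini.h; a scalar fallback is included so the sketch also builds without SSE.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1  // hypothetical macro, local to this sketch
#endif

// Hypothetical stand-in for an emulated 256-bit register: two 4-float halves.
struct Emu256 { float lo[4]; float hi[4]; };

// Interleaves two floats from a and b, starting at index `first` (0 or 2).
static void interleave_scalar(const float* a, const float* b, int first, float* out)
{
    out[0] = a[first];     out[1] = b[first];
    out[2] = a[first + 1]; out[3] = b[first + 1];
}

// Emulated unpacklo: per lane, produces {a0, b0, a1, b1}.
static Emu256 emu_unpacklo_ps(const Emu256& a, const Emu256& b)
{
    Emu256 r;
#if defined(HAVE_SSE)
    _mm_storeu_ps(r.lo, _mm_unpacklo_ps(_mm_loadu_ps(a.lo), _mm_loadu_ps(b.lo)));
    _mm_storeu_ps(r.hi, _mm_unpacklo_ps(_mm_loadu_ps(a.hi), _mm_loadu_ps(b.hi)));
#else
    interleave_scalar(a.lo, b.lo, 0, r.lo);
    interleave_scalar(a.hi, b.hi, 0, r.hi);
#endif
    return r;
}

// Emulated unpackhi: per lane, produces {a2, b2, a3, b3}.
static Emu256 emu_unpackhi_ps(const Emu256& a, const Emu256& b)
{
    Emu256 r;
#if defined(HAVE_SSE)
    _mm_storeu_ps(r.lo, _mm_unpackhi_ps(_mm_loadu_ps(a.lo), _mm_loadu_ps(b.lo)));
    _mm_storeu_ps(r.hi, _mm_unpackhi_ps(_mm_loadu_ps(a.hi), _mm_loadu_ps(b.hi)));
#else
    interleave_scalar(a.lo, b.lo, 2, r.lo);
    interleave_scalar(a.hi, b.hi, 2, r.hi);
#endif
    return r;
}
```

With a = {0..7} and b = {10..17}, `emu_unpacklo_ps` yields {0, 10, 1, 11} in the low half and {4, 14, 5, 15} in the high half, matching the per-lane behaviour of the real instruction.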

You can use CMake to create project files for most Visual Studio versions. It
will link everything statically (instead of using DLLs).

Original comment by erwin.coumans on 17 Jan 2011 at 8:25

GoogleCodeExporter commented 9 years ago
Attached is a newly compiled Windows build of the Intel non-AVX cloth demo, to
avoid the missing MSVC100.dll error.

Original comment by erwin.coumans on 4 Feb 2011 at 12:03

GoogleCodeExporter commented 9 years ago
I set up a platform: i7-2600K, Windows 7 SP1 32-bit, Visual C++ 2010 Express.
This platform supports AVX for both development and run time. Could anyone
suggest how to use the profiling infrastructure to estimate the performance?

Original comment by liangma...@gmail.com on 30 Sep 2011 at 9:35

GoogleCodeExporter commented 9 years ago
See https://github.com/bulletphysics/bullet3/issues/135

Original comment by erwin.coumans on 30 Mar 2014 at 7:40