favreau / bullet

Automatically exported from code.google.com/p/bullet

Stack-allocated structure for traversing collision algorithms #453

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hey!

I think I talked about this about a year ago, but we've finally gotten around to 
creating a good diff.

The diff includes both the traversal for the narrowphase (covering only core 
Bullet, not extras such as soft bodies) and an implementation of the 
BVH tree in btCompoundCollisionAlgorithm.

Cheers,
Simon

Original issue reported on code.google.com by mr.lundm...@gmail.com on 9 Nov 2010 at 7:36

Attachments:

GoogleCodeExporter commented 9 years ago
Sorry, this should be an enhancement and not a defect.

Original comment by mr.lundm...@gmail.com on 9 Nov 2010 at 7:36

GoogleCodeExporter commented 9 years ago

Great, thanks for wrapping up this patch. 

I'll look into it soon.

Original comment by erwin.coumans on 10 Nov 2010 at 6:14

GoogleCodeExporter commented 9 years ago
I looked at the file, but couldn't use svn patch.

Could you try to use svn to create the patch?
Thanks!
Erwin

Original comment by erwin.coumans on 11 Nov 2010 at 9:16

GoogleCodeExporter commented 9 years ago
Attached is an attempt to turn this patch from GNU diff into svn diff format.

Still needs review. We need to be careful with API changes and sync them with 
the SPU version.

Original comment by erwin.coumans on 7 Dec 2010 at 7:19

Attachments:

GoogleCodeExporter commented 9 years ago
We are investigating the necessary changes in this branch:
http://code.google.com/p/bullet/source/browse/#svn/branches/StackAllocation

Original comment by erwin.coumans on 14 Dec 2010 at 5:19

GoogleCodeExporter commented 9 years ago
Here's the latest patch.

Sorry for the delays Erwin.

Cheers,
Simon

Original comment by mr.lundm...@gmail.com on 12 Jan 2011 at 1:27

Attachments:

GoogleCodeExporter commented 9 years ago
Great, thanks for the patch. I applied it in branches/StackAlloc. It 
will be useful in the research for Bullet 3.x.

As you know, we are also looking into integrating the new Sony PhysicsEffects2 
(in branches/PhysicsEffects), which has been optimized for PS3, PC/SSE and also 
PSP2.

Have you considered using vectormath instead of btSimdVector3?

Original comment by erwin.coumans on 27 Feb 2011 at 7:42

GoogleCodeExporter commented 9 years ago
Hey Erwin, 

I think that in one of the patches that I mailed you, there was our 
implementation of btSimdVector3, which uses simd/sse with an implementation for 
ps3/pc. We had implemented it in some parts of bullet in order to get decent 
performance.

Unfortunately, I don't work there anymore (started my own company instead), so 
I no longer have access to that code.

Original comment by mr.lundm...@gmail.com on 27 Feb 2011 at 3:19

GoogleCodeExporter commented 9 years ago
Congratulations and good luck with starting your own company!

I just wondered if you ever considered using vectormath (which also is 
optimized simd for sse and ps3/spu/ppu etc), but it seems you already have 
btSimdVector3.

I'll certainly consider integrating your work into Bullet; it won't all be 
written from scratch. As a side note, here is an interesting read about writing 
code from scratch:

http://www.joelonsoftware.com/articles/fog0000000069.html

Original comment by erwin.coumans on 28 Feb 2011 at 5:24

GoogleCodeExporter commented 9 years ago

It got delayed a lot, but I applied the patch (with some changes) to trunk:
https://code.google.com/p/bullet/source/detail?r=2539

Thanks for your contribution!

Original comment by erwin.coumans on 10 Jun 2012 at 4:47

GoogleCodeExporter commented 9 years ago
Just wanted to say I've integrated the new version of the trunk into our local 
copy of Bullet, and we're noticing significant performance degradation, on the 
order of a 60% framerate hit. Here are profiles of the same scene without and 
with this patch applied:

Old version:
Profiling: Root (total running time: 1.327 ms) ---
0 -- stepSimulation (100.00 %) :: 1.327 ms / frame (1 calls)
1 -- updateActions (0.00 %) :: 0.000 ms / frame (0 calls)
2 -- performDiscreteCollisionDetection (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (0.000 %) :: 0.000 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 1.327 ms) ---
...0 -- internalSingleStepSimulation (96.99 %) :: 1.287 ms / frame (1 calls)
...1 -- synchronizeMotionStates (0.90 %) :: 0.012 ms / frame (1 calls)
...Unaccounted: (2.110 %) :: 0.028 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 1.287 ms) ---
......0 -- updateActivationState (0.08 %) :: 0.001 ms / frame (1 calls)
......1 -- updateActions (8.62 %) :: 0.111 ms / frame (1 calls)
......2 -- integrateTransforms (0.93 %) :: 0.012 ms / frame (1 calls)
......3 -- solveConstraints (36.67 %) :: 0.472 ms / frame (1 calls)
......4 -- calculateSimulationIslands (0.93 %) :: 0.012 ms / frame (1 calls)
......5 -- performDiscreteCollisionDetection (44.60 %) :: 0.574 ms / frame (1 calls)
......6 -- predictUnconstraintMotion (1.48 %) :: 0.019 ms / frame (1 calls)
......Unaccounted: (6.682 %) :: 0.086 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 0.472 ms) ---
.........0 -- solveGroup (93.86 %) :: 0.443 ms / frame (1 calls)
.........1 -- processIslands (1.48 %) :: 0.007 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (3.81 %) :: 0.018 ms / frame (1 calls)
.........Unaccounted: (0.847 %) :: 0.004 ms
............----------------------------------
............Profiling: solveGroup (total running time: 0.443 ms) ---
............0 -- solveGroupCacheFriendlyIterations (79.23 %) :: 0.351 ms / frame (1 calls)
............1 -- solveGroupCacheFriendlySetup (19.64 %) :: 0.087 ms / frame (1 calls)
............Unaccounted: (1.129 %) :: 0.005 ms
............----------------------------------
............Profiling: processIslands (total running time: 0.007 ms) ---
............0 -- solveGroup (0.00 %) :: 0.000 ms / frame (0 calls)
............Unaccounted: (100.000 %) :: 0.007 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 0.000 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (0.00 %) :: 0.000 ms / frame (0 calls)
...............1 -- solveGroupCacheFriendlySetup (0.00 %) :: 0.000 ms / frame (0 calls)
...............Unaccounted: (0.000 %) :: 0.000 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 0.574 ms) ---
.........0 -- dispatchAllCollisionPairs (76.31 %) :: 0.438 ms / frame (1 calls)
.........1 -- calculateOverlappingPairs (0.35 %) :: 0.002 ms / frame (1 calls)
.........2 -- updateAabbs (23.00 %) :: 0.132 ms / frame (1 calls)
.........Unaccounted: (0.348 %) :: 0.002 ms
...----------------------------------
...Profiling: performDiscreteCollisionDetection (total running time: 0.000 ms) ---
...0 -- dispatchAllCollisionPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...1 -- calculateOverlappingPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...2 -- updateAabbs (0.00 %) :: 0.000 ms / frame (0 calls)
...Unaccounted: (0.000 %) :: 0.000 ms

New version:
Profiling: Root (total running time: 2.723 ms) ---
0 -- stepSimulation (99.96 %) :: 2.722 ms / frame (1 calls)
1 -- updateActions (0.00 %) :: 0.000 ms / frame (0 calls)
2 -- performDiscreteCollisionDetection (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (0.037 %) :: 0.001 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 2.722 ms) ---
...0 -- internalSingleStepSimulation (99.16 %) :: 2.699 ms / frame (1 calls)
...1 -- synchronizeMotionStates (0.40 %) :: 0.011 ms / frame (1 calls)
...Unaccounted: (0.441 %) :: 0.012 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 2.699 ms) ---
......0 -- updateActivationState (0.04 %) :: 0.001 ms / frame (1 calls)
......1 -- updateActions (8.04 %) :: 0.217 ms / frame (1 calls)
......2 -- integrateTransforms (0.37 %) :: 0.010 ms / frame (1 calls)
......3 -- solveConstraints (15.86 %) :: 0.428 ms / frame (1 calls)
......4 -- calculateSimulationIslands (0.37 %) :: 0.010 ms / frame (1 calls)
......5 -- performDiscreteCollisionDetection (73.03 %) :: 1.971 ms / frame (1 calls)
......6 -- predictUnconstraintMotion (0.59 %) :: 0.016 ms / frame (1 calls)
......Unaccounted: (1.704 %) :: 0.046 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 0.428 ms) ---
.........0 -- solveGroup (94.63 %) :: 0.405 ms / frame (1 calls)
.........1 -- processIslands (0.70 %) :: 0.003 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (3.97 %) :: 0.017 ms / frame (1 calls)
.........Unaccounted: (0.701 %) :: 0.003 ms
............----------------------------------
............Profiling: solveGroup (total running time: 0.405 ms) ---
............0 -- solveGroupCacheFriendlyIterations (85.68 %) :: 0.347 ms / frame (1 calls)
............1 -- solveGroupCacheFriendlySetup (13.58 %) :: 0.055 ms / frame (1 calls)
............Unaccounted: (0.741 %) :: 0.003 ms
............----------------------------------
............Profiling: processIslands (total running time: 0.003 ms) ---
............0 -- solveGroup (0.00 %) :: 0.000 ms / frame (0 calls)
............Unaccounted: (100.000 %) :: 0.003 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 0.000 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (0.00 %) :: 0.000 ms / frame (0 calls)
...............1 -- solveGroupCacheFriendlySetup (0.00 %) :: 0.000 ms / frame (0 calls)
...............Unaccounted: (0.000 %) :: 0.000 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 1.971 ms) ---
.........0 -- dispatchAllCollisionPairs (95.79 %) :: 1.888 ms / frame (1 calls)
.........1 -- calculateOverlappingPairs (0.10 %) :: 0.002 ms / frame (1 calls)
.........2 -- updateAabbs (4.06 %) :: 0.080 ms / frame (1 calls)
.........Unaccounted: (0.051 %) :: 0.001 ms
...----------------------------------
...Profiling: performDiscreteCollisionDetection (total running time: 0.000 ms) ---
...0 -- dispatchAllCollisionPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...1 -- calculateOverlappingPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...2 -- updateAabbs (0.00 %) :: 0.000 ms / frame (0 calls)
...Unaccounted: (0.000 %) :: 0.000 ms
----------------------------------

As you can see, dispatchAllCollisionPairs now takes more than four times as 
long (0.438 ms versus 1.888 ms per frame), while most other functions have 
nearly identical execution times. Any insight into the cause before I try to 
delve further in or, worse, roll back the update?

Original comment by tamaynar...@gmail.com on 28 Jun 2012 at 5:36

GoogleCodeExporter commented 9 years ago
That is not good.

Is there any way to reproduce this performance degradation in any of the Bullet 
demos?

What kind of collision shape types are you using?
Any other info about the test case that you used to show the difference? Are 
both compiled in release mode using the same compiler etc?

Original comment by erwin.coumans on 28 Jun 2012 at 6:25

GoogleCodeExporter commented 9 years ago
We're using btCompoundShape almost exclusively, with the component shapes 
consisting mainly of boxes and cylinders with a few spheres. The fact that we 
have so many compound shapes is one of the reasons I was excited for this 
update :( I'll attempt to reproduce it in a demo and upload the modified demo 
and profiling results here, but for now the only other info I have is that I'm 
certain the compiler and settings are the same (VS2010 with LTCG enabled), and 
release mode is certainly targeted.

Original comment by tamaynar...@gmail.com on 28 Jun 2012 at 6:51

GoogleCodeExporter commented 9 years ago
Oh, and several btHeightfieldTerrainShapes for our terrain, in case that is 
applicable.

Original comment by tamaynar...@gmail.com on 28 Jun 2012 at 6:52

GoogleCodeExporter commented 9 years ago
It would be great if you can provide a reproduction case that shows the 
performance degradation. Given such repro case I'll make sure it gets fixed.

Original comment by erwin.coumans on 28 Jun 2012 at 6:57

GoogleCodeExporter commented 9 years ago
It appears that the majority of the performance loss was due to some changes to 
the API to make Bullet more const-correct. We're using some elaborate collision 
filtering by overriding btCollisionObject::checkCollideWithOverride, and as the 
argument type for that method was changed to a const pointer, our implementation 
no longer matched the base-class signature and was never called. There is still 
some slight performance loss, 
but less than a third of what I was experiencing before. If the remaining 
problem is related to the core Bullet lib I'll post details.
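
For anyone who runs into the same thing, here is a minimal sketch of the pitfall. It uses stand-in types (Object, StaleFilter, FixedFilter and libraryWouldCollide are all hypothetical names, not the real Bullet classes), and it assumes only what is described above: the base method's argument became a const pointer. A derived method written against the old signature then declares a separate overload and is never reached through a base-class pointer. Where the compiler supports it, the C++11 `override` specifier turns such a mismatch into a compile error.

```cpp
#include <cassert>

// Stand-in for the library base class; NOT the real btCollisionObject.
struct Object {
    virtual ~Object() = default;
    // Signature after the const-correctness change: const pointer argument.
    // The default answer is "yes, collide".
    virtual bool checkCollideWithOverride(const Object*) { return true; }
};

// Written against the old, non-const signature. This does not override the
// base method; it declares a new overload that merely hides it.
struct StaleFilter : Object {
    virtual bool checkCollideWithOverride(Object*) { return false; }
};

// Matches the new signature. With `override`, a mismatch would fail to compile.
struct FixedFilter : Object {
    bool checkCollideWithOverride(const Object*) override { return false; }
};

// How library code reaches the hook: through a base-class pointer.
bool libraryWouldCollide(Object* a, const Object* b) {
    assert(a != nullptr && b != nullptr);
    return a->checkCollideWithOverride(b);
}
```

Calling libraryWouldCollide on a StaleFilter reaches the base implementation, so the custom filter is silently skipped and every pair is treated as colliding; on a FixedFilter the custom filter runs.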

P.S. In the future, it may be helpful if any commits which introduce API 
changes are clearly marked as such to avoid similar situations.

Original comment by tamaynar...@gmail.com on 3 Jul 2012 at 4:32

GoogleCodeExporter commented 9 years ago
The StackAlloc caused API changes, so it will be documented in the next Bullet 
release.

So your custom btCollisionObject::checkCollideWithOverride is much faster than 
the original one in Bullet? Can you explain why?

Original comment by erwin.coumans on 4 Jul 2012 at 5:31

GoogleCodeExporter commented 9 years ago
Nothing novel I'm afraid, just some simple collision filtering particular to 
the needs of our software, in order to conserve processing power. We found that 
implementing the filtering by sub-classing btRigidBody in our own project, 
setting m_checkCollideWith to 1, and overriding checkCollideWithOverride ended 
up being more flexible in terms of modifying which bodies we wished to filter 
at run-time, and the check seemed to occur earlier in Bullet's collision 
pipeline. The other suggestions in the forum and on the wiki such as masks 
either didn't allow enough filter pairs or didn't seem to be affected by 
changes while the sim was running. Unfortunately I can't be more specific; it's 
not really anything that could be generalized, just some logic to avoid enough 
unnecessary collisions in order to get an acceptable frame-rate in our 
software, as we are already taxing the constraint system with many bodies and 
joints. Thanks again for your interest and help!
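
The general shape of the approach described above can be sketched as follows. This is not our actual filtering logic and not the real Bullet API: CollisionObject and FilteredBody are stand-in classes, the ignore/unignore helpers and shouldCollide are hypothetical, and the per-body ignore set is just one assumed way to make the filter editable at run-time. Only the names m_checkCollideWith and checkCollideWithOverride come from the discussion.

```cpp
#include <cassert>
#include <unordered_set>

// Stand-in for the library base class; NOT the real btCollisionObject.
class CollisionObject {
public:
    virtual ~CollisionObject() = default;

    // The hook is consulted only when the flag is set; otherwise collide.
    bool checkCollideWith(const CollisionObject* other) const {
        if (m_checkCollideWith) return checkCollideWithOverride(other);
        return true;
    }

protected:
    virtual bool checkCollideWithOverride(const CollisionObject*) const {
        return true;
    }
    int m_checkCollideWith = 0;
};

// Hypothetical subclass holding a filter that can change while the sim runs.
class FilteredBody : public CollisionObject {
public:
    FilteredBody() { m_checkCollideWith = 1; }  // opt in to the hook

    void ignore(const CollisionObject* other) {
        assert(other != nullptr);
        m_ignored.insert(other);
    }
    void unignore(const CollisionObject* other) { m_ignored.erase(other); }

protected:
    bool checkCollideWithOverride(const CollisionObject* other) const override {
        return m_ignored.count(other) == 0;  // collide unless explicitly ignored
    }

private:
    std::unordered_set<const CollisionObject*> m_ignored;
};

// Hypothetical pair test: either side may veto the collision.
bool shouldCollide(const CollisionObject& a, const CollisionObject& b) {
    return a.checkCollideWith(&b) && b.checkCollideWith(&a);
}
```

Because the ignore set lives on the body itself, a call to ignore or unignore takes effect the next time the pair is tested, and the number of filtered pairs is not tied to a fixed number of mask bits.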

Original comment by tamaynar...@gmail.com on 4 Jul 2012 at 10:20

GoogleCodeExporter commented 9 years ago

Original comment by erwin.coumans on 31 Jul 2012 at 4:47