Closed GoogleCodeExporter closed 9 years ago
Sorry, this should be an enhancement and not a defect.
Original comment by mr.lundm...@gmail.com
on 9 Nov 2010 at 7:36
Great, thanks for wrapping up this patch.
I'll look into it soon.
Original comment by erwin.coumans
on 10 Nov 2010 at 6:14
I looked at the file, but couldn't use svn patch.
Could you try to use svn to create the patch?
Thanks!
Erwin
Original comment by erwin.coumans
on 11 Nov 2010 at 9:16
Here is an attempt to convert this patch from GNU diff into svn diff format.
Still needs review. We need to be careful with API changes and sync them with
the SPU version.
Original comment by erwin.coumans
on 7 Dec 2010 at 7:19
We are investigating the necessary changes in this branch:
http://code.google.com/p/bullet/source/browse/#svn/branches/StackAllocation
Original comment by erwin.coumans
on 14 Dec 2010 at 5:19
Here's the latest patch.
Sorry for the delays Erwin.
Cheers,
Simon
Original comment by mr.lundm...@gmail.com
on 12 Jan 2011 at 1:27
Great, thanks for the patch. I applied it in branches/StackAllocation. It
will be useful in the research for Bullet 3.x.
As you know, we are also looking into integrating the new Sony PhysicsEffects2
(in branches/PhysicsEffects), which has been optimized for PS3, PC/SSE and also
PSP2.
Have you considered using vectormath instead of btSimdVector3?
Original comment by erwin.coumans
on 27 Feb 2011 at 7:42
Hey Erwin,
I think that in one of the patches that I mailed you, there was our
implementation of btSimdVector3, which uses simd/sse with an implementation for
ps3/pc. We had implemented it in some parts of bullet in order to get decent
performance.
Unfortunately, I don't work there anymore (started my own company instead), so
I no longer have access to that code.
Original comment by mr.lundm...@gmail.com
on 27 Feb 2011 at 3:19
Congratulations and good luck with starting your own company!
I just wondered if you ever considered using vectormath (which also is
optimized simd for sse and ps3/spu/ppu etc), but it seems you already have
btSimdVector3.
I'll certainly consider integrating your work into Bullet; it won't all be
written from scratch. As a side note, here is an interesting read about writing
code from scratch:
http://www.joelonsoftware.com/articles/fog0000000069.html
Original comment by erwin.coumans
on 28 Feb 2011 at 5:24
It got delayed a lot, but I applied the patch (with some changes) to trunk:
https://code.google.com/p/bullet/source/detail?r=2539
Thanks for your contribution!
Original comment by erwin.coumans
on 10 Jun 2012 at 4:47
Just wanted to say I've integrated the new version of the trunk into our local
copy of Bullet, and we're noticing significant performance degradation, on the
order of a 60% framerate hit. Here are profiles of the same scene without and
with this patch applied:
Old version:
Profiling: Root (total running time: 1.327 ms) ---
0 -- stepSimulation (100.00 %) :: 1.327 ms / frame (1 calls)
1 -- updateActions (0.00 %) :: 0.000 ms / frame (0 calls)
2 -- performDiscreteCollisionDetection (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (0.000 %) :: 0.000 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 1.327 ms) ---
...0 -- internalSingleStepSimulation (96.99 %) :: 1.287 ms / frame (1 calls)
...1 -- synchronizeMotionStates (0.90 %) :: 0.012 ms / frame (1 calls)
...Unaccounted: (2.110 %) :: 0.028 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 1.287 ms) ---
......0 -- updateActivationState (0.08 %) :: 0.001 ms / frame (1 calls)
......1 -- updateActions (8.62 %) :: 0.111 ms / frame (1 calls)
......2 -- integrateTransforms (0.93 %) :: 0.012 ms / frame (1 calls)
......3 -- solveConstraints (36.67 %) :: 0.472 ms / frame (1 calls)
......4 -- calculateSimulationIslands (0.93 %) :: 0.012 ms / frame (1 calls)
......5 -- performDiscreteCollisionDetection (44.60 %) :: 0.574 ms / frame (1 calls)
......6 -- predictUnconstraintMotion (1.48 %) :: 0.019 ms / frame (1 calls)
......Unaccounted: (6.682 %) :: 0.086 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 0.472 ms) ---
.........0 -- solveGroup (93.86 %) :: 0.443 ms / frame (1 calls)
.........1 -- processIslands (1.48 %) :: 0.007 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (3.81 %) :: 0.018 ms / frame (1 calls)
.........Unaccounted: (0.847 %) :: 0.004 ms
............----------------------------------
............Profiling: solveGroup (total running time: 0.443 ms) ---
............0 -- solveGroupCacheFriendlyIterations (79.23 %) :: 0.351 ms / frame (1 calls)
............1 -- solveGroupCacheFriendlySetup (19.64 %) :: 0.087 ms / frame (1 calls)
............Unaccounted: (1.129 %) :: 0.005 ms
............----------------------------------
............Profiling: processIslands (total running time: 0.007 ms) ---
............0 -- solveGroup (0.00 %) :: 0.000 ms / frame (0 calls)
............Unaccounted: (100.000 %) :: 0.007 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 0.000 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (0.00 %) :: 0.000 ms / frame (0 calls)
...............1 -- solveGroupCacheFriendlySetup (0.00 %) :: 0.000 ms / frame (0 calls)
...............Unaccounted: (0.000 %) :: 0.000 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 0.574 ms) ---
.........0 -- dispatchAllCollisionPairs (76.31 %) :: 0.438 ms / frame (1 calls)
.........1 -- calculateOverlappingPairs (0.35 %) :: 0.002 ms / frame (1 calls)
.........2 -- updateAabbs (23.00 %) :: 0.132 ms / frame (1 calls)
.........Unaccounted: (0.348 %) :: 0.002 ms
...----------------------------------
...Profiling: performDiscreteCollisionDetection (total running time: 0.000 ms) ---
...0 -- dispatchAllCollisionPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...1 -- calculateOverlappingPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...2 -- updateAabbs (0.00 %) :: 0.000 ms / frame (0 calls)
...Unaccounted: (0.000 %) :: 0.000 ms
New version:
Profiling: Root (total running time: 2.723 ms) ---
0 -- stepSimulation (99.96 %) :: 2.722 ms / frame (1 calls)
1 -- updateActions (0.00 %) :: 0.000 ms / frame (0 calls)
2 -- performDiscreteCollisionDetection (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (0.037 %) :: 0.001 ms
...----------------------------------
...Profiling: stepSimulation (total running time: 2.722 ms) ---
...0 -- internalSingleStepSimulation (99.16 %) :: 2.699 ms / frame (1 calls)
...1 -- synchronizeMotionStates (0.40 %) :: 0.011 ms / frame (1 calls)
...Unaccounted: (0.441 %) :: 0.012 ms
......----------------------------------
......Profiling: internalSingleStepSimulation (total running time: 2.699 ms) ---
......0 -- updateActivationState (0.04 %) :: 0.001 ms / frame (1 calls)
......1 -- updateActions (8.04 %) :: 0.217 ms / frame (1 calls)
......2 -- integrateTransforms (0.37 %) :: 0.010 ms / frame (1 calls)
......3 -- solveConstraints (15.86 %) :: 0.428 ms / frame (1 calls)
......4 -- calculateSimulationIslands (0.37 %) :: 0.010 ms / frame (1 calls)
......5 -- performDiscreteCollisionDetection (73.03 %) :: 1.971 ms / frame (1 calls)
......6 -- predictUnconstraintMotion (0.59 %) :: 0.016 ms / frame (1 calls)
......Unaccounted: (1.704 %) :: 0.046 ms
.........----------------------------------
.........Profiling: solveConstraints (total running time: 0.428 ms) ---
.........0 -- solveGroup (94.63 %) :: 0.405 ms / frame (1 calls)
.........1 -- processIslands (0.70 %) :: 0.003 ms / frame (1 calls)
.........2 -- islandUnionFindAndQuickSort (3.97 %) :: 0.017 ms / frame (1 calls)
.........Unaccounted: (0.701 %) :: 0.003 ms
............----------------------------------
............Profiling: solveGroup (total running time: 0.405 ms) ---
............0 -- solveGroupCacheFriendlyIterations (85.68 %) :: 0.347 ms / frame (1 calls)
............1 -- solveGroupCacheFriendlySetup (13.58 %) :: 0.055 ms / frame (1 calls)
............Unaccounted: (0.741 %) :: 0.003 ms
............----------------------------------
............Profiling: processIslands (total running time: 0.003 ms) ---
............0 -- solveGroup (0.00 %) :: 0.000 ms / frame (0 calls)
............Unaccounted: (100.000 %) :: 0.003 ms
...............----------------------------------
...............Profiling: solveGroup (total running time: 0.000 ms) ---
...............0 -- solveGroupCacheFriendlyIterations (0.00 %) :: 0.000 ms / frame (0 calls)
...............1 -- solveGroupCacheFriendlySetup (0.00 %) :: 0.000 ms / frame (0 calls)
...............Unaccounted: (0.000 %) :: 0.000 ms
.........----------------------------------
.........Profiling: performDiscreteCollisionDetection (total running time: 1.971 ms) ---
.........0 -- dispatchAllCollisionPairs (95.79 %) :: 1.888 ms / frame (1 calls)
.........1 -- calculateOverlappingPairs (0.10 %) :: 0.002 ms / frame (1 calls)
.........2 -- updateAabbs (4.06 %) :: 0.080 ms / frame (1 calls)
.........Unaccounted: (0.051 %) :: 0.001 ms
...----------------------------------
...Profiling: performDiscreteCollisionDetection (total running time: 0.000 ms) ---
...0 -- dispatchAllCollisionPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...1 -- calculateOverlappingPairs (0.00 %) :: 0.000 ms / frame (0 calls)
...2 -- updateAabbs (0.00 %) :: 0.000 ms / frame (0 calls)
...Unaccounted: (0.000 %) :: 0.000 ms
----------------------------------
As you can see, dispatchAllCollisionPairs now takes more than four times as
long to complete (0.438 ms before, 1.888 ms after), while most other functions
have similar execution times. Any insight as to the cause of this before I try
to delve further in or, worse, roll back the update?
Original comment by tamaynar...@gmail.com
on 28 Jun 2012 at 5:36
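For reference, the relative changes between the two profiles can be worked out directly from the quoted timings. The following is a minimal sketch, not part of Bullet: the `Sample` struct and the helper functions are hypothetical names introduced for illustration, and the numbers are copied from the "Old version" and "New version" dumps above.

```cpp
#include <cstdio>

// Timings in ms per frame, copied from the two profiles quoted in the thread.
// The struct and helper names are hypothetical, for illustration only.
struct Sample {
    const char* name;
    double oldMs;  // "Old version" profile
    double newMs;  // "New version" profile
};

static const Sample kSamples[] = {
    {"stepSimulation",                    1.327, 2.722},
    {"performDiscreteCollisionDetection", 0.574, 1.971},
    {"dispatchAllCollisionPairs",         0.438, 1.888},
    {"solveConstraints",                  0.472, 0.428},
    {"updateActions",                     0.111, 0.217},
    {"updateAabbs",                       0.132, 0.080},
};

// How many times longer the new build spends in one profile entry.
inline double slowdown(const Sample& s) {
    return s.newMs / s.oldMs;
}

// Fraction of the total stepSimulation increase that one entry accounts for.
// It can exceed 1.0 when other entries became faster at the same time.
inline double shareOfRegression(const Sample& s, const Sample& total) {
    return (s.newMs - s.oldMs) / (total.newMs - total.oldMs);
}

// Print one line per entry: old time, new time, and the ratio between them.
inline void printReport() {
    for (const Sample& s : kSamples) {
        std::printf("%-36s %.3f -> %.3f ms  (x%.2f)\n",
                    s.name, s.oldMs, s.newMs, slowdown(s));
    }
}
```

Calling `printReport()` lists the ratio for each entry, and `shareOfRegression(kSamples[2], kSamples[0])` shows how much of the stepSimulation increase comes from dispatchAllCollisionPairs alone.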
That is not good.
Is there any way to reproduce this performance degradation in any of the Bullet
demos?
What kind of collision shape types are you using?
Any other info about the test case that you used to show the difference? Are
both compiled in release mode using the same compiler etc?
Original comment by erwin.coumans
on 28 Jun 2012 at 6:25
We're using btCompoundShape almost exclusively, with the component shapes
consisting mainly of boxes and cylinders with a few spheres. The fact that we
have so many compound shapes is one of the reasons I was excited for this
update :( I'll attempt to reproduce it in a demo and upload the modified demo
and profiling results here, but for now the only other info I have is that I'm
certain the compiler and settings are the same (VS2010 with LTCG enabled), and
release mode is definitely targeted.
Original comment by tamaynar...@gmail.com
on 28 Jun 2012 at 6:51
Oh, and several btHeightfieldTerrainShapes for our terrain, in case that is
applicable
Original comment by tamaynar...@gmail.com
on 28 Jun 2012 at 6:52
It would be great if you can provide a reproduction case that shows the
performance degradation. Given such a repro case, I'll make sure it gets fixed.
Original comment by erwin.coumans
on 28 Jun 2012 at 6:57
It appears that the majority of the performance loss was due to some changes to
the API to make Bullet more const-correct. We're using some elaborate collision
filtering by overriding btCollisionObject::checkCollideWithOverride, and as the
argument type for that method was changed to a const pointer, the overridden
implementation was never called. There is still some slight performance loss,
but less than a third of what I was experiencing before. If the remaining
problem is related to the core Bullet lib I'll post details.
P.S. In the future, it may be helpful if any commits which introduce API
changes are clearly marked as such to avoid similar situations.
Original comment by tamaynar...@gmail.com
on 3 Jul 2012 at 4:32
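The failure mode described in this comment is a general C++ pitfall: when a base-class virtual changes its parameter type, a derived-class method written against the old signature no longer overrides it. It silently becomes a separate function that hides the base one. The sketch below uses stand-in classes (`BaseObject`, `StaleFilterBody`, `FixedFilterBody` are hypothetical names, not Bullet's real types or exact signatures) to show the effect and how the C++11 `override` specifier guards against it.

```cpp
// Stand-in for a library base class. Not Bullet's real btCollisionObject;
// it only models "the argument type was changed to a const pointer".
struct BaseObject {
    virtual ~BaseObject() {}

    // New signature after the const-correctness change. Default: always collide.
    virtual bool checkCollideWithOverride(const BaseObject*) { return true; }

    // The library always dispatches through the new signature.
    bool checkCollideWith(const BaseObject* other) {
        return checkCollideWithOverride(other);
    }
};

// User code written against the OLD signature (non-const pointer).
// The parameter type differs, so this is a new, unrelated virtual that
// hides the base method. The library never calls it.
struct StaleFilterBody : BaseObject {
    virtual bool checkCollideWithOverride(BaseObject*) { return false; }
};

// User code updated to the new signature. With 'override', the compiler
// rejects the declaration if it stops matching a base-class virtual.
struct FixedFilterBody : BaseObject {
    bool checkCollideWithOverride(const BaseObject*) override { return false; }
};
```

With these definitions, `StaleFilterBody().checkCollideWith(&other)` falls through to the base default and returns `true`, so the custom filter is skipped and every pair is processed, whereas `FixedFilterBody` filters as intended. Adding `override` to the stale version would turn the silent mismatch into a compile error.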
The StackAlloc caused API changes, so it will be documented in the next Bullet
release.
So your custom btCollisionObject::checkCollideWithOverride is much faster than
the original one in Bullet? Can you explain why?
Original comment by erwin.coumans
on 4 Jul 2012 at 5:31
Nothing novel I'm afraid, just some simple collision filtering particular to
the needs of our software, in order to conserve processing power. We found that
implementing the filtering by sub-classing btRigidBody in our own project,
setting m_checkCollideWith to 1, and overriding checkCollideWithOverride ended
up being more flexible in terms of modifying which bodies we wished to filter
at run-time, and the check seemed to occur earlier in Bullet's collision
pipeline. The other suggestions in the forum and on the wiki, such as masks,
either didn't allow enough filter pairs or didn't seem to be affected by
changes while the sim was running. Unfortunately I can't be more specific; it's
not really anything that could be generalized, just some logic to avoid enough
unnecessary collisions in order to get an acceptable frame-rate in our
software, as we are already taxing the constraint system with many bodies and
joints. Thanks again for your interest and help!
Original comment by tamaynar...@gmail.com
on 4 Jul 2012 at 10:20
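The approach described in this comment (enable a per-body flag, then override the check to consult filter data that can change at run time) might look roughly like the sketch below. It is written against stand-in types so that it compiles on its own: `CollisionObjectSketch`, `FilteredBody`, `checkCollideWithFlag`, `ignore`, and `unignore` are hypothetical names, and the flag-then-virtual-call dispatch is an assumption modeled on the comment's description, not a copy of Bullet's code or of the commenter's implementation.

```cpp
#include <set>

// Stand-in for the library base class (hypothetical, not Bullet's real type).
struct CollisionObjectSketch {
    int checkCollideWithFlag = 0;  // plays the role of m_checkCollideWith

    virtual ~CollisionObjectSketch() {}

    // Default: collide with everything.
    virtual bool checkCollideWithOverride(const CollisionObjectSketch*) const {
        return true;
    }

    // Assumed dispatch: the override is consulted only when the flag is set.
    bool checkCollideWith(const CollisionObjectSketch* other) const {
        return checkCollideWithFlag ? checkCollideWithOverride(other) : true;
    }
};

// A body whose ignore list can be edited while the simulation is running.
class FilteredBody : public CollisionObjectSketch {
public:
    FilteredBody() { checkCollideWithFlag = 1; }  // opt in to the custom check

    void ignore(const CollisionObjectSketch* other)   { m_ignored.insert(other); }
    void unignore(const CollisionObjectSketch* other) { m_ignored.erase(other); }

    // Collide with everything that is not currently on the ignore list.
    bool checkCollideWithOverride(const CollisionObjectSketch* other) const override {
        return m_ignored.count(other) == 0;
    }

private:
    std::set<const CollisionObjectSketch*> m_ignored;
};
```

Because the ignore list is ordinary per-body data, any number of pairs can be filtered and the set can be changed between simulation steps, which is the flexibility the comment contrasts with bit masks.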
Original comment by erwin.coumans
on 31 Jul 2012 at 4:47
Original issue reported on code.google.com by
mr.lundm...@gmail.com
on 9 Nov 2010 at 7:36