dartsim / dart

DART: Dynamic Animation and Robotics Toolkit
http://dartsim.github.io/
BSD 2-Clause "Simplified" License

Collisions in apps with Multiple Worlds #410

Open jturner65 opened 9 years ago

jturner65 commented 9 years ago

In my application I have multiple, duplicate worlds instanced from the same .skel file, which is based on fullbody1.skel.

I consume these worlds in a multi-threaded application (using <thread>), where the number of threads is set to the number the processor supports (via thread::hardware_concurrency()) and the worlds are distributed evenly among the threads (usually 10 worlds per thread).
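A minimal sketch of that kind of distribution, assuming a stand-in SimWorld type rather than the actual DART World class (SimWorld and stepAllWorlds are hypothetical names, not taken from the application):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a simulation world; not the DART World class.
struct SimWorld {
  int steps = 0;
  void step() { ++steps; }
};

// Split `worlds` into contiguous, near-equal chunks, one chunk per thread,
// and step every world `numSteps` times. Each world is touched by exactly
// one thread, so no locking is needed in this sketch.
void stepAllWorlds(std::vector<SimWorld>& worlds, int numSteps) {
  assert(numSteps >= 0);
  // hardware_concurrency() may return 0 when the value cannot be determined.
  const std::size_t numThreads =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  const std::size_t chunk = (worlds.size() + numThreads - 1) / numThreads;

  std::vector<std::thread> pool;
  for (std::size_t begin = 0; begin < worlds.size(); begin += chunk) {
    const std::size_t end = std::min(begin + chunk, worlds.size());
    pool.emplace_back([&worlds, begin, end, numSteps] {
      for (std::size_t i = begin; i < end; ++i)
        for (int s = 0; s < numSteps; ++s)
          worlds[i].step();
    });
  }
  for (std::thread& t : pool) t.join();
}
```

Because no world is shared between threads here, any difference between worlds would have to come from state shared inside the library, not from this partitioning.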

The skeletons are defined to use Velocity joints as per the skeleton in the hybridDynamics application.

If I set the skeletons to the same initial state and apply the same control velocities I will see both skeletons resolve to the same end state, but their ground contact profiles will be different, both the location and the magnitude of the forces.

Furthermore, I usually will see only one contact point per rigid body exhibiting non-zero contact force (multiple other contacts may exist but they will have 0 force).

While I have seen this behavior resulting from execution in a multi-threaded environment, if I lock_guard the individual worlds' step() function, I still see the behavior, so while multi-threading may exacerbate the issue, it isn't the cause as far as I can tell.
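For reference, serializing step() with lock_guard could look roughly like this. It is a sketch with a stand-in SimWorld type and a single global mutex; gStepMutex, guardedStep and runGuarded are hypothetical names, and the real application may scope its locks differently:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a simulation world; not the DART World class.
struct SimWorld {
  int steps = 0;
  void step() { ++steps; }
};

// One global mutex: at most one world steps at any moment.
std::mutex gStepMutex;

void guardedStep(SimWorld& world) {
  std::lock_guard<std::mutex> lock(gStepMutex);
  world.step();
}

// Runs one thread per world and returns the total number of steps taken.
int runGuarded(int numWorlds, int numSteps) {
  assert(numWorlds >= 0 && numSteps >= 0);
  std::vector<SimWorld> worlds(static_cast<std::size_t>(numWorlds));
  std::vector<std::thread> pool;
  for (SimWorld& w : worlds) {
    pool.emplace_back([&w, numSteps] {
      for (int s = 0; s < numSteps; ++s) guardedStep(w);
    });
  }
  for (std::thread& t : pool) t.join();

  int total = 0;
  for (const SimWorld& w : worlds) total += w.steps;
  return total;
}
```

With the lock in place the steps never overlap in time, so behavior that persists under it cannot be explained by two step() calls running concurrently.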

mxgrey commented 9 years ago

Was this running in Windows? If so, I'm thinking we should wait to see if all the Eigen alignment issues are resolved first. Then if this issue still persists, we can address it.

> Furthermore, I usually will see only one contact point per rigid body exhibiting non-zero contact force (multiple other contacts may exist but they will have 0 force).

If you are using primitive collision geometries, then this could be related to a known FCL issue that @jslee02 is working out with the OSRF people. To the best of my understanding, the issue is that primitive collisions in the mainline version of FCL only return a single contact point at a time, even if there should be multiple.

jturner65 commented 9 years ago

yes, this was in windows, and it is still present after i made my changes to address the issues i reported. i will of course check again to see if it is still present after the posted fixes have been installed.

i am using primitive collisions, but it is with the dart collision detection, and not the fcl one (i was also seeing it when i was using fcl collision detection, which motivated the change to the dart detector).

mxgrey commented 9 years ago

Just to be clear, do you know if this happens when you're using traditional force control instead of velocity control?

mxgrey commented 9 years ago

I've hijacked the bipedStand app to write a test that I believe recreates the scenario you're talking about. You can find it in the grey/testCollisionConsistency branch. If you run the bipedStand app, it will clone (as well as load from file) a bunch of the fullbody1.skel worlds and then simulate them forward while applying velocity control to the joints, in the same style as the hybridDynamics app.

As the app runs, it will print out information about how often there is any kind of inconsistency between the collisions and forces of the original world and any of the other worlds.
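The comparison described here could be sketched as follows. This is not the code from the branch: the Contact struct and both function names are hypothetical, and the sketch assumes each world's contacts have already been copied out into plain arrays. It uses exact floating-point equality, on the assumption that identical worlds should produce identical numbers:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one contact: location and force in world coordinates.
struct Contact {
  std::array<double, 3> point;
  std::array<double, 3> force;
};

// True only if both worlds report the same contacts, in the same order,
// with bitwise-equal values (no tolerance).
bool sameContacts(const std::vector<Contact>& a,
                  const std::vector<Contact>& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i].point != b[i].point || a[i].force != b[i].force) return false;
  }
  return true;
}

// Number of worlds whose contact list differs from the reference world.
std::size_t countInconsistentWorlds(
    const std::vector<Contact>& reference,
    const std::vector<std::vector<Contact>>& others) {
  std::size_t count = 0;
  for (const std::vector<Contact>& contacts : others) {
    if (!sameContacts(reference, contacts)) ++count;
  }
  return count;
}
```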

Running on 64-bit Ubuntu 14.04 compiled with GCC, I saw zero inconsistencies between the collision data of any of the worlds. If you could run the bipedStand app from my branch on your Windows machine and let me know what the results are, I would greatly appreciate it.

Right now, the app is only running single-threaded. If you don't run into any inconsistencies with this single-threaded test, then I'll see about putting together a multi-threaded version of it to see whether that's the culprit.

As a final note, if you want to visualize the forward simulation instead of running the test, then look for the const bool runTest variable at the top of bipedStand/Main.cpp and switch it to false.

jturner65 commented 9 years ago

one difference i see off the bat is that i am using setCommands to set the joint velocities ( i am deriving the control from a separate algorithm ) and i am initializing the skeletons to be the same using getState(which gets pos and vel) and setState. do i need to copy accels and forces too?

jturner65 commented 9 years ago

also, i use a large timestep (part of the algorithm) - 1/60 is the smallest i use.

mxgrey commented 9 years ago

In my test, I'm trying to boil everything down to simple forward simulation with no feedback control or any other confounding factors. If you find that this simple test has inconsistent results on your system, then something about the Windows platform is introducing non-determinism into the forward simulation. On the other hand, if the test works as it's supposed to and everything within the test proves to be consistent, then it'll be worth trying to figure out what other factors might result in the inconsistencies that you're seeing.

> i am initializing the skeletons to be the same using getState(which gets pos and vel) and setState. do i need to copy accels and forces too?

I would expect this to typically be enough for your use case, because joint acceleration is computed as a part of forward dynamics, and joint force is computed automatically by the velocity control mode. We have plans for release 5.1 that will introduce a proper State class which will aim to guarantee a comprehensive description of a Skeleton state, but that isn't fully fleshed out yet.

> also, i use a large timestep (part of the algorithm) - 1/60 is the smallest i use.

In principle this shouldn't result in inconsistencies between Worlds, as long as every World is using the same time step size. There are only four things that should result in numerically different simulation results: (1) Different Skeleton properties, (2) Different starting conditions, (3) Different control inputs, and (4) Different simulation parameters. If all of those match up, then simulation results should be numerically identical. If they all match up but simulation results are not identical, then something somewhere is introducing non-deterministic numbers which is not okay.

jturner65 commented 9 years ago

ok, cool, thanks for taking the time to explain, this is as i thought.

I ran the test with no issues.

mxgrey commented 9 years ago

Alright. I think the next thing I'll try is to add some multi-threading to the test to see if that's a possible factor. Do you know if VS2013 supports the C++11 std::thread?

jturner65 commented 9 years ago

yep

https://msdn.microsoft.com/en-us/library/hh920526.aspx

mxgrey commented 9 years ago

I've added a multithreaded version of the test (same branch and same app name). If the multithreading is working as it's supposed to, the output for the multi-threaded version should be identical to the single-threaded version (except maybe for the rate of output).

You can switch between the single-threaded and multi-threaded version by changing the multiThreaded boolean at the top of the .cpp.

When you get a chance, try out the app and let me know what you get. If the results from this are okay, then I'll see about adding a closed loop controller that operates on velocity commands.

jturner65 commented 9 years ago

i did get a message : Warning collision function between node type 5 and node type 5 not supported.

other than that, nothing.( everything seemed to work fine, that is.)

mxgrey commented 9 years ago

> Warning collision function between node type 5 and node type 5 not supported.

Now that's interesting. Can you tell whether this is being printed out by DART or by FCL? I would guess FCL, because I'm not finding a string resembling that inside of DART. Does it only show up when you run the multi-threaded version? How long does it take before printing out? I've never gotten a warning like that from the test, so this might indicate a crucial difference between the platforms.

In all of my test runs, the results of the multi-threaded test are identical to the results of the single-threaded test. They both simply print out this line repeatedly:

inconsistent force count: 0 single force count: 109

Other than the warning, does that match what it prints out for you?

jturner65 commented 9 years ago

  1. i can't find it either in DART, so it must be FCL, although i don't know. in my application i am not using FCL, although FCLmesh loads as a default by dart and is then overwritten with DART collision type.
  2. it only showed up in multi-threaded, not single threaded. also, only in release mode, not debug mode.
    message : Warning: collision function between node type Warning: collision function between node type 5 and node type 5 is not supported 5 and node type 5 is not supported
  3. it was the first thing that printed out.
  4. other than that, my results match yours (inconsistent force count 0: single force count :109)

mxgrey commented 9 years ago

Well, I guess that's at least one thing in the "platform differences" column. Although realistically, it might just have to do with differences in our versions of FCL.

I'll put together a test that has some feedback control in it to see if that results in any major differences.

jturner65 commented 9 years ago

i did have to compile my own version of FCL to get it to work on windows, but again, i am not using FCL at all in my app (i specify dart as collision type in the skeleton file). if FCL is being used that would be wrong, right? (in my app, not in the test - understand it is defaulting to FCL in the test).

if you think it might help, i can share with you my app - you can install it over dart as a submodule in apps and it will work (totally self contained). i have tests built in that verify the collision strangeness that i am seeing.

mxgrey commented 9 years ago

Seeing your app would probably help to diagnose what you're experiencing.

jturner65 commented 9 years ago

ok, i sent you an invite from bitbucket.

mxgrey commented 9 years ago

I got the app compiled (there were some cross-platform issues, but nothing serious), and I think I see what you were describing. It looks like there is only one collision force vector being produced in a given timestep, even though there may be multiple points of collision. I'm assuming that black lines represent collision forces while the other colored lines (I think I saw blue, yellow, and red) are something else.

There's a lot going on in the app, so it'll probably take me some time to digest everything that's happening. If you have a readme or an overview of some sort that you can point me at, that might help a bit.

One thing that I noticed is the black force vector always seems to come out of the large teal ball, and never out of the smaller blue balls. Is it possible that something in your app is averaging the collision data together into that one lump, making it appear as though there's only one force?

jturner65 commented 9 years ago

ok, great. the black line connects the COM to the COP (teal ball) that i calculate via the contacts at the end of each application of control - this is just a visualization aid i have to assist me in finding the COP. the red line is the direction and relative magnitude of the COM velocity.

to see the collision forces, look at the blue lines - they usually extend below the ground from the blue balls, which are contact locations. these are DART-provided, and displayed in Win3d.cpp (iirc) or simwindow.cpp.

i have built in a test that calculates control vectors using the algorithm, applies them to a skeleton during the algorithm's application (so during multi-threaded application), records state before application and after and also records contact profile. then it takes those same forces and pre-application states and applies them to the other skeletons outside of the execution of the algorithm, in single threaded mode. these tests are managed in MyWindow.cpp, in testSimCntxtDet().

the first phase of this test applies the same start state and control 5 times to the same world in sequence, and compares the results, and then 2nd phase of the test applies the same start state and control to all worlds (to see if there is something odd in a certain set of worlds).

to see this, immediately after launching the app (before hitting start sim) hit test det (for "test determinism") and then start sim. i print out to the console the previous states of both worlds (i call them contexts, as in physics contexts), the controls applied and the final states of both contexts after the controls are applied, as well as the difference in the states after the application of control (as a scalar value - the norm of the difference of the two state vectors). the state vectors are the results of skel->getState().
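The scalar difference mentioned here is the Euclidean norm of the element-wise difference between two state vectors. A minimal sketch on plain std::vector<double> (the function name is hypothetical, and the app itself may compute this differently):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of (a - b). A result of exactly 0.0 means the two states
// are numerically identical.
double stateDifferenceNorm(const std::vector<double>& a,
                           const std::vector<double>& b) {
  assert(a.size() == b.size());
  double sumSq = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = a[i] - b[i];
    sumSq += d * d;
  }
  return std::sqrt(sumSq);
}
```

With Eigen vectors the same quantity is (a - b).norm().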

all of this information will be a wall of text, but the good news is it should all be as expected (i.e. context 0, which is the world where all the reference information is generated during the simulation should match context 1 and up, which are the test worlds).

the collision profiles are where things are different, and these are when there are contacts to display. you'll see something that looks like this in the console window : contacts at step 2 test iteration 1 context 0 contact : .... context 1 contact : .... ...... here you should see the differences.

Note, the first step there are not contacts, because the skeletons are not yet in contact with the ground.

jturner65 commented 9 years ago

with regard to the teal ball (COP location) and the black line :
i calculate this by taking the forces of all the contacts, summing their moments about a single point in the ground plane, and finding the point where the sum of the forces would need to be applied to yield the summed moment. while i do not currently use this for anything in my algorithm (was intending to use it to maintain balance in the cost function) this does serve as a quick visual reference to what is going on in the simulation. when the COP sphere aligns with a single contact point, that point is the only contact the skeleton has with the ground with non-zero force.
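A sketch of that calculation, under stated assumptions: the ground plane is y = 0 with y pointing up, moments are taken about the origin, and the Contact struct and function names are hypothetical, not the app's own code. With total force F and total moment M, the point c in the plane satisfying c × F = M in the x and z components is c_x = M_z / F_y, c_z = -M_x / F_y:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical contact record: location and force in world coordinates.
struct Contact {
  Vec3 point;
  Vec3 force;
};

Vec3 cross(const Vec3& a, const Vec3& b) {
  return Vec3{a[1] * b[2] - a[2] * b[1],
              a[2] * b[0] - a[0] * b[2],
              a[0] * b[1] - a[1] * b[0]};
}

// Computes the center of pressure on the plane y = 0 (assumed ground plane).
// Returns false when the net vertical force is too small to define one.
bool centerOfPressure(const std::vector<Contact>& contacts, Vec3& cop) {
  Vec3 F{0.0, 0.0, 0.0};  // sum of contact forces
  Vec3 M{0.0, 0.0, 0.0};  // sum of moments about the origin
  for (const Contact& c : contacts) {
    const Vec3 m = cross(c.point, c.force);
    for (int k = 0; k < 3; ++k) {
      F[k] += c.force[k];
      M[k] += m[k];
    }
  }
  if (std::abs(F[1]) < 1e-12) return false;
  cop = Vec3{M[2] / F[1], 0.0, -M[0] / F[1]};
  return true;
}
```

For purely vertical forces this reduces to the force-weighted average of the contact locations, so when only one contact carries non-zero force the result coincides with that contact.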

jturner65 commented 9 years ago

anything else i can do to help this out? just let me know! :)

mxgrey commented 9 years ago

Sorry, I've been occupied with other work lately.

I've seen the debugging info that you showed, and I see what you mean where the forces of all but one contact point are mysteriously zero. It's hard to think of a reasonable explanation for this.

Did this issue not happen in an earlier version of DART? If so, do you remember the latest version where this issue didn't happen?

It would probably be best if we can create a minimalist replication of the issue. There's a lot going on in your app, so it'll be difficult (at least for me) to debug the issue in that environment.

jturner65 commented 9 years ago

hey, no sweat- thanks for taking the time to actually get it running and check it out.

i've always seen this since i started this app (which was the beginning of march) so in dart 4 and now in 5.

i agree with the minimalist replication - i only wanted to show you this so that a) you at least had something that was exhibiting the problem and b) you would understand where i was coming from with the issue.

i think the next step would be to expand the test app you made to handle control applications.

jturner65 commented 9 years ago

i found an easy way to reproduce part of this issue (the 0 force contacts). in the "stock" hybrid dynamics app, if you set the timestep in the world to be 1/30, and then modify the timeStepping function to be the following :

  // Replacement body for the timeStepping function.
  dart::dynamics::SkeletonPtr skel = mWorld->getSkeleton(1);

  // Skeleton-wide indices of the first DOF of each commanded joint.
  size_t index0 = skel->getJoint("j_scapula_left")->getIndexInSkeleton(0);
  size_t index1 = skel->getJoint("j_scapula_right")->getIndexInSkeleton(0);
  size_t index2 = skel->getJoint("j_forearm_left")->getIndexInSkeleton(0);
  size_t index3 = skel->getJoint("j_forearm_right")->getIndexInSkeleton(0);

  size_t index6 = skel->getJoint("j_shin_left")->getIndexInSkeleton(0);
  size_t index7 = skel->getJoint("j_shin_right")->getIndexInSkeleton(0);

  // Left and right arms get out-of-phase commands (sin vs. cos).
  const double t = mWorld->getTime();
  skel->setCommand(index0,  1.0 * std::sin(t * 4.0));
  skel->setCommand(index1, -1.0 * std::cos(t * 4.0));
  skel->setCommand(index2,  0.8 * std::sin(t * 4.0));
  skel->setCommand(index3,  0.8 * std::cos(t * 4.0));

  // Both shins get the same slower command.
  skel->setCommand(index6,  0.1 * std::sin(t * 2.0));
  skel->setCommand(index7,  0.1 * std::sin(t * 2.0));

  mWorld->step();

which serves to give the skeleton asymmetrical motion, the contact profile ends up having 0 force contact points at times.

mxgrey commented 9 years ago

Thanks, this should be helpful. I'll check it out as soon as I've finished what I'm currently working on.

karenliu commented 9 years ago

I made those changes but I still got multiple non-zero contact forces.

jturner65 commented 9 years ago

here is a link to a clip of my modified hybridDynamics code (asymmetrical control applied to shoulders, large timestep) illustrating the 0 contacts.

https://dl.dropboxusercontent.com/u/55351229/hybridDynContacts.mp4

i can provide the code behind this clip if needed.

also, please remember that there are two issues i am seeing on windows that may or may not be related - the zero-force contacts, and the non-reproducibility of the contact force profile given the same state and force application.

jturner65 commented 9 years ago

Karen, are you setting the time step to 1/30 after the world constructor is run (it is hardcoded to be .001 in the constructor)? just curious - that might be why you're not seeing the contact behavior.
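One C++ detail worth checking when making this change: written literally, 1/30 is integer division and evaluates to 0, so the value has to be spelled as 1.0/30.0. A small sketch with a stand-in world type (SimWorld and makeWorldWithLargeStep are hypothetical; the 0.001 default mirrors the hardcoded value described above):

```cpp
#include <cassert>

// Hypothetical stand-in for a world whose constructor hardcodes the step.
struct SimWorld {
  double timeStep = 0.001;  // default set at construction
  void setTimeStep(double dt) {
    assert(dt > 0.0);
    timeStep = dt;
  }
};

// Builds a world, then overrides the default step afterwards.
SimWorld makeWorldWithLargeStep() {
  SimWorld world;
  // 1 / 30 would be integer division and evaluate to 0.
  world.setTimeStep(1.0 / 30.0);
  return world;
}
```

The override has to happen after construction; otherwise the constructor's default is what the simulation actually uses.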

jturner65 commented 9 years ago

i have addressed the issue i was seeing with hybrid dynamics app by making the world ground plane thicker, after reading issue #426 (since i'm using a large timestep, it is equivalent to a very thin ground plane). i am currently seeing if this addresses the issue i was having with my project, and if so i will close this issue.
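One way to read the timestep and thickness equivalence is in terms of how far a body travels in a single step, which is speed × timestep. The sketch below is only that arithmetic, with hypothetical function names; it is not a description of how DART's collision detection works or of what issue #426 changed:

```cpp
#include <cassert>

// Distance covered in one simulation step at constant speed.
double travelPerStep(double speed, double timeStep) {
  assert(timeStep > 0.0);
  return speed * timeStep;
}

// True if a body moving at `speed` covers more than `thickness` in one step.
bool exceedsThickness(double speed, double timeStep, double thickness) {
  return travelPerStep(speed, timeStep) > thickness;
}
```

For example, at 1 m/s a step of 1/30 s covers about 3.3 cm, while a step of 0.001 s covers 1 mm, so a 1 cm plane is "thin" relative to the first and not the second.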

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.