Processing Hung freezes sim

JNeiger commented 6 years ago

Processing hung is caused by something taking longer than one frame to execute. If it takes much much longer than one frame, it will actually freeze the sim.

This specific bug seems to only happen when you start a play or part of the way through the play, during a state transition. It freezes the sim for at least 30 seconds, I haven't seen it unfreeze if this happens, but I haven't tested it for prolonged periods.

I think I have seen it happen with Adaptive Formation. Specifically when we collect the ball and begin to search for a good pass position. If others confirm this, it is most likely due to the Nelder-Mead not ending after a specific number of iterations like it's supposed to.

Post what play it happened in and when it happened and we can try to narrow it down.

CollinAvidano commented 6 years ago

So as Joe and I have found out this is an issue with adaptive formation and other defensive plays rapidly changing which robots are submissive defenders. Changing Robot_Change_Cost in defense.py though has varying effects on alleviating this.

kylestach commented 5 years ago

I can confirm that this shows up on my machine when running adaptive formation. Only happens every ~10-15 runs of soccer on average, but it definitely still exists.

A bit of debugging shows that the issue is on https://github.com/RoboJackets/robocup-software/blob/master/soccer/planning/Path.cpp#L79. Specifically, the planner seems to throw in a waypoint with a REALLY big time. From GDB:

(gdb) p this->waypoints
$4 = std::vector of length 81, capacity 81 = {{instant = {pos = {_x = -0.99999993443395974, _y = 3.0000237981465219}, vel = {_x = -7.2386387164475996e-22, _y = 1.1039962766313293e-14}}, time = {__r = 0}, angle = {<boost::optional_detail::optional_base<Planning::AngleInstant>> = {<boost::optional_detail::optional_tag> = {<No data fields>}, m_initialized = false, m_storage = {dummy_ = {data = {31 '\037', 57 '9', 10 '\n', 64 '@', -28 '\344', -7 '\371', -26 '\346', -22 '\352', -1 '\377', -1 '\377', -17 '\357', -65 '\277', 43 '+', -32 '\340', -37 '\333', -116 '\214'}, aligner_ = {<No data fields>}}}}, <No data fields>}},
............................(omitted)...........................,
{instant = {pos = {_x = 0.14014906313103487, _y = 4.4015925837430636}, vel = {_x = 0, _y = 0}}, time = {__r = 167690317.18500748}, angle = {<boost::optional_detail::optional_base<Planning::AngleInstant>> = {<boost::optional_detail::optional_tag> = {<No data fields>}, m_initialized = false, m_storage = {dummy_ = {data = {0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000', 0 '\000'}, aligner_ = {<No data fields>}}}}, <No data fields>}}}

Obviously, that's going to cause that loop to effectively turn into a busy wait (it'll iterate roughly 3.3 billion times, unless machine precision breaks down at numbers that large in which case it'll just never complete).

Update: I think it may have something to do with this line: https://github.com/RoboJackets/robocup-software/blob/master/soccer/planning/RRTPlanner.cpp#L721 - I think that's the only thing that is actually able to generate trajectory waypoints anywhere.

JNeiger commented 5 years ago

I think it's one up from that. The control points it gets as an arg are trash halfway through.

 _y = -2.9943037259357452}, {_x = 35.691971412439756, _y = -544353780.8313601}, {_x = 32.342421127043899, _y = -490939065.89037621}, 
  {_x = 29.11888371342453, _y = -439565677.63760769}, {_x = 26.021359171581629, _y = -390233616.07305461}, {_x = 23.049847501515199, 
    _y = -342942881.19671685}, {_x = 20.20434870322525, _y = -297693473.00859439}, {_x = 17.484862776711768, _y = -254485391.50868735},

This is one frame up. It's failing somewhere in the generateCubicBezier function. The points going in are fine

#5  0x00007ffff7a52d00 in Planning::RRTPlanner::generateCubicBezier (points=std::vector of length 3, capacity 4 = {...}, obstacles=..., 
    motionConstraints=..., vi=..., vf=...) at ../soccer/planning/RRTPlanner.cpp:756

Edit: ks goes to inf. https://github.com/RoboJackets/robocup-software/blob/master/soccer/planning/RRTPlanner.cpp#L506

(gdb) p ks
$13 = std::vector of length 2, capacity 2 = {67262311.970732063, 0.3515580143065889}

Edit2: The first two points of the path are the exact same. ~~This may relate to #1184~~

Edit3: It's a BiRRT problem in getPath. https://github.com/RoboJackets/rrt/blob/2cc0b1656f69d300aadd39b208713894e92e96f3/src/rrt/BiRRT.hpp#L101

~~I'm thinking that the two tree's have a common element where they hit each other. If you don't remove that element, then you will get a duplicate which causes this cascade.~~

JNeiger commented 5 years ago

I think I got it narrowed down somewhat? There are a few things that have to happen for it to die for it to happen in this specific case.

The first is that the BiRRT has to be able to directly step on top of the goal node. This happens when there is a straight line from the start to goal. This constraint could be loosened slightly, but this is the most common case.
The BiRRT must miss that this step is impossible and build a path using {p0, p0, p1} in the simple case. This is because depending on the direction, the transitionValid is different.

(gdb) p points
$7 = std::vector of length 3, capacity 4 = {{_x = -1.0029004033453848, _y = 3.0212993269876827}, {_x = -1.0029004033453848, 
    _y = 3.0212993269876827}, {_x = -0.099359295655502944, _y = 4.3524531358445335}}
(gdb) p biRRT.getPath()
$8 = std::vector of length 3, capacity 4 = {{_x = -1.0029004033453848, _y = 3.0212993269876827}, {_x = -1.0029004033453848, 
    _y = 3.0212993269876827}, {_x = -0.099359295655502944, _y = 4.3524531358445335}}
(gdb) p stateSpace->transitionValid(points.at(0), points.at(1))
$9 = true
(gdb) p stateSpace->transitionValid(points.at(1), points.at(2))
$10 = false

(gdb) p points
$16 = std::vector of length 3, capacity 4 = {{_x = -0.99999993443395974, _y = 3.0000237981465219}, {_x = -0.99999993443395974, 
    _y = 3.0000237981465219}, {_x = -0.099235793212436613, _y = 4.6437347787636014}}
(gdb) p biRRT._goalTree._stateSpace->transitionValid(biRRT._goalTree._nodes.at(0)._state, biRRT._goalTree._nodes.at(1)._state)
$17 = true
(gdb) p biRRT._goalTree._stateSpace->transitionValid(biRRT._goalTree._nodes.at(1)._state, biRRT._goalTree._nodes.at(0)._state)
$20 = false
(gdb) p biRRT._goalTree._nodes
$18 = std::deque with 2 elements = {{_state = {_x = -0.099235793212436613, _y = 4.6437347787636014}, 
    _vec = std::vector of length 2, capacity 2 = {-0.099235793212436613, 4.6437347787636014}, _children = std::__cxx11::list = {
      [0] = 0x7fffac1d5d88}, _parent = 0x0}, {_state = {_x = -0.99999993443395974, _y = 3.0000237981465219}, 
    _vec = std::vector of length 2, capacity 2 = {-0.99999993443395974, 3.0000237981465219}, _children = empty std::__cxx11::list, 
    _parent = 0x7fffac1d5d40}}
(gdb) p points
$19 = std::vector of length 3, capacity 4 = {{_x = -0.99999993443395974, _y = 3.0000237981465219}, {_x = -0.99999993443395974, 
    _y = 3.0000237981465219}, {_x = -0.099235793212436613, _y = 4.6437347787636014}}
(gdb)

Since this step is impossible, we cannot smooth the path anymore. In the 3 point case, it can only smooth it if the transition from the p(0) -> p(2) is valid and it's not in this case.
This gets passed back up and we use this set of waypoints
We then try to build a acceleration limited path
In this process we need to get the delta time between two waypoints. Since they are the same, the time is almost zero, but not quite.
This then blows up our path solution
Then that path intersect thing is called

The transitionValid is different most likely because we specifically decide to allow the transition out of obsticles, but not into them. This messes up the goal tree which starts at the end and works backwards. It allows the goal tree to grow (where the path moves into an obstacle). The only check is then for the startNode -> goalGrowthNode which is the exact same point. I think inverting the transition valid check for the goal tree might solve the problem. There's a problem with Adaptive trying to move all the robots to the same location as well

kylestach commented 5 years ago

@JNeiger I'm going to see if I can write a test case for the RRTPlanner that consistently generates this issue. ~Do we have a framework/setup for writing unit tests for soccer internals?~ Never mind, figured it out.

RoboJackets / robocup-software

Processing Hung freezes sim #1208