Agents disappear / get lost in long simulations

janx8r commented 2 years ago

Hello, I use Menge to simulate humans in the environment of mobile robots. To do this, agents walk to a random target, wait there for a random time and then walk to the next random target. Unfortunately, in longer simulations (duration 1,000s - 10,000s) agents disappear from the simulation, that is, they are no longer visible and their position is NaN. There is no error message and there is no further information about it in the log file. My simulation uses a navigation mesh and contains obstacles. Depending on which agent model I use, the agents get lost more or less often. Very often they get lost with "zanlungo", less often with "johansson". It is noticeable that often two agents disappear at the same time, but sometimes only one agent at a time. I suspect that the agents disappear when they collide with each other or collide with an obstacle, but can't find anything in the code about this.

Attached are images of my world and the navigation mesh, my xml files, the navigation mesh and my log file.

Does anyone have an idea what my problem is?

Jan

Attachments: menge-world, menge-mesh, files.zip

MengeCrowdSim commented 2 years ago

Thanks for the description and the files. I'll look into this.

MengeCrowdSim commented 2 years ago

I note that you've only mentioned force-based models. I also note that you're using quite a large time step (<Common time_step="0.128"> in worldS.xml). Force-based models do not behave well when agents get very close. To get a more stable simulation, you need to significantly reduce the time step.

I ran your simulation using a fixed random seed of zero. At time 39.04 s, there is an interaction between agent 10 and agent 12 where the force generated between the two agents goes to infinity. (Mathematically it stays finite, but numerically the calculation overflows the 32-bit float and becomes infinity.) With the time step reduced to 0.032, however, no infinite forces were calculated.
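To illustrate how a mathematically finite force can still overflow a 32-bit float, here is a minimal sketch. The exponential repulsion law, the constants, and the function names are assumptions chosen for illustration; this is not Menge's actual force computation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical exponential repulsion, NOT the Menge implementation.
// `penetration` is how far two agents overlap (m); `falloff` is the
// length scale of the force (m).
float repulsionMagnitude(float strength, float penetration, float falloff) {
  assert(falloff > 0.0f);
  // exp() of a float argument is evaluated in float and overflows to +inf
  // once the exponent exceeds roughly 88.7.
  return strength * std::exp(penetration / falloff);
}

// True if the value can no longer be used safely by the integrator.
bool overflowed(float value) { return !std::isfinite(value); }
```

With an assumed falloff of 0.005 m, an overlap of 0.1 m gives a large but finite value, while an overlap of 0.5 m overflows to infinity. Once an infinite force reaches the position update, expressions such as inf - inf produce NaN, which would match the NaN positions reported above.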

Alternatively, you can simply increase the number of minor sub steps. Passing --subSteps 3 on the command line has the same effect on the integration as cutting the time step by a factor of 4, but outputs like visualization and trajectory files will still report at the time step period.
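The arithmetic behind this can be sketched as follows, assuming each reported time step is split into (1 + subSteps) equal integration steps. The function names and the `integrate` callback are hypothetical and only stand in for the simulator update.

```cpp
#include <cassert>

// Integration step used when one reported time step is split into
// (1 + subSteps) equal parts.
float effectiveStep(float timeStep, int subSteps) {
  assert(timeStep > 0.0f && subSteps >= 0);
  return timeStep / static_cast<float>(1 + subSteps);
}

// Advances one reported time step by calling `integrate(dt)` once per part.
// `integrate` is a hypothetical callback, not a Menge function.
template <typename StepFn>
void advanceOneTimeStep(float timeStep, int subSteps, StepFn integrate) {
  const float dt = effectiveStep(timeStep, subSteps);
  for (int i = 0; i <= subSteps; ++i) {
    integrate(dt);
  }
}
```

For a time step of 0.128 and 3 sub steps, the integrator runs four times per reported step with dt = 0.032.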

I wouldn't say this is the definitive answer. And there's some code in the Zanlungo agent that attempts to prevent infinite forces, but it fails because the clamping of force magnitude doesn't happen before the full magnitude is computed. So, a small hack to the code would address that (I'll submit a specific issue).
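A minimal sketch of the ordering issue, assuming a magnitude with an exponential falloff term: the clamp only helps if it is the last operation applied to the magnitude. The function and parameter names are hypothetical, and this is not the actual Zanlungo code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical repulsion magnitude with the clamp applied last.
// `base` is assumed to be non-negative.
float clampedMagnitude(float base, float dist, float falloff, float maxForce) {
  assert(falloff > 0.0f && maxForce > 0.0f);
  // Compute the full magnitude first. For strongly negative `dist`
  // (deep overlap) the exponential can overflow to +inf.
  float magnitude = base * std::exp(-dist / falloff);
  // Clamp last. Written as !(x < max) so that NaN is caught as well as
  // +inf, because every ordered comparison with NaN is false.
  if (!(magnitude < maxForce)) magnitude = maxForce;
  return magnitude;
}
```

Clamping before the exponential is applied would leave the final product unbounded, which is the failure described above.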

But a definite takeaway should be: don't use large time steps (e.g., 0.1s) with force-based models. It invites explosions. They generally require an order-of-magnitude smaller time step.
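Why a large step "explodes" can be seen on a simplified stand-in for a stiff force term: explicit Euler applied to dv/dt = (target - v) / tau. Each step multiplies the error by (1 - dt / tau), so the error only shrinks when dt < 2 * tau. The equation and the value of tau are assumptions for illustration, not a Menge model.

```cpp
#include <cassert>
#include <cmath>

// Explicit Euler on dv/dt = (target - v) / tau.
// Returns |v - target| after `steps` steps of size `dt`.
float eulerError(float v, float target, float tau, float dt, int steps) {
  assert(tau > 0.0f && dt > 0.0f && steps >= 0);
  for (int i = 0; i < steps; ++i) {
    v += dt * (target - v) / tau;  // one explicit Euler update
  }
  return std::fabs(v - target);
}
```

With an assumed tau of 0.01 s, a step of 0.128 s grows the error by a factor of about 11.8 per step, while a step of 0.015 s halves it each step.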

janx8r commented 2 years ago

Thanks for the quick and helpful reply. I have now understood the problem. However, I cannot reproduce the behavior at t=39.04s with a random seed r=0. According to the command line parameter description, the seed r=0 is replaced by a system time dependent seed. But even with a valid seed r=1, I notice that the disappearance of the agents happens at different times in different simulation runs.

Reducing the time step to 32ms or using substeps makes the disappearance much less frequent, but does not fix it completely. In a test with a 128ms time step and 3 substeps, only two of the original 13 agents were still there after 80k seconds. For my application, I need to run many very long simulations significantly faster than real time (and reducing the time step makes the simulation slower, of course), so I'm interested in solving the problem.

The change you suggested in zanlungoAgent.cpp line 164ff seems to solve the problem for the Zanlungo agent completely. In a test, after 100k seconds, all agents are still there.

zanlungoAgent.cpp line 164ff:

magnitude = magnitude * expf(-dist / D);
if (magnitude >= MAX_FORCE) magnitude = MAX_FORCE;
// float magnitude = weight * Simulator::AGENT_SCALE * abs( myVel - hisVel ) / T_i;
// 3. Compute the force
return D_ij * magnitude;

For the Johansson agent, the magnitude limit is completely missing. After adding it, the problem is not completely solved yet.

johanssonAgent.cpp line 96ff:

// Force direction
Vector2 forceDir = 0.5f * (relDir + (relPosOffset / relPosOffsetDist));
//Limit the magnitude
const float MAX_FORCE = 1e15f;
if (magnitude >= MAX_FORCE) magnitude = MAX_FORCE;
force += magnitude * forceDir;

The actual problem with the Johansson agent is that, when calculating b, the square root of a negative number is taken (very rarely; for me about once in 50k seconds). A small code adjustment can fix the effect, but this only works in combination with the magnitude limitation shown above. johanssonAgent.cpp line 91:

float b = 0.5f * sqrtf(term1 * term1 - offsetDistSq);
if(b!=b) b = 0.0f;

With this modification, no more agents disappear in a 180k seconds simulation using johansson agent.
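An alternative to testing for NaN after the fact is to clamp the radicand before taking the square root. The sketch below uses the same expression as b above; the helper name is hypothetical, and whether a zero fallback is the right choice for the model is an assumption carried over from the fix above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: same expression as `b`, but the radicand is
// clamped to zero before the square root is taken.
float guardedB(float term1, float offsetDistSq) {
  const float radicand = term1 * term1 - offsetDistSq;
  // The comparison is false for NaN, so a NaN radicand also falls
  // back to zero.
  const float b = 0.5f * std::sqrt(radicand > 0.0f ? radicand : 0.0f);
  assert(b >= 0.0f);
  return b;
}
```

This avoids relying on the `b != b` test, which some compilers may optimize away when aggressive floating-point flags such as -ffast-math are enabled.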

To make sure that with other models no more agents disappear in long simulations, I will test all models and try to fix the problems if necessary.

chraibi commented 2 years ago

@seancUNC you should consider implementing a simple and fast velocity-based model in addition to these force-based models, where such artificial hacks are not nice.

@janx8r you are solving a second-order ODE with the very simple Euler scheme. So you simply cannot use as large an integration time step as you would like. Here is an interesting read.

Out of curiosity, can you explain the goal of your simulation? As I understood your first comment, you want to simulate a random walk of N agents in a square room for a very long time. Right?

MengeCrowdSim commented 2 years ago

@janx8r Those fixes are exactly what you want. :)

You never answered the question about why you're using a force-based model. As @chraibi points out, they are second-order ODEs and the only integrator is a simple, first-order explicit Euler scheme, which requires small steps. Did you consider using orca? It's a first-order ODE and is much more generous in time step size.

@chraibi In practice, unless we have an error-correcting integrator, those force-based models require some kind of "hack". The hack may simply be detection of invalid state and immediate halt to simulator (which, at the very least, would stop the problem of continuing simulation of NaN agents). But from a purely engineering perspective, it's not overly helpful.
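A minimal sketch of the "detect invalid state and halt" idea, assuming agent positions are available as a list of 2D vectors. `Vec2` and both function names are hypothetical stand-ins, not Menge types or functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the simulator's 2D vector type.
struct Vec2 {
  float x;
  float y;
};

// Returns the index of the first agent whose position is NaN or infinite,
// or -1 if every position is finite.
long firstInvalidAgent(const std::vector<Vec2>& positions) {
  for (std::size_t i = 0; i < positions.size(); ++i) {
    if (!std::isfinite(positions[i].x) || !std::isfinite(positions[i].y)) {
      return static_cast<long>(i);
    }
  }
  return -1;
}

// Halts (via assert, so only in builds without NDEBUG) as soon as any
// agent is in an invalid state.
void requireValidState(const std::vector<Vec2>& positions) {
  assert(firstInvalidAgent(positions) < 0 &&
         "agent position is NaN or infinite");
}
```

Called once per time step, such a check would at least stop the simulation at the first NaN instead of silently continuing with lost agents.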

Orca (and, by inheritance, pedvo) has a maximum acceleration. The force-based models should get the same. It acts as a low-pass filter. It would prevent most of the worst aspects of the instability in dense crowds (crazy jittering) at the cost of making the agents less responsive to imminent collision. Given that the pedestrians being simulated likewise have real maximum accelerations, it's reasonably physically motivated. Clamping each individual force is a coarse approximation of limiting the resultant acceleration.
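Limiting the resultant acceleration, rather than each individual force, could look roughly like this. The type, the function name, and the policy of returning zero for non-finite input are assumptions for this sketch, not existing Menge code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the simulator's 2D vector type.
struct Vec2 {
  float x;
  float y;
};

// Converts the resultant force into an acceleration and limits its
// magnitude to `maxAccel`, preserving direction.
Vec2 limitedAcceleration(Vec2 force, float mass, float maxAccel) {
  assert(mass > 0.0f && maxAccel > 0.0f);
  Vec2 accel{force.x / mass, force.y / mass};
  const float magnitude = std::hypot(accel.x, accel.y);
  // Assumed policy: a non-finite resultant carries no usable direction,
  // so it is dropped rather than rescaled (inf * 0 would give NaN).
  if (!std::isfinite(magnitude)) return Vec2{0.0f, 0.0f};
  if (magnitude > maxAccel) {
    const float scale = maxAccel / magnitude;
    accel.x *= scale;
    accel.y *= scale;
  }
  return accel;
}
```

For example, a force of (30, 40) on unit mass with a limit of 5 is scaled down to an acceleration of about (3, 4), while forces below the limit pass through unchanged.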

janx8r commented 2 years ago

My goal is to study the interaction and mutual influence of robots and humans. I simulate the robots with the "Webots" simulator. The motion of the humans is simulated with Menge. I currently implement a Menge plugin which is an interface between these two simulators. This makes it possible to use Menge agents to move 3D models of humans in the robot simulator and the other way around to move robot agents in Menge as they move in the robot simulator. Robots and humans both have a Menge agent, so that humans perceive the robot and for example avoid it. The robot agent is not controlled by Menge, but by the robot simulator. So humans also perceive robots and react to their motions.

The environment shown above is just a small test environment. My plan is to have the environment generated automatically. There are generators that have learned from blueprints of buildings to create random new blueprints. Of course, there is some work behind this, such as the automatic generation of the navigation mesh (I have already implemented this and will make it available to the community soon).

@chraibi Thanks for the interesting link! I realize that I can not increase the step size arbitrarily. I don't want that either, 32ms is also feasible for me. By my comment I meant that I can't reduce the step size to the point where the problems described above no longer occur.

@MengeCrowdSim To be honest I don't know much about the different agent models. I can select them in Menge and see different behavior of agents in simulation. Some models are more suitable in my environment than others. For example, with karamouzas, gcf and pedvo the agents block each other very often in bottlenecks in the environment shown above (then the agents just stop and maybe still wiggle back and forth). With orca and pedvo I have the problem that the agents sometimes jump into obstacles and get stuck there. Probably this has to do with the rather large time step. If I reduce the time step, however, mutual blocking in bottlenecks occurs more often. With the fixes described above, johansson, zanlungo and helbing now work well, even if I don't "like" their other behavior as much as certain aspects of orca and gcf, for example. At the end of the day, selecting and configuring the models so that they work well in environments like the one shown above is still a lot of work and needs experience (which I don't have yet).

MengeCrowdSim commented 2 years ago

@janx8r thanks for the feedback. The issues you describe with the velocity-based models are good to know. I did observe a couple of instances of walking through walls with the force-based models. That's expected based on the time step. The fact that the velocity-based models do that is very surprising; I would've thought it impossible. I'll have to investigate that.

MengeCrowdSim / Menge

Agents disappear / get lost in long simulations #159