eclipse-sumo / sumo

Eclipse SUMO is an open source, highly portable, microscopic and continuous traffic simulation package designed to handle large networks. It allows for intermodal simulation including pedestrians and comes with a large set of tools for scenario creation.
https://eclipse.dev/sumo
Eclipse Public License 2.0

Out-of-lane vehicle using moveToXY causes sumo-gui to freeze when not calling simulationStep continuously #10974

Open jjyyxx opened 2 years ago

jjyyxx commented 2 years ago

We use TraCI to remote-control an ego vehicle (for reinforcement learning). The execution flow is roughly:

while True:
    # decision making and forward our vehicle dynamics model
    # ...
    traci.vehicle.moveToXY("id", '-1', 0, x, y, phi, keepRoute=0b011)
    traci.simulationStep()

Sometimes it's necessary to pause at a Python breakpoint and evaluate some variables. However, we find that if the ego vehicle is out of lane (not sure if that's the precise term) and Python execution then pauses, i.e., simulationStep is no longer called continuously, the GUI freezes. An extreme case of "out of lane" is moving out of the network, but for us a more typical case is controlling the vehicle inside a junction, where it does not belong to any internal lane.

Halting the simulation and single-stepping in the GUI works around the problem, but this complicates our debugging. I'm not sure if this is the intended behavior. Or did we do something wrong?

Any scenario should work.

SUMO-version: 1.13.0

operating system: Windows 10 / Ubuntu 20.04.4

namdre commented 2 years ago

Does the problem also occur if you run 'sumo' instead of 'sumo-gui' ?

jjyyxx commented 2 years ago

Thanks for your quick reply! But the sumo binary offers no direct way for user interaction (even if it froze, I would not notice). To clarify, it is the GUI that freezes, not the simulation. I can call simulationStep or any other TraCI API as normal, but I cannot zoom in/out, press other buttons (e.g., halt the simulation), or read the coordinates under my pointer. When the ego vehicle is on a lane, the UI works as expected.

To add to that: while frozen, the GUI does not update or repaint until I resume Python execution (and several dozen simulation steps pass; I have not validated this time span thoroughly).

namdre commented 2 years ago

Vehicles being out of lane should no longer occur (or at least not due to bug #10952)

jjyyxx commented 2 years ago

> Vehicles being out of lane should no longer occur (or at least not due to bug #10952)

With SUMO v1_13_0+1024-fd2ab534e53, my simple test case (similar to https://github.com/eclipse/sumo/issues/10952, with an infinite loop appended at the end) no longer causes SUMO to freeze. But in our real program, the problem still occurs and seems to happen during the left turn in the junction. I will try to create an example.

I was expecting that as long as my remote-controlled vehicle remains inside the road network (the simulation will reset as soon as the vehicle is out), at least in simple test cases like a single crossroad, the GUI should not freeze. Does this expectation match the intended behavior?

jjyyxx commented 2 years ago

Test case for 1.13.0 and v1_13_0+1024-fd2ab534e53. The scenario files (same as https://github.com/eclipse/sumo/issues/10952) are available at https://gist.github.com/jjyyxx/779e4245b63551f75b6a730138a5ab83#file-test-net-xml.

import traci

def main():
    sumo_args = [
        "sumo-gui",
        "--configuration-file", "./test.sumocfg",
        "--step-length", "0.1",
        "--seed", "1"
    ]
    traci.start(sumo_args)

    traci.simulationStep(10.0)
    move_id = "carflow_-9_3.1"
    traci.vehicle.setColor(move_id, (255, 0, 255, 255))
    traci.simulationStep(10.0 + 2.0)

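    # As written, only the last move_coord assignment takes effect;
    # comment out the others to select a case.
    # Failing cases for both 1.13.0 and v1_13_0+1024-fd2ab534e53
    # Edge outside current route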
    move_coord = (88.5, 103.5, 170.0)
    move_coord = (98.5, 117.0, 170.0)
    move_coord = (86.0, 117.0, 170.0)
    move_coord = (77.5, 114.5, 170.0)
    move_coord = (78.5, 113.5, 120.0)
    move_coord = (90.0, 138.0, 170.0)
    move_coord = (35.5, 104.5, 170.0)

    # Failing cases for 1.13.0, but passing for v1_13_0+1024-fd2ab534e53
    # Edge within route, internal edge
    move_coord = (112.0, 118.0, 120.0)
    move_coord = (120.0, 112.0, 120.0)

    # Passing cases for both 1.13.0 and v1_13_0+1024-fd2ab534e53
    # Edge within route, non-internal edge
    move_coord = (81.0, 183.0, 170.0)

    traci.vehicle.moveToXY(move_id, '-1', 0, *move_coord, keepRoute=0b011)
    traci.simulationStep()

    # Stop stepping but keep the connection open (mimics pausing at a breakpoint)
    while True: pass

main()

In summary, the common pattern seems to be: if the edge the vehicle is on after moveToXY is not in its route (including internal edges), the GUI will freeze. Outside the junction, this is generally not an issue for us, since it indicates completely wrong control and the episode will be terminated. But inside the junction, this may happen frequently (during debugging).
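A minimal sketch of how a client could flag this condition before pausing at a breakpoint. It assumes the current edge comes from traci.vehicle.getRoadID and the route edges from traci.vehicle.getRoute; the helper name and its return labels are hypothetical. It relies on SUMO's convention that internal (junction) edge IDs start with ':' and are not listed in a route.

```python
def classify_edge(road_id, route_edges):
    """Classify the edge a vehicle is on relative to its route.

    Returns "on-route" if the edge is listed in the route, "internal"
    if it is a junction-internal edge (ID starts with ':'), and
    "off-route" otherwise (including an empty edge ID).
    """
    if road_id in route_edges:
        return "on-route"
    if road_id.startswith(":"):
        return "internal"
    return "off-route"


# Hypothetical usage against a running simulation (not executed here):
#   road = traci.vehicle.getRoadID(move_id)
#   route = traci.vehicle.getRoute(move_id)
#   if classify_edge(road, route) != "on-route":
#       print("GUI may freeze if stepping stops now")
```

The check is deliberately conservative: it also reports internal edges that connect two route edges, which the test case above lists as passing on the newer build.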

PS: I'm unsure if bit0 of keepRoute matters in this case.
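For reference, a small sketch of how keepRoute=0b011 decomposes, assuming the bit meanings given in the TraCI documentation for moveToXY: bit0 maps the vehicle to the closest edge within its existing route, and bit1 allows the exact position, even outside the road network. The helper name and dictionary keys are made up for illustration, and bit2 is left out.

```python
def decode_keep_route(flags):
    """Split a keepRoute bitset into its two lowest flags (assumed meanings)."""
    return {
        "keep_route": bool(flags & 0b001),         # bit0: stay on existing route
        "may_leave_network": bool(flags & 0b010),  # bit1: exact position allowed
    }


print(decode_keep_route(0b011))
# {'keep_route': True, 'may_leave_network': True}
```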

Ahmad1441 commented 1 year ago

Yes, I am facing the same issue in SUMO 1.15. Can you please fix this issue? Thanks.

Minokori commented 2 months ago

I've hit the same problem on SUMO 1.20.0. It looks like a performance issue in sumo-gui: under high load, or in other specific cases, a traci.simulationStep() call that triggers a GUI redraw blocks SUMO itself until the GUI has been drawn. The TraCI client, however, does not block while SUMO is blocked, so the Python program keeps running until it reaches the next traci.simulationStep(). At that point TraCI asks SUMO to update the GUI again while SUMO has not yet finished the update requested by the previous command, and the sumo-gui window freezes or becomes unresponsive. My workaround is to call time.sleep() so the sumo-gui redraw can finish and stay in sync with TraCI, but that is obviously not elegant. Hopefully my understanding is correct; I look forward to a better solution from others.
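The time.sleep() workaround could look roughly like the sketch below. The wrapper name and the default pause length are assumptions chosen for illustration, not values taken from this thread; the step callable is passed in so the same wrapper works with traci.simulationStep.

```python
import time


def step_and_wait(step, pause_s=0.05):
    """Call step() once, then sleep so the GUI has time to redraw.

    pause_s is an arbitrary guess and would need tuning per scenario.
    """
    result = step()
    time.sleep(pause_s)
    return result


# Hypothetical usage in the control loop (not executed here):
#   step_and_wait(traci.simulationStep)
```

A fixed pause slows every step and gives no guarantee that the redraw has finished, which is why it is only a stopgap.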

namdre commented 2 months ago

The problem comes from https://github.com/eclipse-sumo/sumo/blob/cd9a71df42cecc46f22049b8d52d8bd75d6078d5/src/gui/GUIViewTraffic.cpp#L345 and is triggered by either having a vehicle placed outside the network via https://github.com/eclipse-sumo/sumo/blob/cd9a71df42cecc46f22049b8d52d8bd75d6078d5/src/microsim/MSVehicle.cpp#L903 or by visualizing its route, link items or bestlanes from the context menu.

The lock() call fails as long as control is inside https://github.com/eclipse-sumo/sumo/blob/cd9a71df42cecc46f22049b8d52d8bd75d6078d5/src/microsim/MSNet.cpp#L730 (by way of https://github.com/eclipse-sumo/sumo/blob/cd9a71df42cecc46f22049b8d52d8bd75d6078d5/src/guisim/GUINet.cpp#L234).

A possible solution is to only lock the vehicle/person that is currently drawn when looping over myAdditionallyDrawn. However, it's very hard to verify the correctness of any possible fix because crashes due to insufficient locking cannot be reliably reproduced / ruled out. Hence this shouldn't be done shortly before a release.
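To illustrate the idea in isolation, here is a toy Python model, not SUMO code. It assumes the paint path uses a non-blocking lock attempt that fails while the simulation step holds the lock, and it contrasts one coarse lock with hypothetical per-object locks. All names are invented for the sketch.

```python
import threading


def try_draw(lock):
    """Mimic a paint attempt: draw only if the lock is free right now."""
    if lock.acquire(blocking=False):
        try:
            return "drawn"
        finally:
            lock.release()
    return "skipped"


# Coarse locking: one lock stands in for the whole simulation step.
sim_lock = threading.Lock()
with sim_lock:                      # control is "inside the step"
    during_step = try_draw(sim_lock)
after_step = try_draw(sim_lock)
print(during_step, after_step)      # skipped drawn

# Finer locking: one lock per additionally drawn object.
object_locks = {"veh0": threading.Lock(), "veh1": threading.Lock()}
with object_locks["veh0"]:          # only veh0 is busy
    per_object = {name: try_draw(lock) for name, lock in object_locks.items()}
print(per_object)                   # {'veh0': 'skipped', 'veh1': 'drawn'}
```

With per-object locks, only the object that is actually busy is skipped. The model says nothing about whether narrower locking would be safe in SUMO, which is the hard part noted above.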

Minokori commented 2 months ago

Thanks for the quick reply! I think I now understand what causes the issue, and I look forward to it being resolved in a future release. Also, perhaps a traci.simulationStepAsync() function could be added to the TraCI client, so that Python callers have the option of using the await keyword to wait for the GUI thread to finish rendering.
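No such function exists in TraCI as far as this thread shows. As a rough client-side sketch of what awaiting a step could look like, the wrapper below runs any blocking step callable in a worker thread via asyncio. The function name is hypothetical, and the await only covers the TraCI call returning; it cannot tell whether the GUI has finished rendering.

```python
import asyncio


async def simulation_step_async(step, *args):
    """Run a blocking step callable in a worker thread and await its result.

    Assumes only one call is in flight at a time on a given connection.
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, step, *args)


# Hypothetical usage (not executed here):
#   async def control_loop():
#       while True:
#           await simulation_step_async(traci.simulationStep)
#   asyncio.run(control_loop())
```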