cyberbotics / webots

Webots Robot Simulator
https://cyberbotics.com
Apache License 2.0
3.24k stars 1.7k forks source link

Blocking Webots in fast mode use 100% cpu #3504

Closed ShuffleWire closed 1 year ago

ShuffleWire commented 3 years ago

Describe the Bug When running in fast mode, with an external controller not running, Webots use 100% of one core.

Steps to Reproduce

  1. Open the following world :

    #VRML_SIM R2021c utf8
    WorldInfo {
    coordinateSystem "NUE"
    }
    Viewpoint {
    }
    Robot {
    controller "<extern>"
    }
  2. Build the following controller :

    #include <webots/robot.h>
    int main(int argc, char **argv)
    {
    wb_robot_init();
    while (wb_robot_step(32) != -1)
    {
    }
    wb_robot_cleanup();
    return 0;
    }
  3. Execute the controller

  4. Run the simulation in fast mode

  5. Kill the controller

  6. The simulation stop (that is expected), but the cpu usage is 100% on one core.

Expected behavior The cpu usage should be near 0%. This bug is only with fast mode, in normal mode, the cpu load is about 0%

System

lukicdarkoo commented 3 years ago

Did you mean?

#VRML_SIM R2021c utf8
WorldInfo {
  coordinateSystem "NUE"
}
Viewpoint {
}
Robot {
  controller "<extern>"
}
ShuffleWire commented 3 years ago

Oh ! yes indeed, but just found out that <any text> is considered as a tag when not using quotes, hence it doesn't not show... (as in `#include <webots/robot.h>') I've just edit the message to use block code, as they are not that long.

SammyRamone commented 2 years ago

I can confirm this issue. I use the Webots simulator for reinforcement learning with the stable-baselines3 framework. Since the RL process includes phases where the GPU is learning or evaluation is done in a different environment, there are a lot of phases where the controller is not doing anything. Still the simulator is taking up a complete core. This is obviously very unfortunate for performance and power consumption reasons. I circumvented the issue by using the normal mode. Since my simulation is so complex that it does not simulate in more than real time, it does not pose an issue for me. Still, I think that this issue should be fixed, as it is a typical scenario in learning to not run the controller all the time. In normal mode, the CPU load behaves as expected and only gets high if the simulator is actually running. This can also be tested by using an external controller which just calls time.sleep(100) or something similar.

omichel commented 2 years ago

I confirm the issue as well. It happens not only with an extern controller, but also with a regular controller.

fast.wbt ```js #VRML_SIM R2022b utf8 WorldInfo { } Viewpoint { } Robot { controller "fast" } ```
fast.c ```c #include #include int main() { wb_robot_init(); while (wb_robot_step(32) != -1) usleep(1000000); wb_robot_cleanup(); return 0; } ```
ShuffleWire commented 2 years ago

Just built webots master@152f170ed0944177e5e0ed98b6bb899be8c97478

@omichel something should have changed recently (during the external controller code refactor ?) but know the behavior is the following : load world with external controller run the world (and not the controller) in real time mode : the world don't advance obviously, and cpu 0% -> OK switch to fast mode, world don't advance, but cpu 100% -> NOT OK run the external controller, the world advance kill the controller (don't let him execute wb_robot_cleanup(), not sure if it matter, given the recent fix about it), world stop, and cpu goes back to 0% -> OK, but strange ! and btw, your "fast" world behave the same, ie use 100% cpu @SammyRamone well, at least for you if you forgot the start behavior, your issue is fixed :) between two controller restart, your cpu should go back to 0%

ShuffleWire commented 2 years ago

using @omichel world, i'm pausing webots to have an idea what use 100% cpu, and got various places :

#0  0x00007ffff4d23967 in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1  0x0000555555bf8e43 in WbControlledWorld::step() (this=0x55555f0d9ae0) at control/WbControlledWorld.cpp:236
#2  0x0000555555bf8653 in WbControlledWorld::processWaitingStep() (this=0x55555f0d9ae0) at control/WbControlledWorld.cpp:161
#3  0x0000555555bf8611 in WbControlledWorld::triggerStepFromTimer() (this=0x55555f0d9ae0) at control/WbControlledWorld.cpp:155
#4  0x0000555555e2436f in QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (WbSimulationWorld::*)()>::call(void (WbSimulationWorld::*)(), WbSimulationWorld*, void**) (f=&virtual table offset 144, o=0x55555f0d9ae0, arg=0x7fffffffd740) at ../../include/qt/QtCore/QtCore/qobjectdefs_impl.h:152
#5  0x0000555555e23eb3 in QtPrivate::FunctionPointer<void (WbSimulationWorld::*)()>::call<QtPrivate::List<>, void>(void (WbSimulationWorld::*)(), WbSimulationWorld*, void**)
    (f=&virtual table offset 144, o=0x55555f0d9ae0, arg=0x7fffffffd740) at ../../include/qt/QtCore/QtCore/qobjectdefs_impl.h:185
#6  0x0000555555e236a1 in QtPrivate::QSlotObject<void (WbSimulationWorld::*)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase*, QObject*, void**, bool*)
    (which=1, this_=0x55555f26b5b0, r=0x55555f0d9ae0, a=0x7fffffffd740, ret=0x0) at ../../include/qt/QtCore/QtCore/qobjectdefs_impl.h:395
#7  0x00007ffff7b38b8a in  () at /home/yann/webots/lib/webots/libQt6Core.so.6
#8  0x00007ffff7b47bca in QTimer::timeout(QTimer::QPrivateSignal) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#9  0x00007ffff7b2999f in QObject::event(QEvent*) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#10 0x00007ffff67dea62 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /home/yann/webots/lib/webots/libQt6Widgets.so.6
#11 0x00007ffff7adc01a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#12 0x00007ffff7c5e82b in QTimerInfoList::activateTimers() () at /home/yann/webots/lib/webots/libQt6Core.so.6
#13 0x00007ffff7d42c6c in  () at /home/yann/webots/lib/webots/libQt6Core.so.6
#14 0x00007ffff292de6b in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#15 0x00007ffff292e118 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00007ffff292e1cf in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007ffff7d42f5a in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#18 0x00007ffff7ae80bb in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#19 0x00007ffff7ae400b in QCoreApplication::exec() () at /home/yann/webots/lib/webots/libQt6Core.so.6
#20 0x0000555555c69ab2 in WbGuiApplication::exec() (this=0x7fffffffdbf0) at gui/WbGuiApplication.cpp:288
#21 0x0000555555f3ff1f in main(int, char**) (argc=1, argv=0x7fffffffdff8) at gui/main.cpp:214
#0  __libc_recvmsg (flags=0, msg=0x7fffffffd730, fd=8) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
#1  __libc_recvmsg (fd=8, msg=0x7fffffffd730, flags=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:25
#2  0x00007ffff2288575 in  () at /lib/x86_64-linux-gnu/libxcb.so.1
#3  0x00007ffff2288f6f in xcb_poll_for_reply64 () at /lib/x86_64-linux-gnu/libxcb.so.1
#4  0x00007ffff2700c75 in  () at /lib/x86_64-linux-gnu/libX11.so.6
#5  0x00007ffff2701002 in _XEventsQueued () at /lib/x86_64-linux-gnu/libX11.so.6
#6  0x00007ffff26f2931 in XPending () at /lib/x86_64-linux-gnu/libX11.so.6
#7  0x00007ffff017f3ef in  () at /lib/x86_64-linux-gnu/libgdk-3.so.0
#8  0x00007ffff292d57f in g_main_context_prepare () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9  0x00007ffff292dfdb in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x00007ffff292e1cf in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007ffff7d42f5a in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#12 0x00007ffff7ae80bb in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /home/yann/webots/lib/webots/libQt6Core.so.6
#13 0x00007ffff7ae400b in QCoreApplication::exec() () at /home/yann/webots/lib/webots/libQt6Core.so.6
#14 0x0000555555c69ab2 in WbGuiApplication::exec() (this=0x7fffffffdbf0) at gui/WbGuiApplication.cpp:288
#15 0x0000555555f3ff1f in main(int, char**) (argc=1, argv=0x7fffffffdff8) at gui/main.cpp:214

and could get more I think, it's should be just a matter of where it pause... Maybe it could get info on where the code in running at that time..., but I voting for some active polling somewhere

ShuffleWire commented 2 years ago

This might be causing this behavior : WbSimulationWorld.cpp:328 mTimer->start(0); // the 0 argument here is important, don't remove it Because it create a active wait situation, i think. This code date from 2018, does it actually still make sense ? (not throwing shade, I think that because a lot of refactoring around external controller have been made these days)

The thing that puzzle me is the different behavior between before connecting the external controller and after disconnecting it...

omichel commented 2 years ago

Yes, it might be the cause of the problem. Can you try to patch it to see if you can manage to get rid of the problem?

ShuffleWire commented 2 years ago

So I've tried something, but not about this line. Actually, it's correct to put 0 : After some testing, setting to 1 reduce the issue, but reduce also the performance by adding a sleep at each time-step. I've noticed that by putting 1000 (1sec) here, and I got step that took 1 real second to execute... So that should not be the way to go

omichel commented 1 year ago

@SammyRamone and @ShuffleWire: may I kindly ask you to test #5329 and confirm it fixes the problems you reported here?

ygoumaz commented 1 year ago

The PR was merged this morning. @SammyRamone and @ShuffleWire, feel free to test the next nightly build available tonight (31-10-2022) and to confirm that the changes fixed your issues.

ShuffleWire commented 1 year ago

Sorry for late reply, was out of office. It work for me ! Thank you !