Open AlexKaravaev opened 10 months ago
Can I somehow check what exactly callback1 means? Another suspicious thing: as I understand it, if Sleeping in SensorManager dropped by 5x, then WorldUpdate should drop by the same factor (because sensors are updated at the world rate?). What I see instead is that WorldUpdate drops by 5x, while for Sleeping it's more like 100x.
Do you have any custom plugins running in your simulation? Perhaps there is something going on in the callbacks registered by those plugins? callback1 corresponds to when gazebo-classic's event system calls a callback: https://github.com/gazebosim/gazebo-classic/blob/7ccef40e24831eb5cd974b489ee279fc064eacc4/gazebo/common/Event.hh#L303 . One thing you could do is add some profiler calls to the plugins you use, or use some other kind of profiler. For example, if you are on non-virtualized Linux amd64, you can use Intel VTune or Magic Trace to get more info besides what gazebo-classic's profiler provides.
As a general comment, I would suggest that any user of Gazebo Classic migrate to gz-sim, but I guess that is not trivial in your case.
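The suggestion above to add timing to plugin callbacks could look roughly like the following. This is a minimal sketch that does not use gazebo-classic's profiler: `SlowScopeTimer` and its `Report` callback are hypothetical names introduced here for illustration, and the idea is only to report a scope (for example the body of a plugin's update callback) when it takes longer than a chosen threshold.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper, NOT part of Gazebo: measures the wall-clock time spent
// in a scope and calls `report` only if it reaches `threshold_ms`.
class SlowScopeTimer {
 public:
  using Clock = std::chrono::steady_clock;
  using Report = std::function<void(const std::string& name, double ms)>;

  SlowScopeTimer(std::string name, double threshold_ms, Report report)
      : name_(std::move(name)),
        threshold_ms_(threshold_ms),
        report_(std::move(report)),
        start_(Clock::now()) {
    assert(threshold_ms_ >= 0.0);
  }

  // The measurement is taken when the timer goes out of scope.
  ~SlowScopeTimer() {
    const std::chrono::duration<double, std::milli> elapsed =
        Clock::now() - start_;
    if (report_ && elapsed.count() >= threshold_ms_) {
      report_(name_, elapsed.count());
    }
  }

  SlowScopeTimer(const SlowScopeTimer&) = delete;
  SlowScopeTimer& operator=(const SlowScopeTimer&) = delete;

 private:
  std::string name_;
  double threshold_ms_;
  Report report_;
  Clock::time_point start_;
};
```

Placing one such timer at the top of each plugin callback, with a report function that logs the name and duration, would show which callback is slow during a freeze without needing a trace buffer that covers the whole event.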
@traversaro thanks for the answer.
We have a lot of custom plugins, which is also the reason why we can't migrate to gz-sim quickly. I tried Magic Trace, but the problem is that its buffer covers only a couple of ms at most, so I can't tell from this trace why the simulation freezes.
I tried adding profiling to all of our plugins, but I had 2 problems:
This is the Magic Trace output, if that helps.
Hello! I recently encountered a similar issue. In my case, the problem was in the callback function: the operations in it took too long. When refactoring my code, I left only the copying of data from the Gazebo topic in the callback and moved the complex operations into a separate thread. I hope this helps you.
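The refactoring described above could be structured like this. It is a minimal sketch under assumptions, not the commenter's actual code: `LatestMessageWorker` is a hypothetical helper that is not part of Gazebo, it keeps only the newest message (older unprocessed ones are overwritten), and it requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical helper, NOT a Gazebo API: the subscriber callback only copies
// the message, and a worker thread runs the expensive processing.
template <typename Msg>
class LatestMessageWorker {
 public:
  explicit LatestMessageWorker(std::function<void(const Msg&)> process)
      : process_(std::move(process)), worker_([this] { Run(); }) {
    assert(process_ && "a processing function is required");
  }

  // Processes a still-pending message (if any), then stops the worker.
  ~LatestMessageWorker() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Call this from the topic callback: it copies the message and returns.
  void OnMessage(const Msg& msg) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      latest_ = msg;  // overwrite: only the newest message is kept
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::optional<Msg> msg;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || latest_.has_value(); });
        if (!latest_.has_value()) return;  // stop requested, nothing pending
        msg.swap(latest_);                 // take the message, clear the slot
      }
      process_(*msg);  // heavy work runs outside the lock and the callback
    }
  }

  std::function<void(const Msg&)> process_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::optional<Msg> latest_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist first
};
```

Dropping stale messages is a design choice that suits sensor-style data where only the latest sample matters; if every message must be processed, a queue would be needed in place of the single slot.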
Environment
Description
I have been debugging this for quite a while, but still haven't managed to find the cause. We have automated tests for the robot, and sometimes the Gazebo RTF drops to 0.01 (normally we have around 0.8) and stays there for 2-5 minutes, then goes back to normal. The strange thing is that it happens absolutely randomly and, moreover (maybe take this one with a grain of salt), it only happens on our servers and only in a podman (it's like docker) container. Both conditions must be true. It doesn't happen on my personal laptop in a container, and the simulation runs normally if run on the host of the server lol. As for differences between the server and the laptop, they are really similar: both have a mid-tier NVIDIA graphics card, a good CPU (mine is AMD, the server's is Intel, but I would be very surprised if that mattered), the same amount of RAM (32 GB) and Ubuntu 20.04 installed. The server doesn't have a monitor attached, though.
I ran the profiler on it, and it shows that SensorLoop sleeps a lot, but also that the World Update step takes too much time. I couldn't get any more info from the profiler, so I don't know what is going on, but I am attaching the screenshots.
I would appreciate any thoughts/tips on how to debug this.
Steps to reproduce
Unfortunately, I don't know :( I think if I knew, that would solve the problem. I also cannot share the source code because of an NDA, but I am willing to assist.
Output
There is no output from Gazebo when it happens.