Closed darksylinc closed 3 years ago
thanks for uncovering this bug. Note that there is also a Scene3D.cc in ign-gazebo and that's the one currently being used when ign gazebo
is run. There is a goal to merge the two Scene3Ds.
Ooofff!!
This is going to be one complex merge.
My fixes are still WIP but still I submitted the PR.
I'm mostly interested in hearing feedback about PR https://github.com/ignitionrobotics/ign-gazebo/pull/774 because I know for certain that ign-rendering is currently breaking the ABI and I know how to fix it (I just have to move SwapFromThread to the derived classes and then have ign-gazebo dynamic cast these classes and call them directly)
However I'm more inexperienced about ign-gazebo. I checked out the install folder generated by the make folders, and I could not find Scene.hh anywhere.
This suggests the classes I touched were never exposed to the user and therefore it is safe to alter them. However I may be wrong.
I'll be waiting on feedback for this. Cheers!
Update: It seems the tests are failing on ign-rendering PR, that was unexpected but I can see why I missed it. I made some local changes to trigger the race condition more aggressively and forgot to test the code without these changes. The module is basically crashing because we're attempting to create the same node definition twice, which should not be happening.
I know for certain that ign-rendering is currently breaking the ABI and I know how to fix it (I just have to move SwapFromThread to the derived classes and then have ign-gazebo dynamic cast these classes and call them directly)
ign-rendering's ogre2 rendering engine is currently dynamically loaded as a plugin library from ign-gazebo. If a downcast is needed in ign-gazebo, it means we'll need to explicitly link against the ogre2 target, correct? It's doable but that means we'll need to add a few cmake checks and ifdefs to make sure the user has ogre 2.x installed (it's not required since users can use ign-gazebo with just ogre 1.x). I'm leaning towards just merging this into the unstable branch (main
branch) in ign-gazebo and ign-rendering for our next release Ignition F. So you can just cherry pick your current changes and apply them to the main
branch. We can always backport this change later if you think it's going to help with other bugfixes you're working on.
However I'm more inexperienced about ign-gazebo. I checked out the install folder generated by the make folders, and I could not find Scene.hh anywhere.
Do you mean ign-gazebo's Scene3D.hh (as opposed to ign-rendering's Scene.hh)?
ign-rendering's ogre2 rendering engine is currently dynamically loaded as a plugin library from ign-gazebo. If a downcast is needed in ign-gazebo, it means we'll need to explicitly link against the ogre2 target, correct? It's doable but that means we'll need to add a few cmake checks and ifdefs to make sure the user has ogre 2.x installed (it's not required since users can use ign-gazebo with just ogre 1.x).
Mmm. You are right. It will be troublesome to backport.
To put another way: We just need a way to signal that a particular camera needs to do special work to swap its internal state.
If adding forcing the user to upgrade both ign-gazebo and ign-rendering at the same time is an option, then this can be fixed by adding a new class to ign-rendering which takes care of the downcasting. This will preserve ABI compatibility, but will cause trouble if the user upgrades ign-gazebo without upgrading ign-rendering (which could be detected at runtime because the symbol will be missing, and warn the user to upgrade ign-rendering as well).
I'm talking about something along the lines:
class CameraWorkaround
{
virtual void SwapFromThread( Camera *camera ) = 0;
}
Is this even feasible? It doesn't have to be a class. It could just be a C function we load dynamically at runtime. Hacky, but it preserves the ABI (as long as it's acceptable that upgrading ign-gazebo forces upgrading ign-rendering too)
I'm leaning towards just merging this into the unstable branch (
main
branch) in ign-gazebo and ign-rendering for our next release Ignition F. So you can just cherry pick your current changes and apply them to themain
branch. We can always backport this change later if you think it's going to help with other bugfixes you're working on.
As for how affects my other bugfixes: The current code is bugged and can corrupt GL state (apparently in non-crashing ways at least with what's we've seen so far with Mesa drivers).
That means that:
Basically whenever we encounter a bug, it's going to be the question: "is this code broken because I made a mistake, or just the race condition again?". It's similar to "blaming the compiler" when a bug crashes. It's almost never the compiler, except this time you have proof the compiler can occasionally misbehave.
However I'm more inexperienced about ign-gazebo. I checked out the install folder generated by the make folders, and I could not find Scene.hh anywhere.
Do you mean ign-gazebo's Scene3D.hh (as opposed to ign-rendering's Scene.hh)?
- ign-gazebo's Scene3D.hh is not installed - you are free to make changes to this file without worrying about API/ABI compatibility
- ign-rendering's Scene.hh is installed
Yes, sorry. I meant Scene3D.hh. Great that it's not installed, as I suspected
If adding forcing the user to upgrade both ign-gazebo and ign-rendering at the same time is an option, then this can be fixed by adding a new class to ign-rendering which takes care of the downcasting. This will preserve ABI compatibility, but will cause trouble if the user upgrades ign-gazebo without upgrading ign-rendering (which could be detected at runtime because the symbol will be missing, and warn the user to upgrade ign-rendering as well).
yes the example you gave is feasible. We've actually done something similar before in other libraries for critical bugfixes.
We can also require users to upgrade ign-rendering dependency version when they upgrade ign-gazebo. Currently ign-gazebo and ign-rendering are on version 5.0.0. Once the changes get in, we can bump both versions to 5.1.0 and tell ign-gazebo to look for ign-rednering 5.1.0 by add VERSION 5.1
here.
Just throwing in another idea. Is it possible to do this through ign-gazebo's Scene3D plugin by using 2 cameras that take turn to render? I imagine the difficulty here is keeping the 2 camera pose control (pan, orbit, zoom, etc) in sync.
Just throwing in another idea. Is it possible to do this through ign-gazebo's Scene3D plugin by using 2 cameras that take turn to render? I imagine the difficulty here is keeping the 2 camera pose control (pan, orbit, zoom, etc) in sync.
That may not be a bad idea. I'll have to think a few details about on how feasible are other improvements I have to do (basically saving VRAM)
Hi!
I've been thinking about this and I realized I missed a simple element, which I couldn't discuss in the middle the weekend and wait for a reply, but NOW is the right time to discuss this:
I assume rendering is currently happening in a different thread because "that's what the sample we found in the Qt wiki did".
There are pros and cons of this approach:
I recommend to keep rendering in another thread to avoid the cryptic GL errors, but serializing the operations.
That means the UI thread is blocked until Ogre is done. I've done this successfully in order to test RenderDoc debugging and it worked. By serializing rendering:
It actually gets better, because newer versions at any time can add support for a double buffer scheme on-demand, where at launch-time the user can leverage whether to enable parallel rendering, sacrificing some VRAM for a potential increase in UI responsiveness, or disable parallel rendering.
I was aiming to do parallel rendering by default, but add a command line argument to request serialized rendering for RenderDoc debugging. However it seems that we should do serialized rendering by default, defer parallel rendering for another time (it can be added any time in future versions) and move on to the next problem.
Thoughts?
I'm also leaning towards doing serialized rendering by default and keeping the code less complicated (I hope). This sounds like the changes will mainly be done in Scene3D class in the way we integrate ogre3d with Qt. And yes we did parallel rendering as that was what worked at the time, so it's good to hear that you were already able to have something working already.
As for parallel rendering, I think the VRAM sacrifice is relatively small compared to some our typical use cases. But I agree we can leave this for the future if we find limitations in serialized rendering, thanks.
ign-rendering uses the same pattern as provided textureinthread sample from Qt. (the code can be seen in repo)
What should happen
The pattern is the following (ideal, i.e. what should happen):
This is basically a double-buffer scheme maintained by hand.
It is even described in Qt's sample (emphasis mine):
What is actually happening
This is causing a race condition because A is both being rendered to while also being used internally as a texture by Qt. This can cause all sorts of weird behaviors or maybe even crashes when Qt UI is used.
Most likely how bad this bug is depends on which driver is being used (e.g. Mesa radeonsi vs Mesa Intel, vs Windows Intel, vs proprietary AMD, vs NVIDIA proprietary, vs Nouveau) and the workload balance and processing speed of the CPU (like all race conditions)
Why it went unnoticed
This bug went unnoticed because Gazebo defaults to using 8x MSAA (but due to bugs in setup, MSAA is not actually used, that's another problem) and the FBO passed to Qt is the resolved one. Due to this, the window for the race condition is extremely small (because FBO is only written to during the resolve operation, which is only a fraction of the whole render).
However once I disable MSAA the problem becomes much more apparent, notice the background flicker:
https://user-images.githubusercontent.com/3395130/114423489-9a847c80-9b8d-11eb-9398-8233669468a2.mp4
Environment
Is Ogre1 affected?
I don't know. I did not evaluate. The possibility is high, given that part of the buggy behavior comes from ign-gui, which is agnostic to both Ogre1 and Ogre2
Is OptiX affected?
Unknown.
Tasks
Affected files
Most of the bugs affect the files coming from:
Other
I will be the one fixing this bug. This ticket is for tracking progress