OGRECave / ogre-next

aka ogre v2 - scene-oriented, flexible 3D C++ engine
https://ogrecave.github.io/ogre-next/api/latest
Other
1.08k stars 233 forks source link

Dangle pointer in map BarrierSolver::mResourceStatus leads to the false positive assert in BarrierSolver::debugCheckDivergingTransition #416

Closed StoyanovVlad closed 1 year ago

StoyanovVlad commented 1 year ago

System Information

Detailled description

Before creating OgreRenderWindow I create and load texture and setup it in HlmsPbsDatablock, during loading my texture the func GenerateHwMipmaps::_executeSerial() is called where temporary texture with name "___tempMipmapTexture" is created and destroyed at the end of func GenerateHwMipmaps::_executeSerial(). During destroying temporary texture in dtor this texture is added to the BarrierSolver which become a dangle pointer right after exiting dtor. After that I create OgreRenderWindow, during this call texture with "RenderWindow" is created, and if memory for this texture was allocated at the place where dangle pointer in BarrierSolver points to the pointer automatically becomes valid, what is wrong, because no adding was done and later assert will be triggered.

1)Here is adding texture to the Barrier

Screenshot 2023-09-11 at 23 53 57

2)Here is the dangle pointer before creating texture "RenderWindow"

Screenshot 2023-09-11 at 23 55 47

3)Here is stop right after allocating and pointer become valid, but no adding was done to the Barrier

Screenshot 2023-09-11 at 23 56 13

4)Here is my code detects dangle pointer to stop

Screenshot 2023-09-11 at 23 56 32

5)Here is the triggered assert and texture which trigger the assert

Screenshot 2023-09-11 at 23 56 53 Screenshot 2023-09-11 at 23 57 55

Ogre.log

Callstack

StoyanovVlad commented 1 year ago

Supposed fix - selected lines in screenshot

Screenshot 2023-09-12 at 00 22 44
darksylinc commented 1 year ago

Hi!

First, I congratulate you for such high quality ticket. I wish there were more like yours.

After that I create OgreRenderWindow, during this call texture with "RenderWindow" is created, and if memory for this texture was allocated at the place where dangle pointer in BarrierSolver points to the pointer automatically becomes valid, what is wrong, because no adding was done and later assert will be triggered.

Ah, basically the memory address got reused for an unrelated allocation. Yikes!

Supposed fix - selected lines in screenshot

Funny enough TextureGpuManager::destroyTextureImmediate already calls barrierSolver.textureDeleted but unfortunately it gets added again when it shouldn't have.

Does this alternate solution fix the problem?

diff --git a/RenderSystems/Vulkan/src/OgreVulkanQueue.cpp b/RenderSystems/Vulkan/src/OgreVulkanQueue.cpp
index 97b805e3f7..f6d4cfa874 100644
--- a/RenderSystems/Vulkan/src/OgreVulkanQueue.cpp
+++ b/RenderSystems/Vulkan/src/OgreVulkanQueue.cpp
@@ -1064,10 +1064,14 @@ namespace Ogre
         if( mEncoderState == EncoderCopyOpen )
         {
             bool needsToFlush = false;
+            bool mustRemoveFromBarrier = false;
             TextureGpuDownloadMap::const_iterator itor = mCopyDownloadTextures.find( texture );

             if( itor != mCopyDownloadTextures.end() )
+            {
                 needsToFlush = true;
+                mustRemoveFromBarrier = true;
+            }
             else
             {
                 FastArray<TextureGpu *>::const_iterator it2 =
@@ -1085,6 +1089,14 @@ namespace Ogre
                 OGRE_ASSERT_LOW( texture->mCurrLayout == VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL ||
                                  texture->mCurrLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL );
                 endCopyEncoder();
+
+                if( mustRemoveFromBarrier )
+                {
+                    // endCopyEncoder() just called solver.assumeTransition() on this texture
+                    // but we're destroying the texture. Remove the dangling pointer.
+                    BarrierSolver &solver = mRenderSystem->getBarrierSolver();
+                    solver.textureDeleted( texture );
+                }
             }
         }
     }

After that I create OgreRenderWindow

Unrelated to this bug: OgreNext historically was written when D3D7 and (pre 2.0) OpenGL were around; and it has the unfortunate assumption that the at least one RenderWindow must be the first thing created before anything else (and the last thing to be destroyed).

Because Vulkan & Metal no longer have this restriction; it may just work (i.e. creating GPU resources before creating the Window).

Personally I recommend that "just in case", if possible, you create the RenderWindow first to ensure everything is properly initialized.

StoyanovVlad commented 1 year ago

Hi! Thank you for your quick reply. Yes, your alternate solution fixes the problem as well.

darksylinc commented 1 year ago

Done! Thanks for the report!