Closed peci1 closed 3 months ago
Pinging @iche033 as the author of #251. I know this might be a difficult-to-debug issue. Do you have any quick idea what could be wrong here?
The same crash can also be observed by launching your sensor_particles.sdf world. If I launch it on 4.4.0, it doesn't crash (but there are white rectangles instead of particles, of course).
I've narrowed down the crash to this pass:
If I comment it out, the crash does not occur (but particles are rendered wrongly).
Also, maybe related: https://github.com/ignitionrobotics/ign-sensors/issues/67 .
I think it could be an issue specific to the Mesa driver / AMD card. I tried testing on two computers, but they both have NVIDIA cards, so I was not able to reproduce it.
Given that the other passes work fine, I made some minor changes so that the textures and this particular pass are set up the same way as the particleDepthTexture pass. Changes are in the depth_pass_crash. Can you see if that makes any difference for you?
Thank you for trying to debug this issue, Ian. However, the suggested change did not help.
It can very well be a driver issue because my system setup is not really conventional (Ubuntu Bionic with manually installed kernel 5.10 and Mesa from some PPA that provides a newer version; all of that done to support the recent Renoir GPU). I've just run some OpenGL benchmarks and all of them ran without an issue. So I would still think that the system is in a more or less good shape. But I understand that OpenGL benchmarks do not usually try rendering depth cameras...
I can play around with some configuration myself, but I need some pointers at where to poke, because I don't understand OpenGL or OGRE that much...
To rule out the influence of my nonstandard setup, I booted into Ubuntu 20.04.2 HWE, which has kernel 5.8 by default, and this should already have support for the Renoir GPUs. Even on this Ubuntu and with Ignition Dome, I got the exact same error.
ogre2.log from Ubuntu 20.04:
For comparison, here's Ubuntu 18.04 with GeForce 1050 where the example particle emitter world works:
I removed timestamps and have compared the ogre2.log from the working computer with nvidia GPU and the ubuntu 20.04 computer with AMD GPU.
What caught my attention was
-GL_ARB_depth_texture
It seems that the selected OpenGL profile on AMD doesn't contain the depth_texture extension (although I don't know whether it is needed or not).
In glxinfo, I can see the extension, but only for the compat profile, not for the core profile. So I tried launching with MESA_GL_VERSION_OVERRIDE=4.6COMPAT, which correctly selected the compat profile, and OGRE found the depth_texture extension, but it still resulted in the very same exception.
It's weird that commenting out that specific pass made things not crash, because a few lines below I'm doing the same depth pass with particleDepthTexture, which has the same depth texture format, and that worked. Just brainstorming some things to try:
Remove the mVisibilityMask setting for the depthTexture pass, i.e. this line:
passScene->mVisibilityMask = IGN_VISIBILITY_ALL
& ~Ogre2ParticleEmitter::kParticleVisibilityFlags;
Comment out the colorTexture pass (and change baseNodeDef->setNumTargetPass(5) to baseNodeDef->setNumTargetPass(4)) and see if you still get the crash. The RGBD camera's color image won't work, but this is just for testing.
Remove the depthTexture's clear pass (and change depthTargetDef->setNumPasses(2) to depthTargetDef->setNumPasses(1)).
I've tried all of your suggestions. None of them helped. I also found #244 and tried it, but it also did not help.
I found a way to tell the GPU driver to write out which textures it loads. There are some differences between what is loaded when running sensors_demo.sdf and sensor_particles.sdf (I commented out all sensors except one depth camera):
--- /tmp/log_demo.txt 2021-03-05 13:15:36.312868546 +0100
+++ /tmp/log_particles.txt 2021-03-05 13:15:18.016786299 +0100
@@ -26,9 +26,20 @@
Surf: size=4096, slice_size=4096, alignment=4096, swmode=22, epitch=31, pitch=32
DCC: offset=4096, size=4096, alignment=4096, pitch_max=511, num_dcc_levels=1
Texture:
- Info: npix_x=1024, npix_y=1024, npix_z=1, blk_w=1, blk_h=1, array_size=10, last_level=10, bpe=4, nsamples=0, flags=0x0, r8g8b8a8_srgb
- Surf: size=62914560, slice_size=6291456, alignment=65536, swmode=26, epitch=1535, pitch=1024
- DCC: offset=62914560, size=262144, alignment=65536, pitch_max=1023, num_dcc_levels=2
+ Info: npix_x=2048, npix_y=2048, npix_z=1, blk_w=1, blk_h=1, array_size=10, last_level=11, bpe=1, nsamples=0, flags=0x0, r8_unorm
+ Surf: size=62914560, slice_size=6291456, alignment=65536, swmode=26, epitch=3071, pitch=2048
+ DCC: offset=62914560, size=262144, alignment=65536, pitch_max=2047, num_dcc_levels=2
+Texture:
+ Info: npix_x=2048, npix_y=2048, npix_z=1, blk_w=1, blk_h=1, array_size=2, last_level=11, bpe=4, nsamples=0, flags=0x0, r8g8b8a8_srgb
+ Surf: size=50331648, slice_size=25165824, alignment=65536, swmode=26, epitch=3071, pitch=2048
+ DCC: offset=50331648, size=196608, alignment=65536, pitch_max=2047, num_dcc_levels=3
+Texture:
+ Info: npix_x=2048, npix_y=2048, npix_z=1, blk_w=1, blk_h=1, array_size=2, last_level=11, bpe=2, nsamples=0, flags=0x0, r8g8_snorm
+ Surf: size=25165824, slice_size=12582912, alignment=65536, swmode=26, epitch=3071, pitch=2048
+ DCC: offset=25165824, size=131072, alignment=65536, pitch_max=2047, num_dcc_levels=2
+Texture:
+ Info: npix_x=256, npix_y=256, npix_z=1, blk_w=1, blk_h=1, array_size=40, last_level=8, bpe=4, nsamples=0, flags=0x0, r8g8b8a8_srgb
+ Surf: size=15728640, slice_size=393216, alignment=65536, swmode=26, epitch=383, pitch=256
Texture:
Info: npix_x=320, npix_y=240, npix_z=1, blk_w=1, blk_h=1, array_size=1, last_level=0, bpe=16, nsamples=0, flags=0x0, r32g32b32a32_float
Surf: size=1310720, slice_size=1310720, alignment=65536, swmode=26, epitch=319, pitch=320
@@ -62,11 +73,3 @@
Info: npix_x=160, npix_y=120, npix_z=1, blk_w=1, blk_h=1, array_size=1, last_level=0, bpe=4, nsamples=0, flags=0x820000, z32_float
Surf: size=81920, slice_size=81920, alignment=4096, swmode=20, epitch=159, pitch=160
HTile: offset=98304, size=32768, alignment=32768
-Texture:
- Info: npix_x=320, npix_y=240, npix_z=1, blk_w=1, blk_h=1, array_size=1, last_level=0, bpe=4, nsamples=0, flags=0x60000, z32_float_s8x24_uint
- Surf: size=524288, slice_size=393216, alignment=65536, swmode=24, epitch=383, pitch=384
- HTile: offset=524288, size=262144, alignment=262144
- Stencil: offset=393216, swmode=24, epitch=511
-Texture:
- Info: npix_x=320, npix_y=240, npix_z=1, blk_w=1, blk_h=1, array_size=1, last_level=0, bpe=16, nsamples=0, flags=0x0, r32g32b32a32_float
- Surf: size=1228800, slice_size=1228800, alignment=256, swmode=0, epitch=319, pitch=320
I understand that in case of the particle world, there are some more textures loaded, but the emitter texture is 256x256. That would explain one of the additional textures. And I don't see the 64x1 color texture loaded anywhere. And what are the three 2048x2048 textures that appear in the particle world?
Then there are the final two textures missing from particle world, but that is probably just because it throws the exception earlier than these textures are set up.
I tried also gpu_ray in the particle world, and that works.
Big discovery! It's not the particle emitter that breaks stuff, it's the rescue randy that's in the particles world but not in sensors_demo (I hadn't noticed, sorry for that; that would explain why the 2048x2048 textures were loaded in addition to the sensors_demo world). But it still only started causing the crashes with #251.
In particular, it's this definition of its texture:
<pbr>
<metal>
<albedo_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_albedo.png</albedo_map>
<!--normal_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_normal.png</normal_map-->
<!--metalness_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_metalness.png</metalness_map-->
<!--roughness_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_roughness.png</roughness_map-->
</metal>
</pbr>
If I uncomment any of the commented-out lines, I get a crash.
Maybe related, if I look into ogre2.log from the crashed sessions, I can see
Vertex Shader: 537100416VertexShader_vs
Fragment Shader: 537100416PixelShader_ps
GLSL validation result :
active samplers with a different type refer to the same texture image unit
Can't this be some kind of interference with the textures loaded for the rescue randy?
This is the randy without the three PBR textures on AMD GPU:
I also tried enabling e.g. the normal texture and resizing it to 512x512, but that did not help.
I tested all other SubT artifacts and all of them work except rescue randy (even rescue randy sitting works, as it doesn't use PBR).
Where can I get cloudsim_sim.ign? I can't repro without that :(
I have similar HW so I should be hopefully able to repro the bug.
Usually this problem happens because the driver optimized out a uniform variable, e.g.
uniform float myConst[4]; // it's not actually used
or decides to shorten the length of the array
uniform float myConst[4]; // the code actually uses range [0; 1] so driver decides to optimize it to myConst[2]
The solution, once the offending shader is identified, is either to spuriously use the last value (e.g. use myConst[3] to force the array to be of size 4) or to remove the unused entries (i.e. the optimizer was right that the extra entries aren't used).
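To make the mismatch described above concrete, here is a minimal, self-contained sketch of the kind of check that trips: it compares the array length the application declared for each uniform against the length the driver reports as active. The struct, the function name and the sample values are hypothetical; in a real program the reported sizes would come from something like glGetActiveUniform, and this is not Ogre's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: the array length the application declared for a uniform.
struct DeclaredUniform
{
    std::string name;
    int arraySize;
};

// Returns the names of uniforms whose driver-reported (active) array length
// differs from the declared one. A uniform that is missing from `reported`
// was optimized out entirely and is flagged as well.
std::vector<std::string> findArraySizeMismatches(
    const std::vector<DeclaredUniform> &declared,
    const std::map<std::string, int> &reported)
{
    std::vector<std::string> mismatches;
    for (const DeclaredUniform &u : declared)
    {
        const auto it = reported.find(u.name);
        if (it == reported.end() || it->second != u.arraySize)
            mismatches.push_back(u.name);
    }
    return mismatches;
}
```

With myConst declared as 4 entries but reported as 2 by the driver, the function would flag myConst, which is the situation where forcing a use of myConst[3] (or shrinking the declaration) makes both sides agree again.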
cloudsim_sim.ign is part of the SubT simulator. You can install it following the instructions here: https://github.com/osrf/subt/wiki/Catkin%20System%20Setup . There are also docker images, but I'm not sure if they're the best to test GPU-related problems...
Thanks! And ouch! That's a lot of setup work.
Were you able to repro the bug with a simple/modified SDF I can launch?
Actually yes! I think this one causes the crash for me too, and does not require the SubT simulator:
sensor_particles.sdf:
Mmmm... I could not repro it in this GPU:
Device: Radeon RX 560 Series (POLARIS11, DRM 3.40.0, 5.11.0-051100-generic, LLVM 10.0.1) (0x67ff)
Mesa Version: 20.1.3 git-663fa46287
I have a Vega iGPU on the laptop (Ryzen 5 2500U). I can't test it right now, but I should be able to test it another time.
I can't help noticing a couple things:
AFAIK I use only the open-source drivers. Maybe it's the newest kernel?
Here's glxinfo:
Did you actually unpause the world? Sometimes it crashes only after unpause...
Mmm your mesa version is much newer than mine. I should look into using that exact version.
Thanks!
I wanted to say that https://github.com/ignitionrobotics/ign-rendering/pull/388 resolved this issue for me, but I was too quick. Another run crashed again with the same error (now on Linux 5.15.0, Mesa 21.2.5).
If you're able to run a debug version (or you could modify our source and printf the values), the following would help us:
// In GLSLProgramManager::extractUniformsFromProgram
printf( "paramName %s\n", paramName.c_str() );
// In the caller of extractUniformsFromProgram; I suspect it is GLSLMonolithicProgram::buildGLUniformReferences
if( mVertexShader )
{
printf( "mVertexShader %s\n", mVertexShader->getName().c_str() );
}
if( mFragmentShader )
{
printf( "mFragmentShader %s\n", mFragmentShader->getName().c_str() );
}
This will at least tell us where to look. If you need assistance building Ogre from source, let me know (just dropping the ignition fork into Colcon's workspace should work).
I got the OGRE2 ignition fork built from source and used at runtime (both main and render plugins). However, I can't see any of the printf outputs anywhere (neither in the console with -v4, nor in ~/.ignition/rendering/ogre2.log). But I'm pretty sure the modified code gets executed.
Nevertheless, I got more info.
First of all, this assert does not go through:
If I comment it out, I get further. This is what I found in ogre2.log afterwards:
17:48:12: Parsing script depth_camera.material
17:48:12: OGRE EXCEPTION(5:ItemIdentityException): Parameter called time does not exist. Known names are: backgroundColor backgroundColor[0] colorTexture colorTexture[0] depthTexture depthTexture[0] far far[0] max max[0] min min[0] near near[0] particleDepthTexture particleDepthTexture[0] particleScatterRatio particleScatterRatio[0] particleStddev particleStddev[0] particleTexture particleTexture[0] projectionParams projectionParams[0] rnd rnd[0] in GpuProgramParameters::_findNamedConstantDefinition at /media/data/ign/ogre/OgreMain/src/OgreGpuProgramParams.cpp (line 2218)
17:48:12: Compiler error: invalid parameters in depth_camera.material(33): setting of constant failed
17:48:12: Parsing script thermal.material
17:48:12: Parsing script gpu_rays.material
17:48:12: OGRE EXCEPTION(5:ItemIdentityException): Parameter called time does not exist. Known names are: colorTexture colorTexture[0] depthTexture depthTexture[0] far far[0] max max[0] min min[0] near near[0] particleDepthTexture particleDepthTexture[0] particleScatterRatio particleScatterRatio[0] particleStddev particleStddev[0] particleTexture particleTexture[0] projectionParams projectionParams[0] rnd rnd[0] in GpuProgramParameters::_findNamedConstantDefinition at /media/data/ign/ogre/OgreMain/src/OgreGpuProgramParams.cpp (line 2218)
17:48:12: Compiler error: invalid parameters in gpu_rays.material(33): setting of constant failed
So my intuition that the time shader variable could be the cause of the problem is probably right. But I don't know how to fix it.
The time variables are no longer used. Here's a quick PR that removes them from the material script:
https://github.com/ignitionrobotics/ign-rendering/pull/485. See if that helps.
So the time problem was just a smokescreen. I removed it and now I'm back where I was at the very beginning: memory corruption.
It seems the Ogre assert is detecting an actual memory corruption, so disabling it lets me go a bit further but all reported errors afterwards may be just wrong.
The corruption starts about here:
Once inside createDatablock(), getBlendBlock() gets called with an argument that has at least two weird values:
$4 = (const Ogre::HlmsBlendblock &) @0x7fffb9303810: {<Ogre::BasicBlock> = {mRsData = 0x0, mRefCount = 0, mId = 0, mBlockType = 0 '\000', mAllowGlobalDefaults = 1 '\001'}, mAlphaToCoverageEnabled = false,
mBlendChannelMask = 1 '\001', mIsTransparent = true, mSeparateBlend = 67, mSourceBlendFactor = Ogre::SBF_SOURCE_COLOUR, mDestBlendFactor = Ogre::SBF_ONE, mSourceBlendFactorAlpha = Ogre::SBF_ONE,
mDestBlendFactorAlpha = Ogre::SBF_DEST_COLOUR, mBlendOperation = Ogre::SBO_MIN, mBlendOperationAlpha = 2999584171}
Notice mBlockType = 0, which should be 1 for a blend block (this is what triggers the assert). Also notice mBlendOperationAlpha = 2999584171, which is wrong because the alpha operation should be an enum with 4 values or so. I'm not sure whether mRefCount = 0 is wrong here or not...
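As a small illustration of why a value like 2999584171 points to corrupted memory rather than a legitimate setting, here is a sketch of a range check on a raw enum value. The enum below is a hypothetical stand-in with a small contiguous range; it does not reproduce Ogre's real definition.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum with a small contiguous range, standing in for a
// blend-operation enum. Count is a sentinel, not a real operation.
enum class BlendOp : std::uint32_t
{
    Add,
    Subtract,
    ReverseSubtract,
    Min,
    Max,
    Count
};

// True when a raw value read from memory falls inside the enum's range.
bool isPlausibleBlendOp(std::uint32_t raw)
{
    return raw < static_cast<std::uint32_t>(BlendOp::Count);
}
```

A check like this would reject the value seen in the debugger, while accepting any small index that could actually name an operation.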
If I disable the assert and let the code run through it, this shows on stack:
0x00007fffb2a1a278 in Ogre::Hlms::createDatablock (this=0x7fffb45c80f0, name=..., refName=<error reading variable: Cannot access memory at address 0x64f0b678>, macroblockRef=..., blendblockRef=...,
paramVec=std::vector of length -15012014088548807, capacity -15012014088548807 = {...}, visibleToManager=128, filename="",
resourceGroup=<error reading variable: Cannot access memory at address 0x3ff0000000000008>) at /media/data/ign/ogre/OgreMain/src/OgreHlms.cpp:1572
The paramVec is obviously corrupted, so it's no surprise that searching for a parameter in it doesn't work and segfaults.
So, the question is where is this corruption coming from.
The first value where I noticed the corruption is the blendblockRef parameter of Ogre::Hlms::createDatablock(), called from
As can be seen, the blend block object is freshly created and passed directly to createDatablock(), so I think the corruption has to happen even earlier or somewhere around here.
Just to add context, the assert fails when creating this material, which also happens to be the very first one created in ignition::rendering::v4::BaseScene::CreateMaterials():
ignition::rendering::v4::Ogre2Scene::CreateMaterialImpl (this=0x7fffb47ff970, _id=65534, _name="Default/TransRed") at /media/data/ign/ign-rendering/ogre2/src/Ogre2Scene.cc:420
Complete stack trace up to the assert:
Sorry for taking my time. I know what's wrong but I need to be at my PC to look for the patch.
Basically, in a debug build ign needs to define OGRE_DEBUG_LEVEL to match (or alternatively disable it in Ogre).
OK I'm at the PC.
Here's the patch:
diff --git a/ogre2/src/CMakeLists.txt b/ogre2/src/CMakeLists.txt
index 012a52a3..d8423429 100644
--- a/ogre2/src/CMakeLists.txt
+++ b/ogre2/src/CMakeLists.txt
@@ -1,3 +1,8 @@
+if( UNIX )
+ # lld is much faster than ld for linking
+ set( CMAKE_SHARED_LINKER_FLAGS "-fuse-ld=lld" )
+endif()
+
# Collect source files into the "sources" variable and unit test files into the
# "gtest_sources" variable.
ign_get_libsources_and_unittests(sources gtest_sources)
@@ -44,7 +49,7 @@ target_link_libraries(${ogre2_target}
terra
IgnOGRE2::IgnOGRE2)
-target_compile_definitions(${ogre2_target} PRIVATE $<$<CONFIG:Debug>:DEBUG=1 _DEBUG=1>)
+-target_compile_definitions(${ogre2_target} PRIVATE $<$<CONFIG:Debug>:DEBUG=1 _DEBUG=1>)
set (versioned ${CMAKE_SHARED_LIBRARY_PREFIX}${PROJECT_NAME_LOWER}-${engine_name}${CMAKE_SHARED_LIBRARY_SUFFIX})
diff --git a/ogre2/src/terrain/Terra/CMakeLists.txt b/ogre2/src/terrain/Terra/CMakeLists.txt
index 83c9ac90..32404087 100644
--- a/ogre2/src/terrain/Terra/CMakeLists.txt
+++ b/ogre2/src/terrain/Terra/CMakeLists.txt
@@ -25,8 +25,8 @@ endif()
add_definitions(-DOGRE_IGNORE_UNKNOWN_DEBUG)
-#target_compile_definitions(${PROJECT_NAME} PUBLIC
-# $<$<CONFIG:Debug>:DEBUG=1 _DEBUG=1>)
+target_compile_definitions(${PROJECT_NAME} PUBLIC
+ $<$<CONFIG:Debug>:DEBUG=1 _DEBUG=1>)
target_include_directories(${PROJECT_NAME}
PRIVATE
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index dcfd6f2f..10d16225 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -1,3 +1,8 @@
+if( UNIX )
+ # lld is much faster than ld for linking
+ set( CMAKE_SHARED_LINKER_FLAGS "-fuse-ld=lld" )
+endif()
+
# set compile definitions for tests
set_property(
SOURCE Camera_TEST.cc RenderTarget.cc
If the diff fails, what's really important is just:
target_compile_definitions(${ogre2_target} PRIVATE $<$<CONFIG:Debug>:DEBUG=1 _DEBUG=1>)
By defining DEBUG, Ogre headers will see this and ign-rendering will build correctly (assuming you're building with colcon build --cmake-args -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Debug -DBUILD_DOCS=OFF --merge-install).
Ogre headers have a warning to detect this error but the fork stripped it :\
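One plausible mechanism for the corruption seen earlier (an assumption, not something confirmed in this thread) is that a header changes a type's layout depending on a debug macro, so code compiled with the macro and code compiled without it disagree about where members live. The sketch below simulates this with a template parameter in place of the macro so both layouts can coexist in one translation unit; the Block type and its members are hypothetical and are not Ogre's.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a library struct whose layout depends on a debug
// macro. The template parameter simulates the macro being on or off.
template <bool DebugLayout> struct Block;

template <> struct Block<false>
{
    int refCount;
    int blockType;
};

template <> struct Block<true>
{
    void *debugOwner;  // extra member that only exists in the debug layout
    int refCount;
    int blockType;
};

// Sizes as seen by each side of the mismatch.
constexpr std::size_t kReleaseSize = sizeof(Block<false>);
constexpr std::size_t kDebugSize = sizeof(Block<true>);

// True only if both sides agree on the size and on where blockType lives.
inline bool layoutsMatch()
{
    return kReleaseSize == kDebugSize &&
           offsetof(Block<false>, blockType) == offsetof(Block<true>, blockType);
}
```

If a library built with one layout hands such an object to code built with the other, reads of blockType land on the wrong bytes, which would look like the garbage values shown in the debugger output above. Defining DEBUG consistently on both sides removes the disagreement.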
Thanks, that did work. Now I'm back at "GL doesn't agree...".
This is the log with the added debug prints (I used LogManager instead of printf):
14:59:56: Added resource location '/home/peci1/.ignition/fuel/fuel.ignitionrobotics.org/openrobotics/models/rescue randy/2/meshes' of type 'FileSystem' to resource group 'General' with recursive option
14:59:56: Initialising resource group General
14:59:56: Can't assign material scene::Material(65487) because this Material does not exist. Have you forgotten to define it in a .material script?
14:59:56: Added resource location '/home/peci1/.ignition/fuel/fuel.ignitionrobotics.org/openrobotics/models/rescue randy/2/materials/textures/' of type 'FileSystem' to resource group 'General'
14:59:56: Texture: loading rescue_randy_roughness.png as rescue_randy_roughness.png
14:59:56: Texture: loading rescue_randy_metalness.png as rescue_randy_metalness.png
14:59:56: Texture: loading rescue_randy_albedo.png as rescue_randy_albedo.png
14:59:57: Texture: loading rescue_randy_normal.png as rescue_randy_normal.png
14:59:57: WARNING: normal map texture rescue_randy_normal.png is not BC5S compressed. This is encouraged for lower memory usage. If you don't want to see this message without compressing to BC5, set getDefaultTextureParameters()[TEXTURE_TYPE_NORMALS].pixelFormat to PF_R8G8_SNORM (or PF_BYTE_LA if RSC_TEXTURE_SIGNED_INT is not supported)
14:59:57: Added resource location '/home/peci1/.ignition/fuel/fuel.ignitionrobotics.org/caguero/models/smoke_generator/2/materials/textures/' of type 'FileSystem' to resource group 'General'
14:59:57: Texture: loading smoke.png as smoke.png
14:59:57: mVertexShader 536969216VertexShader_vs
14:59:57: mFragmetShader 536969216PixelShader_ps
14:59:57: paramName worldMatBuf
14:59:57: paramName f3dGrid
14:59:57: paramName f3dLightList
14:59:57: Vertex Shader: 537001984VertexShader_vs
Fragment Shader: 537001984PixelShader_ps
GLSL validation result :
active samplers with a different type refer to the same texture image unit
14:59:57: mVertexShader 537001984VertexShader_vs
14:59:57: mFragmetShader 537001984PixelShader_ps
14:59:57: paramName worldMatBuf
14:59:57: paramName f3dGrid
14:59:57: paramName f3dLightList
14:59:57: paramName textureMaps
14:59:57: mVertexShader 1610612740VertexShader_vs
14:59:57: mFragmetShader 1610612740PixelShader_ps
14:59:57: paramName textureMapsArray
14:59:57: mVertexShader 536969344VertexShader_vs
14:59:57: paramName worldMatBuf
14:59:57: Vertex Shader: 537002112VertexShader_vs
Fragment Shader: 537002112PixelShader_ps
GLSL validation result :
active samplers with a different type refer to the same texture image unit
14:59:57: mVertexShader 537002112VertexShader_vs
14:59:57: mFragmetShader 537002112PixelShader_ps
14:59:57: paramName worldMatBuf
14:59:57: paramName textureMaps
And here are some details from GDB:
(gdb) frame 5
#5 0x00007fff94302538 in Ogre::GLSLMonolithicProgram::buildGLUniformReferences (this=0x7fffb551b5c0) at /media/data/ign/ogre/RenderSystems/GL3Plus/src/GLSL/OgreGLSLMonolithicProgram.cpp:297
297 GLSLMonolithicProgramManager::getSingleton().extractUniformsFromProgram(
(gdb) p vertParams
$1 = (const Ogre::GpuConstantDefinitionMap *) 0x7fffb550fe10
(gdb) p hullParams
$2 = (const Ogre::GpuConstantDefinitionMap *) 0x0
(gdb) p domainParams
$3 = (const Ogre::GpuConstantDefinitionMap *) 0x0
(gdb) p fragParams
$4 = (const Ogre::GpuConstantDefinitionMap *) 0x7fffb5508870
(gdb) p geomParams
$5 = (const Ogre::GpuConstantDefinitionMap *) 0x0
(gdb) p computeParams
$6 = (const Ogre::GpuConstantDefinitionMap *) 0x0
(gdb) p mGLUniformReferences
$7 = std::vector of length 1, capacity 1 = {{mLocation = 9, mSourceProgType = Ogre::GPT_VERTEX_PROGRAM, mConstantDef = 0x7fffb5528990}}
(gdb) p mGLAtomicCounterReferences
$8 = std::vector of length 0, capacity 0
(gdb) p mGLUniformBufferReferences
$9 = std::vector of length 0, capacity 0
(gdb) p mSharedParamsBufferMap
$10 = std::map with 0 elements
(gdb) p mGLCounterBufferReferences
$11 = std::vector of length 0, capacity 0
(gdb) p mGLProgramHandle
$12 = 12
(gdb) frame 4
#4 0x00007fff94315328 in Ogre::GLSLProgramManager::extractUniformsFromProgram (this=0x7fffb40c6080, programObject=12, vertexConstantDefs=0x7fffb550fe10, hullConstantDefs=0x0,
domainConstantDefs=0x7fffb5508870, geometryConstantDefs=0x0, fragmentConstantDefs=0x0, computeConstantDefs=0x0, uniformList=std::vector of length 1, capacity 1 = {...},
counterList=std::vector of length 0, capacity 0, uniformBufferList=std::vector of length 0, capacity 0, sharedParamsBufferMap=std::map with 0 elements, counterBufferList=std::vector of length 0, capacity 0)
at /media/data/ign/ogre/RenderSystems/GL3Plus/src/GLSL/OgreGLSLProgramManager.cpp:624
624 assert(size_t (arraySize) == newGLUniformReference.mConstantDef->arraySize
(gdb) p foundSource
$13 = true
(gdb) p arraySize
$14 = 1
(gdb) p newGLUniformReference.mConstantDef
$15 = (const Ogre::GpuConstantDefinition *) 0x7fffb5514f80
(gdb) p newGLUniformReference.mConstantDef->arraySize
$16 = 3
(gdb) p newGLUniformReference
$17 = {mLocation = 8202, mSourceProgType = Ogre::GPT_FRAGMENT_PROGRAM, mConstantDef = 0x7fffb5514f80}
(gdb) p paramName
$18 = "textureMaps"
(gdb) p uniformName
$19 = "textureMaps[0]\000\065\065].indices4_7\000]\000cale[0]\000\f\316\270\264\377\177\000\000\377\377\377\377\000\000\000\000x%<\365\377\177\000\000 *\n\264\377\177\000\000\000\321\315G\r\327F%\300\220e\365\377\177\000\000 *\n\264\377\177\000\000\000U&\271\377\177\000\000\371\002<\365\377\177\000\000\230W\016\262\377\177\000\000NL>\365\377\177\000\000 *\n\264\377\177\000\000\362\v\036\262\377\177\000\000\240\066\202\265\377\177\000\000\000\216E\264\002\000\000\000\200U&\271\377\177\000\000 *\n\264\377\177\000\000\220U&\271\377\177\000\000\230R6\365\377\177\000\000\000\000\000\000\000\000\000"
(gdb) p uniformCount
$20 = 2828
(gdb) p index
$21 = 2827
(gdb) p glType
$22 = 36289
(gdb) p uniformList
$23 = std::vector of length 1, capacity 1 = {{mLocation = 9, mSourceProgType = Ogre::GPT_VERTEX_PROGRAM, mConstantDef = 0x7fffb5528990}}
I keep the GDB running if you wanted more info.
What!?!? It is complaining about a PBS shader Ogre generated. I did not see this one coming.
I suspect it's textureMaps.
Find hlmsPbs->setDebugOutputPath(false, false); in ign-rendering/ogre2/src/Ogre2RenderEngine.cc and change it to:
hlmsPbs->setDebugOutputPath(true, true, "/home/myusername/some_path_you_wish_to_dump/");
(note it must end with '/')
Then repro the crash again. Look at the Ogre.log; the last entries were:
14:59:57: mVertexShader 537002112VertexShader_vs
14:59:57: mFragmetShader 537002112PixelShader_ps
Therefore go to /home/myusername/some_path_you_wish_to_dump/, locate these 2 files, and upload them here.
Note that the actual filenames may change in the next run; so always look at the Ogre.log just in case
Cheers
Weird, there is only one pair of files.
The log says
15:33:36: Texture: loading smoke.png as smoke.png
15:33:36: mVertexShader 536969216VertexShader_vs
15:33:36: mFragmetShader 536969216PixelShader_ps
15:33:36: paramName worldMatBuf
15:33:36: paramName f3dGrid
15:33:36: paramName f3dLightList
15:33:36: Vertex Shader: 537001984VertexShader_vs
Fragment Shader: 537001984PixelShader_ps
GLSL validation result :
active samplers with a different type refer to the same texture image unit
15:33:36: mVertexShader 537001984VertexShader_vs
15:33:36: mFragmetShader 537001984PixelShader_ps
15:33:36: paramName worldMatBuf
15:33:36: paramName f3dGrid
15:33:36: paramName f3dLightList
15:33:36: paramName textureMaps
15:33:36: mVertexShader 1610612740VertexShader_vs
15:33:36: mFragmetShader 1610612740PixelShader_ps
15:33:36: paramName textureMapsArray
15:33:36: mVertexShader 536969344VertexShader_vs
15:33:36: paramName worldMatBuf
15:33:36: Vertex Shader: 537002112VertexShader_vs
Fragment Shader: 537002112PixelShader_ps
GLSL validation result :
active samplers with a different type refer to the same texture image unit
15:33:36: mVertexShader 537002112VertexShader_vs
15:33:36: mFragmetShader 537002112PixelShader_ps
15:33:36: paramName worldMatBuf
15:33:36: paramName textureMaps
but I can only find debug info for 1610612740, which is not the last shader... Nevertheless, here they are: 1610612740.zip
Hi!
It sounds like you set HlmsUnlit to dump the shaders, instead of HlmsPbs. There should be two instances of setDebugOutputPath; try setting them both
You're right. Here is the PBS log 537002112.zip
OK I was about to rule it out a driver bug; until I noticed something that's very wrong.
Ogre at some point set hlms_render_depth_only, which means we're not rendering colour, but depth. This is usually left for shadow maps; but when that happens Ogre disables the pixel shader entirely.
For some reason this pixel shader was not disabled, which means we're not at shadow mapping stage.
Another reason for rendering to only depth is early Z prepass which AFAIK ignition doesn't use. I have no idea why hlms_render_depth_only was set. I suspect Ignition must be rendering to a PFDEPTH* format directly but trying to use PBS (which is invalid).
When the crash happens, could you post the full call stack? I want to see ignition's calls to see where it is in its rendering.
Update:
Yeah, definitely this could happen if ignition tries to render colour into a depth texture directly:
setProperty( HlmsBaseProp::RenderDepthOnly,
renderTarget->getForceDisableColourWrites() ? 1 : 0 );
I remember a similar issue with the LIDAR rendering, where Ogre2LaserRetroMaterialSwitcher::cameraPreRenderScene would switch the materials for objects but accidentally leave other objects that failed the test untouched and would attempt to render them using standard PBS shaders.
I suspect something similar is happening.
At least now that I know what's happening I should be able to repro this problem; rather than relying on the driver screaming to us we could throw an exception if we're trying to render Pbs colour into a depth target; and from there see what is going on.
I think it is a depth camera where it crashes.
Here's the stack:
OK I understand what's going on now.
I have a quick question for @iche033 :
Is the object (I suspect it's the human model) supposed to be rendered by Ogre2DepthCamera?
If the answer is no, it's an ignition bug.
If the answer is yes, it's an Ogre bug.
From Ogre side we should be able to fix it by adding this snippet:
@property( hlms_render_depth_only )
@set( hlms_disable_stage, 1 )
@end
Into Hlms/Pbs/GLSL/PixelShader_ps.glsl (no need to update Ogre since Ignition bundles this file into ign-rendering's repo)
Awesome! Adding your snippet to the shader fixed the issue. Let's wait for Ian to answer, but I think the answer will be yes, the randy should be rendered in depth camera.
Yep, the depth camera should see the rescue randy model. Hmm, I don't know why only that particular model has this issue though, since other models work fine and appear in the depth camera view.
Maybe because it uses a full stack of PBR textures, while other objects usually don't?
<pbr>
<metal>
<albedo_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_albedo.png</albedo_map>
<normal_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_normal.png</normal_map>
<metalness_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_metalness.png</metalness_map>
<roughness_map>https://fuel.ignitionrobotics.org/1.0/openrobotics/models/rescue randy/2/files/materials/textures/rescue_randy_roughness.png</roughness_map>
</metal>
</pbr>
I remember commenting out these textures was one of the workarounds for this bug.
Yes, that's the reason
Just to clarify what's happening:
The fix is to tell Ogre not to compile the Pixel Shader (i.e. null out the Hlms-generated output) when we detect we're only outputting depth. That way the driver will never see this PS and Ogre will never have a PS either.
@darksylinc Is this suggested fix something that should go into a PR? Or is it just a workaround?
Ouch ouch ouch
Now that I'm with a fresh brain (honestly I forgot this issue wasn't fixed), I realized my assessment was partially wrong (and the fix is sadly not the right one).
What I said is right except:
Ogre already sets hlms_disable_stage when hlms_render_depth_only is on:
@property( hlms_render_depth_only && !alpha_test && !hlms_shadows_esm && !macOS)
@set( hlms_disable_stage, 1 )
@end
alpha_test is key. We still need to execute the pixel shader if alpha_test is enabled.
uniform sampler2DArray textureMaps[3];
diffuseCol = texture( textureMaps[0], vec3( UV_DIFFUSE( inPs.uv0.xy ), diffuseIdx ) );
if( material.kD.w >= diffuseCol.a )
discard;
textureMaps[0] gets used and is relevant, thus the driver cannot optimize it out; but textureMaps[1] and textureMaps[2], although they're sampled, can be optimized out since their results lead nowhere.
I'll have to think of a better way to fix this so that only diffuse textures are sent in this case.
OK I noticed this problem should be gone from Ogre 2.2
Are you able to upgrade to Fortress (which uses 2.2)?
That will have to happen in a short time. But Edifice is still affected.
I verified that in Fortress, I can successfully run the example that crashes in Dome.
Closing since this is fixed in Fortress
Environment
Description
Steps to reproduce
ign launch -v4 cloudsim_sim.ign robotName1:=X1 robotConfig1:=EXPLORER_X1_SENSOR_CONFIG_2 ros:=true durationSec:=3600 worldName:=simple_cave_01
Output
The cause of the crash is:
Whole stack trace is here:
ogre2.log: