KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html

Crash in spirv::EntryPoint:GetAccessibleIds #8501

Open Axel-Reactor opened 2 months ago

Axel-Reactor commented 2 months ago

Environment:

Describe the Issue

I'm hitting a crash with a compute shader in spirv::EntryPoint::GetAccessibleIds:

[vcruntime140.dll] _CxxThrowException 0x00007ff96eeb51d0
[VkLayer_khronos_validation.dll] robin_hood::detail::Table<1,80,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,enum VkValidationFeatureDisableEXT,robin_hood::hash<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,void>,std::equal_to<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::throwOverflowError() 0x00007ff8ee51b54f
[Inlined] [VkLayer_khronos_validation.dll] robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >::insert_move(robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >::DataNode<robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >,1> &&) 0x00007ff8ee54f0f3
[VkLayer_khronos_validation.dll] robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >::rehashPowerOfTwo(unsigned long long,bool) 0x00007ff8ee54f0ee
[VkLayer_khronos_validation.dll] robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >::insertKeyPrepareEmptySpot<unsigned int const &>(const unsigned int &) 0x00007ff8ee54dcf4
[Inlined] [VkLayer_khronos_validation.dll] robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >::emplace(const unsigned int &) 0x00007ff8ee9851d5
[Inlined] [VkLayer_khronos_validation.dll] robin_hood::detail::Table<1,80,unsigned int,void,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> >::insert(const unsigned int &) 0x00007ff8ee9851c2
[VkLayer_khronos_validation.dll] spirv::EntryPoint::GetAccessibleIds(const spirv::Module &,spirv::EntryPoint &) shader_module.cpp:471
[VkLayer_khronos_validation.dll] spirv::EntryPoint::EntryPoint(const spirv::Module &,const spirv::Instruction &,const robin_hood::detail::Table<1,80,unsigned int,std::vector<std::shared_ptr<spirv::ImageAccess const >,std::allocator<std::shared_ptr<spirv::ImageAccess const > > >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,const robin_hood::detail::Table<1,80,unsigned int,std::vector<spirv::Instruction const *,std::allocator<spirv::Instruction const *> >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,const robin_hood::detail::Table<1,80,unsigned int,unsigned int,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,const robin_hood::detail::Table<1,80,unsigned int,spirv::Instruction const *,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &) shader_module.cpp:769
[Inlined] [VkLayer_khronos_validation.dll] std::_Construct_in_place(spirv::EntryPoint &,const spirv::Module &,const spirv::Instruction &,robin_hood::detail::Table<1,80,unsigned int,std::vector<std::shared_ptr<spirv::ImageAccess const >,std::allocator<std::shared_ptr<spirv::ImageAccess const > > >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,std::vector<spirv::Instruction const *,std::allocator<spirv::Instruction const *> >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,unsigned int,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,spirv::Instruction const *,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &) 0x00007ff8ee988c77
[Inlined] [VkLayer_khronos_validation.dll] std::_Ref_count_obj2<spirv::EntryPoint>::{ctor}(const spirv::Module &,const spirv::Instruction &,robin_hood::detail::Table<1,80,unsigned int,std::vector<std::shared_ptr<spirv::ImageAccess const >,std::allocator<std::shared_ptr<spirv::ImageAccess const > > >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,std::vector<spirv::Instruction const *,std::allocator<spirv::Instruction const *> >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,unsigned int,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,spirv::Instruction const *,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &) 0x00007ff8ee988c41
[Inlined] [VkLayer_khronos_validation.dll] std::make_shared(const spirv::Module &,const spirv::Instruction &,robin_hood::detail::Table<1,80,unsigned int,std::vector<std::shared_ptr<spirv::ImageAccess const >,std::allocator<std::shared_ptr<spirv::ImageAccess const > > >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,std::vector<spirv::Instruction const *,std::allocator<spirv::Instruction const *> >,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,unsigned int,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &,robin_hood::detail::Table<1,80,unsigned int,spirv::Instruction const *,robin_hood::hash<unsigned int,void>,std::equal_to<unsigned int> > &) 0x00007ff8ee988c2c
[VkLayer_khronos_validation.dll] spirv::Module::StaticData::StaticData(const spirv::Module &,spirv::StatelessData *) shader_module.cpp:1204
[VkLayer_khronos_validation.dll] spirv::Module::Module(unsigned long long,const unsigned int *,spirv::StatelessData *) shader_module.h:662
[Inlined] [VkLayer_khronos_validation.dll] std::_Construct_in_place(spirv::Module &,const unsigned long long &,const unsigned int *const &,spirv::StatelessData *&&) 0x00007ff8ee9b4c6e
[Inlined] [VkLayer_khronos_validation.dll] std::_Ref_count_obj2<spirv::Module>::{ctor}(const unsigned long long &,const unsigned int *const &,spirv::StatelessData *&&) 0x00007ff8ee9b4c5e
[Inlined] [VkLayer_khronos_validation.dll] std::make_shared(const unsigned long long &,const unsigned int *const &,spirv::StatelessData *&&) 0x00007ff8ee9b4c46
[VkLayer_khronos_validation.dll] ValidationStateTracker::PreCallRecordCreateShaderModule(VkDevice_T *,const VkShaderModuleCreateInfo *,const VkAllocationCallbacks *,VkShaderModule_T **,const RecordObject &,chassis::CreateShaderModule &) state_tracker.cpp:4756
[VkLayer_khronos_validation.dll] CoreChecks::PreCallRecordCreateShaderModule(VkDevice_T *,const VkShaderModuleCreateInfo *,const VkAllocationCallbacks *,VkShaderModule_T **,const RecordObject &,chassis::CreateShaderModule &) cc_spirv.cpp:2470
[VkLayer_khronos_validation.dll] vulkan_layer_chassis::CreateShaderModule(VkDevice_T *,const VkShaderModuleCreateInfo *,const VkAllocationCallbacks *,VkShaderModule_T **) chassis.cpp:998


I know this is probably not terribly helpful without the SPIR-V, but I'm not at liberty to provide that. Maybe someone can take a guess. Happy to provide more info if needed.

One thing I noticed is that the hash map is resizing (rehashPowerOfTwo). Maybe that's a hint. This is only a uint32 set, though, so I don't see how it could even get corrupted.

Axel-Reactor commented 2 months ago

I tried editing the shader and there doesn't seem to be a clear cause and effect. If I remove different pieces of code, it passes. It seems like there is a certain threshold of imageStore/imageLoad calls that causes it to fail.

Axel-Reactor commented 2 months ago

I compiled the validation layers in debug, and it's calling throwOverflowError in robin hood here:

        // we don't retry, fail if overflowing
        // don't need to check max num elements
        if (0 == mMaxNumElementsAllowed && !try_increase_info()) {
            throwOverflowError();
        }

try_increase_info fails because mInfoInc is 2

    bool try_increase_info() {
        ROBIN_HOOD_LOG("mInfoInc=" << mInfoInc << ", numElements=" << mNumElements
                                   << ", maxNumElementsAllowed="
                                   << calcMaxNumElementsAllowed(mMask + 1))
        if (mInfoInc <= 2) {
            // need to be > 2 so that shift works (otherwise undefined behavior!)
            return false;
        }

I have no idea, is this a bug in the hash map implementation?
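
For readers unfamiliar with the guard quoted above, here is a toy model of it. It assumes the increment starts at 32 and is halved on each successful call; neither detail is stated in this thread, and `ToyTable`, `count_successful_increases`, and `adjust_or_throw` are hypothetical names, not robin_hood's API. The only point is that the guard permits a limited number of adjustments before `try_increase_info` returns false and the caller throws.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the table's bookkeeping; not robin_hood's real class.
struct ToyTable {
    // Assumption: the starting increment is 32. The thread only shows the "<= 2" guard.
    uint32_t info_inc = 32;

    // Mirrors the quoted guard: refuse once the increment is 2 or less.
    bool try_increase_info() {
        if (info_inc <= 2) {
            return false;
        }
        info_inc >>= 1;  // Assumption: each successful call halves the increment.
        return true;
    }
};

// Counts how many adjustments succeed before the guard trips.
inline int count_successful_increases(ToyTable table) {
    int successes = 0;
    while (table.try_increase_info()) {
        ++successes;
    }
    return successes;
}

// Mirrors the quoted call site: throw when no further adjustment is possible.
inline void adjust_or_throw(ToyTable& table) {
    if (!table.try_increase_info()) {
        throw std::overflow_error("toy table overflow");
    }
}
```

Under these assumptions `count_successful_increases(ToyTable{})` returns 4 (32, 16, 8, 4, then the guard trips at 2). In this model, a table that reports an increment of 2 has already used up every adjustment it had, so the throw is a deliberate error path rather than memory corruption.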

Axel-Reactor commented 2 months ago

This actually goes away if I replace the robin hood set with the STL one in this case, which is extremely upsetting. (A screenshot of the change was attached.)

spencer-lunarg commented 2 months ago

thanks for looking into this

  1. can you confirm the SPIR-V is fully valid (spirv-val throws no errors)
  2. when it crashes, what is the size of result_ids?

Axel-Reactor commented 2 months ago

  1. spirv-val --scalar-block-layout --target-env vulkan1.3 C:\Users\...\shader.spv returns no errors
  2. I don't remember the exact count, but it was ~700 entries, so nothing crazy (not crazy w.r.t. hash map size, it's a pretty big shader)

spencer-lunarg commented 2 months ago

if possible could you try

         // Try to add to the output set
-        if (!result_ids.insert(worklist_id).second) {
-            continue;  // If we already saw this id, we don't want to walk it again.
+        if (result_ids.contains(worklist_id)) {
+            continue;
+        } else {
+            result_ids.insert(worklist_id);
         }

without knowing any of the internals of how robin hood works, my only thought is that there might be an issue when we keep trying to insert duplicate entries
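
To make the two forms in the diff concrete, here is a minimal sketch that runs both over the same input. It uses `std::unordered_set` as a stand-in for `vvl::unordered_set` (an assumption that the two behave alike for these calls), uses `find()` in place of `contains()` so it builds before C++20, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Form 1: rely on the bool returned by insert() to detect duplicates.
inline std::size_t count_new_with_insert(const std::vector<uint32_t>& ids) {
    std::unordered_set<uint32_t> seen;
    std::size_t fresh = 0;
    for (uint32_t id : ids) {
        if (!seen.insert(id).second) {
            continue;  // Already present; insert() left the set unchanged.
        }
        ++fresh;
    }
    return fresh;
}

// Form 2: look up first, and insert only when the id is missing.
inline std::size_t count_new_with_lookup(const std::vector<uint32_t>& ids) {
    std::unordered_set<uint32_t> seen;
    std::size_t fresh = 0;
    for (uint32_t id : ids) {
        if (seen.find(id) != seen.end()) {
            continue;  // Already present; insert() is never called for duplicates.
        }
        seen.insert(id);
        ++fresh;
    }
    return fresh;
}
```

With a standard container both functions accept exactly the same ids. The second form only changes what happens for duplicates; every previously unseen id still goes through `insert()`.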

Axel-Reactor commented 2 months ago

Same issue, it still crashes on the insert. (Screenshots of the crash site and of the state of the hash map were attached.)

It consistently crashes with 689 entries

spencer-lunarg commented 2 months ago

I'm very confident I have thrown large shaders with over 700 entries at this. I assume you are on Windows 11?

The best thing I can do without the SPIR-V is make sure a large enough shader does not crash at 689 entries

Axel-Reactor commented 2 months ago

Yes, Windows 11, but I don't see how that's relevant? Let me check whether this also happens with stripped SPIR-V; I can probably give that to you.

Axel-Reactor commented 2 months ago

Alright, here is the obfuscated SPIR-V; it crashes in the same way for me: raygen_rs-0x694a1322c182da48.zip

spencer-lunarg commented 2 months ago

so quick update, I was able to reproduce the crash... I found that removing the 10,000 line OpSource fixed it, so I now think this is not an issue with the hashmap, but with how we might be storing the OpSource for such a large shader

edit - actually, just running spirv-dis and then right away running spirv-as on the result fixes it ...

if I run spirv-dis --raw-id and then spirv-as --preserve-numeric-ids, it still crashes as before

spencer-lunarg commented 2 months ago

another update: I wrote a test capturing the IDs

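#include <array>

TEST_F(VkPositiveLayerTest, RobinHood) {
    vvl::unordered_set<uint32_t> result_ids;

    // The 704 captured ids (contents elided here).
    std::array<uint32_t, 704> ids = { /* dumped out */ };
    for (auto id : ids) {
        // Same insert pattern as in GetAccessibleIds; duplicates are simply ignored.
        if (!result_ids.insert(id).second) {
        }
    }
}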

and it works fine, then I tried going

-    vvl::unordered_set<uint32_t> worklist;
+    std::unordered_set<uint32_t> worklist;

and it worked... something is going on with having 2 robin hood uint32_t sets used together in the same scope, trying to figure out why this is the case
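
The situation described here, two `uint32_t` sets alive in the same scope with ids moving from one into the other, can be sketched as follows. This is a hypothetical reduction written with `std::unordered_set`; `IdGraph` and `collect_reachable_ids` are made-up names, draining the worklist via `begin()` is an assumption not confirmed in this thread, and the actual loop in `shader_module.cpp` may differ.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical adjacency list: id -> ids it references. Not the VVL data model.
using IdGraph = std::unordered_map<uint32_t, std::vector<uint32_t>>;

inline std::unordered_set<uint32_t> collect_reachable_ids(const IdGraph& graph, uint32_t entry_id) {
    std::unordered_set<uint32_t> result_ids;
    std::unordered_set<uint32_t> worklist;
    worklist.insert(entry_id);

    while (!worklist.empty()) {
        // Take an arbitrary pending id out of the worklist.
        auto it = worklist.begin();
        const uint32_t worklist_id = *it;
        worklist.erase(it);

        // Try to add to the output set; skip ids that were already walked.
        if (!result_ids.insert(worklist_id).second) {
            continue;
        }

        // Queue everything this id references, if it references anything.
        auto found = graph.find(worklist_id);
        if (found == graph.end()) {
            continue;
        }
        for (uint32_t next : found->second) {
            worklist.insert(next);
        }
    }
    return result_ids;
}
```

Swapping the type of either set in a reduction like this is one way to check whether the failure depends on both containers sharing the same hash implementation.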