DiligentGraphics / DiligentCore

A modern cross-platform low-level graphics API
http://diligentgraphics.com/diligent-engine/
Apache License 2.0
616 stars 133 forks source link

Alignment issue with the STDAllocator #623

Open magester1 opened 1 week ago

magester1 commented 1 week ago

System info: Windows 11 (10.0.22621.0) Clang 18.1.8 Diligent Engine v2.5.5 (cf6d15fa38)

I believe there's a problem with the alignment of memory allocated via the STDAllocator. I apologize for not having a minimal reproducible example, but this depends on the memory address returned by the OS and it has been quite a nightmare to reproduce. But I'm going to try my best to explain it.

Lately, my application has been crashing when being built in Release mode and using the Direct3D12 backend. I don't think the issue is actually related to that configuration, but it has been the only one that was able to consistently crash. This is the stack frame from the debugger (I built with Release + debug-info), interestingly enough, so far it has only ever happened while creating a uniform buffer and not with anything else:

[Inline Frame] app.exe!Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator>::construct(Diligent::BufferD3D12Impl::CtxDynamicData * p, const Diligent::BufferD3D12Impl::CtxDynamicData & args) Line 149 (...\DiligentEngine\DiligentCore\Common\interface\STDAllocator.hpp:149)
[Inline Frame] app.exe!std::_Normal_allocator_traits<Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator>>::construct(Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator> & _Al, Diligent::BufferD3D12Impl::CtxDynamicData * _Ptr, const Diligent::BufferD3D12Impl::CtxDynamicData & _Args) Line 588 (c:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\xmemory:588)
[Inline Frame] app.exe!std::_Uninitialized_backout_al<Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator>>::_Emplace_back(const Diligent::BufferD3D12Impl::CtxDynamicData & _Vals) Line 1780 (c:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\xmemory:1780)
[Inline Frame] app.exe!std::_Uninitialized_fill_n(Diligent::BufferD3D12Impl::CtxDynamicData * _First, unsigned __int64 _Count, const Diligent::BufferD3D12Impl::CtxDynamicData & _Val, Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator> & _Al) Line 1957 (c:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\xmemory:1957)
[Inline Frame] app.exe!std::vector<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator>>::_Construct_n(const unsigned __int64 _Count, const Diligent::BufferD3D12Impl::CtxDynamicData & _Val) Line 2076 (c:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\vector:2076)
app.exe!std::vector<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator>>::vector(const unsigned __int64 _Count, const Diligent::BufferD3D12Impl::CtxDynamicData & _Val, const Diligent::STDAllocator<Diligent::BufferD3D12Impl::CtxDynamicData,Diligent::IMemoryAllocator> & _Al) Line 619 (c:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\vector:619)
app.exe!Diligent::BufferD3D12Impl::BufferD3D12Impl(Diligent::IReferenceCounters * pRefCounters, Diligent::FixedBlockMemoryAllocator & BuffViewObjMemAllocator, Diligent::RenderDeviceD3D12Impl * pRenderDeviceD3D12, const Diligent::BufferDesc & BuffDesc, const Diligent::BufferData * pBuffData) Line 67 (...\DiligentEngine\DiligentCore\Graphics\GraphicsEngineD3D12\src\BufferD3D12Impl.cpp:67)
app.exe!Diligent::MakeNewRCObj<Diligent::BufferD3D12Impl,Diligent::FixedBlockMemoryAllocator>::operator()<Diligent::FixedBlockMemoryAllocator &,Diligent::RenderDeviceD3D12Impl *,const Diligent::BufferDesc &,const Diligent::BufferData *const &>(Diligent::FixedBlockMemoryAllocator & CtorArgs, Diligent::RenderDeviceD3D12Impl * && CtorArgs, const Diligent::BufferDesc & CtorArgs, const Diligent::BufferData * const & CtorArgs) Line 671 (...\DiligentEngine\DiligentCore\Common\interface\RefCountedObjectImpl.hpp:671)
[Inline Frame] app.exe!Diligent::RenderDeviceBase<Diligent::EngineD3D12ImplTraits>::CreateBufferImpl<const Diligent::BufferData *>::<lambda_1>::operator()() Line 414 (...\DiligentEngine\DiligentCore\Graphics\GraphicsEngine\include\RenderDeviceBase.hpp:414)
app.exe!Diligent::RenderDeviceBase<Diligent::EngineD3D12ImplTraits>::CreateDeviceObject<Diligent::IBuffer,Diligent::BufferDesc,`lambda at ...\DiligentEngine\DiligentCore\Graphics\GraphicsEngine\include\RenderDeviceBase.hpp:412:28'>(const char * ObjectTypeName, const Diligent::BufferDesc & Desc, Diligent::IBuffer * * ppObject, Diligent::RenderDeviceBase<Diligent::EngineD3D12ImplTraits>::CreateBufferImpl<const Diligent::BufferData *>::<lambda_1> ConstructObject) Line 375 (...\DiligentEngine\DiligentCore\Graphics\GraphicsEngine\include\RenderDeviceBase.hpp:375)
[Inline Frame] app.exe!Diligent::RenderDeviceBase<Diligent::EngineD3D12ImplTraits>::CreateBufferImpl(Diligent::IBuffer * * ppBuffer, const Diligent::BufferDesc & BuffDesc, const Diligent::BufferData * const & ExtraArgs) Line 411 (...\DiligentEngine\DiligentCore\Graphics\GraphicsEngine\include\RenderDeviceBase.hpp:411)
app.exe!Diligent::RenderDeviceD3D12Impl::CreateBuffer(const Diligent::BufferDesc & BuffDesc, const Diligent::BufferData * pBuffData, Diligent::IBuffer * * ppBuffer) Line 556 (...\DiligentEngine\DiligentCore\Graphics\GraphicsEngineD3D12\src\RenderDeviceD3D12Impl.cpp:556)
app.exe!app::graphics::create_buffer(const Diligent::BufferDesc &, const Diligent::BufferData *) Line 966
...

And if you look at the disassembly you get this:

image

You can see the compiler is utilizing AVX instructions to (I think) optimize the copying of the CtxDynamicData structure. The ymmword operand indicates this is an AVX instruction, trying to write into the memory addressed by the RAX register. The value of RAX here is 0x0000014F49106F90. Now, for AVX operations you need the memory to be aligned to 32 bytes, however 0x0000014F49106F90 / 32 = 0xA7A48837C + 16/32 which is not.

I know that the CtxDynamicData structure is aligned to 64 bytes here (https://github.com/DiligentGraphics/DiligentCore/blob/v2.5.5/Graphics/GraphicsEngineD3D12/include/BufferD3D12Impl.hpp#L132). However, that information is lost when the STDAllocator allocates here (https://github.com/DiligentGraphics/DiligentCore/blob/v2.5.5/Common/interface/STDAllocator.hpp#L128). As you can see, the allocator instance has the type information, but when calling m_Allocator.Allocate(size, ...) that is lost and there's no guarantee that the OS is going to align this to T. It'll probably be aligned to std::max_align_t, which is generally 16.

For testing purposes, I'm assigning my own allocator that inherits from Diligent::IMemoryAllocator when creating the DRD12 device and contexts, and always allocate aligned to 32 bytes. With this it works correctly.

If my analysis here is correct, I believe fixing this would be quite simple by just extending the IMemoryAllocator::Allocate interface to take an alignment parameter so we have some of the type information when deriving our own allocators. And using aligned_malloc or similar in the default raw allocator.

TheMostDiligent commented 1 week ago

Thanks for analysis, it does makes sense to me. I will take a look at the allocator.