KhronosGroup / Vulkan-Samples

One stop solution for all Vulkan samples
Apache License 2.0
4.33k stars 648 forks source link

Access violation when recreate swapchain #981

Closed Zieng closed 8 months ago

Zieng commented 8 months ago

It seems when you run samples that will recreate swapchain, then you will trigger access violation: image image

Note the views size is some invalid data. I'm wondering the problem is in the move constructor, we first moved the other, then access its member

HPPAllocated{std::move(other)},
...
views(std::exchange(other.views, {}))

It seems this is not safe in cpp.

To reproduce the issue, I run

vulkan_samples.exe sample ray_tracing_basic

which will call update_swapchain() and cause recreating swapchain.

SaschaWillems commented 8 months ago

Thanks for bringing this up.

Can you add some details? What platform? What device? I can't reproduce it on Windows 11 with an NVIDIA RTX 4070.

@jherico, @asuessenbach : any idea? It might be related to with #906 or #910.

JoseEmilio-ARM commented 8 months ago

I can reproduce with the AFBC sample, after I rebased on #910

SaschaWillems commented 8 months ago

It's even worse for me. For some of the samples I do get access violation at startup, others will now display nothing at all and seem to be stuck with a white screen. And this happens on main. Looks like we broke our samples 😟

I did test when reviewing, and everything was fine back then.

The basic ray tracing sample e.g. doesn't work for me at all anymore. It either throws an access violation right after starting, or is stuck with a white screen. I also see some odd warnings about unsupported swap chain formats. Happens both in release and debug. Same for all other ray tracing samples.

jherico commented 8 months ago

Investigating this. if I can reproduce and it's not immediately obvious what the cause is I'll run a bisect to try to find out when it was introduced.

SaschaWillems commented 8 months ago

Thanks. Very much appreciated 👍🏻

jherico commented 8 months ago

I have a fix, will explain the issue in more detail in the PR.

jherico commented 8 months ago

As I mentioned in the PR, this is because of the violation of an unstated assumption that all the C and C++ wrapper classes will be memory compatible (have the same layout and size). I reordered two members in Image but not in HPPImage in #906, triggering this when #910 was merged and introducing code paths where a Image&& was being passed and implicitly converted to a function taking a HPPImage&&.

A proper long term fix would be to either avoid any code path that does such casting (not sure how that would work with the current design) or to add lots of static_asserts to ensure all the C++ wrappers and their corresponding C wrappers (like HPPImage and Image) are actually memory compatible.

Tat's tricky though. the naïve approach of

static_assert(sizeof(HPPImage) == sizeof(Image), "HPPImage and Image must be memory compatibile");
static_assert(offsetof(HPPImage, create_info) == offsetof(Image, create_info), "HPPImage and Image must be memory compatibile");
static_assert(offsetof(HPPImage, subresource) == offsetof(Image, subresource), "HPPImage and Image must be memory compatibile");
static_assert(offsetof(HPPImage, views) == offsetof(Image, views), "HPPImage and Image must be memory compatibile");

doesn't work because all the members are inaccessible to the global scope. Every pair of classes would need a common friend verification function to do this check, and keeping the checks up to date would be laborious.

I doubt this will be a popular opinion, but I think the framework would be better served by simply adopting the use of the C++ bindings everywhere and stick exlucsively to the sample code as the place to show off C and C++ bindings.

Alternatively, the framework could be set-up to use either the C or the C++ bindings exclusively at configure time rather than picking at runtime, by adding an option to the CMake and using preprocessor directives at build time to select the API. If the use selects the C bindings, the C++ samples could just be excluded from the build (but not vice versa).