Copying to the BAR from the CPU turns out to be slow, so the premise behind the "fast transfer buffer" was wrong from the start. Removing it streamlines the transfer logic, reduces VRAM usage, and may improve performance when texture uploads are the bottleneck.
There's also no need to specify ignored memory properties: the driver is already specific about which memory types the GetMemoryRequirements functions return, so there's no reason to constrain the selection further.
Additionally, setting the CACHED bit on buffer allocation is a massive speedup.
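
For illustration, here is a minimal sketch of what memory type selection could look like without an "ignored properties" filter. It is not the code from this change: `find_memory_type` and the `MEM_*` constants are hypothetical stand-ins (the constants mirror the values of the Vulkan `VK_MEMORY_PROPERTY_*` bits so the sketch compiles without Vulkan headers), and it assumes that "CACHED" means the host-cached property and that it is treated as a preference rather than a hard requirement.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring the Vulkan memory property bit values. */
enum {
    MEM_DEVICE_LOCAL  = 0x1,
    MEM_HOST_VISIBLE  = 0x2,
    MEM_HOST_COHERENT = 0x4,
    MEM_HOST_CACHED   = 0x8
};

/*
 * Hypothetical helper: pick a memory type index.
 *   type_flags / type_count: property flags of each memory type on the device
 *   type_bits: bitmask of types the driver allows for this resource
 *              (what a GetMemoryRequirements call reports)
 *   required:  flags the type must have
 *   preferred: flags that are nice to have (assumption: CACHED goes here)
 * Returns the first allowed type with all required and preferred flags,
 * else the first allowed type with all required flags, else -1.
 * There is deliberately no "ignored" mask: type_bits already narrows
 * the candidates to what the driver accepts.
 */
static int find_memory_type(const uint32_t *type_flags, uint32_t type_count,
                            uint32_t type_bits, uint32_t required,
                            uint32_t preferred)
{
    int fallback = -1;
    for (uint32_t i = 0; i < type_count && i < 32; i++) {
        if (!(type_bits & (1u << i))) {
            continue; /* driver does not allow this type for the resource */
        }
        if ((type_flags[i] & required) != required) {
            continue; /* missing a required property */
        }
        if ((type_flags[i] & preferred) == preferred) {
            return (int)i; /* best match: required and preferred */
        }
        if (fallback < 0) {
            fallback = (int)i; /* remember first acceptable type */
        }
    }
    return fallback;
}
```

Under these assumptions, a transfer buffer would call this with `required = MEM_HOST_VISIBLE` and `preferred = MEM_HOST_CACHED`, falling back to uncached host-visible memory when no cached type is allowed.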