Transfer benchmarking example, dx12 query resolve

kvark commented 3 years ago

Introduces a new example for benchmarking small transfers! Also fixes a small case where a linearly tiled image is created on the known memory. This is still a hack, but more useful than the old code.

Interestingly, the AMD machine on Windows totally craps out when the total number of copy regions exceeds 4M per submission, so I had to keep the size down.
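To make the per-submission limit concrete, here is a minimal, std-only sketch. It assumes the benchmark issues one 1x1 texel copy per texel and shows how the regions could be chunked so that no single submission goes over a cap. `TexelCopy`, `texel_copies` and `split_into_submissions` are hypothetical names. They are not part of gfx-hal, and the PR's example may well build its regions differently.

```rust
/// Hypothetical stand-in for a single 1x1 texel copy region.
/// The real example uses gfx-hal's own region types, which are not reproduced here.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TexelCopy {
    x: u32,
    y: u32,
}

/// Build one copy region per texel of a `width` x `height` image, row by row.
fn texel_copies(width: u32, height: u32) -> Vec<TexelCopy> {
    (0..height)
        .flat_map(|y| (0..width).map(move |x| TexelCopy { x, y }))
        .collect()
}

/// Split the regions into groups of at most `max_per_submit`,
/// so each group can be recorded and submitted separately.
fn split_into_submissions(regions: &[TexelCopy], max_per_submit: usize) -> Vec<&[TexelCopy]> {
    assert!(max_per_submit > 0, "the per-submission limit must be positive");
    regions.chunks(max_per_submit).collect()
}

fn main() {
    // 1K by 1K texels = 1,048,576 regions, which stays under a 4M cap.
    let regions = texel_copies(1024, 1024);
    let batches = split_into_submissions(&regions, 4_000_000);
    let last = regions.last().expect("image is non-empty");
    println!(
        "{} regions in {} submission(s); last texel ({}, {})",
        regions.len(),
        batches.len(),
        last.x,
        last.y
    );
}
```

This sketch chunks the regions rather than shrinking the image. That only helps if the driver's trouble is really tied to the per-submission region count, which is how the comment above describes it.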

Results for 1K by 1K:
API	OS	Hardware	image->image (ms)	buffer->image (ms)
Vulkan	Win7	AMD RX 470	320	18
Vulkan	Win10	GTX 1080	19	19
Vulkan	Win10	Iris 530	620	620

Results for 512 by 512:
API	OS	Hardware	image->image (ms)	buffer->image (ms)
DX12	Win10	AMD 3500U	836	600
Vulkan	Win10	AMD 3500U	95	6

On Metal and DX11, timestamps are not implemented yet.
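On the backends that do implement them, the timings come from GPU timestamp queries. Raw ticks have to be scaled before they read as milliseconds. Vulkan reports the scale as nanoseconds per tick (`VkPhysicalDeviceLimits::timestampPeriod`). D3D12 reports it as ticks per second (`ID3D12CommandQueue::GetTimestampFrequency`). Below is a minimal std-only sketch of the conversion. The helper names are hypothetical, and it does not assume anything about gfx-hal's query API.

```rust
/// Convert a D3D12-style timestamp frequency (ticks per second)
/// into a Vulkan-style period (nanoseconds per tick).
fn period_ns_from_frequency(ticks_per_second: u64) -> f64 {
    1e9 / ticks_per_second as f64
}

/// Milliseconds elapsed between two raw timestamps, given nanoseconds per tick.
/// Returns `None` if `end` is earlier than `start`, e.g. a counter wrap
/// or results read back in the wrong order.
fn elapsed_ms(start: u64, end: u64, period_ns: f64) -> Option<f64> {
    let ticks = end.checked_sub(start)?;
    Some(ticks as f64 * period_ns / 1e6)
}

fn main() {
    // Hypothetical 10 MHz timestamp counter, i.e. 100 ns per tick.
    let period = period_ns_from_frequency(10_000_000);
    match elapsed_ms(0, 10_000, period) {
        Some(ms) => println!("transfer took {ms:.3} ms"),
        None => println!("timestamps out of order"),
    }
}
```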

PR checklist:

[ ] make succeeds (on *nix)
[ ] make reftests succeeds
[x] tested examples with the following backends: Vulkan

tangmi commented 3 years ago

I ran the example on my computer (two runs per configuration), in case the extra data points help.

Results for 1K by 1K:
API	OS	Hardware	image->image	buffer->image
DX12	Win10	GTX 1080	573	832
DX12	Win10	GTX 1080	472	832
Vulkan	Win10	GTX 1080	479	22
Vulkan	Win10	GTX 1080	480	22

Results for 512 by 512:
API	OS	Hardware	image->image	buffer->image
DX12	Win10	GTX 1080	136	211
DX12	Win10	GTX 1080	119	207
Vulkan	Win10	GTX 1080	142	6
Vulkan	Win10	GTX 1080	133	5

kvark commented 3 years ago

bors r=tangm,kvark

bors[bot] commented 3 years ago

Build failed:

Windows Stable

kvark commented 3 years ago

bors r=tangm,kvark

bors[bot] commented 3 years ago

Build succeeded:

gfx-rs/gfx #3620: Transfer benchmarking example, dx12 query resolve