TheRealMJP / TheRealMJP.github.io

Backing repo for my blog

GPU Memory Pools in D3D12 #11

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

GPU Memory Pools in D3D12

https://therealmjp.github.io/posts/gpu-memory-pool/

dwulive commented 2 years ago

One thing worth mentioning is that, when copying memory around, the cache is not always your friend. The cache only helps if the memory is read or written more than once while it is still resident in the cache. When uploading textures, there is a good chance the data will be completely evicted from the cache before it is used. So, in a roundabout way, my point is that write-combined memory is still your friend for one-time transfers. It might even help to use the non-temporal SSE/AVX instructions to avoid pulling data into the full cache hierarchy, and for good karma on the uncached writes.
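As a rough illustration of the non-temporal idea above, here is a minimal sketch of a copy routine built on SSE2 streaming stores. `StreamCopy` is a hypothetical helper name, not something from the blog post or from D3D12, and it assumes the destination is 16-byte aligned. It only covers the store side (`_mm_stream_si128`); it does not attempt non-temporal loads, and it falls back to a plain `memcpy` when SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define STREAM_COPY_HAS_SSE2 1
#else
#define STREAM_COPY_HAS_SSE2 0
#endif

// Hypothetical helper: copies `size` bytes from `src` to `dst`.
// Assumption: `dst` is 16-byte aligned (required by _mm_stream_si128).
// `src` may be unaligned.
inline void StreamCopy(void* dst, const void* src, std::size_t size)
{
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
#if STREAM_COPY_HAS_SSE2
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);

    std::size_t i = 0;
    for (; i + 16 <= size; i += 16)
    {
        // Regular (cached) unaligned load from the source...
        const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
        // ...followed by a non-temporal store that bypasses the cache hierarchy.
        _mm_stream_si128(reinterpret_cast<__m128i*>(d + i), v);
    }

    // Copy the remaining (< 16) bytes with ordinary stores.
    std::memcpy(d + i, s + i, size - i);

    // Make the streaming stores globally visible before returning.
    _mm_sfence();
#else
    // No SSE2: fall back to a normal copy.
    std::memcpy(dst, src, size);
#endif
}
```

In a D3D12 setting the destination would typically be the pointer returned from mapping an upload resource, but whether this beats a plain `memcpy` depends on the CPU and the memory type, so it is worth measuring.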

Vinluo commented 2 years ago

Sorry, this might be a silly question, but what exactly does L0 mean? In my understanding, L0 cache usually refers to register storage. The article mentions that a "demote L1 to L0" may destroy performance; how should I understand that?

TheRealMJP commented 2 years ago

Hey @Vinluo! I'm honestly not sure where that L0/L1 terminology comes from in the case of D3D12 memory pools. It seems completely unrelated to cache hierarchies, so it's a bit unfortunate that it re-uses that same terminology. It's possible that the naming comes from something internal to the Windows OS, or something along those lines.

When you're dealing with D3D12, it's just important to know that in that particular context L0 is SysRAM and L1 is VRAM, where VRAM can potentially have much higher bandwidth for the GPU. That's why demotion can hurt so badly: you may go from having 500 GB/s down to only 10-12 GB/s after demotion.
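To put those numbers in perspective, here is a small back-of-the-envelope sketch. `TransferTimeMs` is a hypothetical helper, and the bandwidth values passed to it are simply the illustrative figures quoted above, not measurements. It assumes 1 GB = 10^9 bytes and ignores latency and any other overhead.

```cpp
#include <cstdint>

// Hypothetical helper: idealized time in milliseconds to move `bytes`
// at a sustained bandwidth of `bandwidthGBps` (1 GB = 1e9 bytes).
constexpr double TransferTimeMs(std::uint64_t bytes, double bandwidthGBps)
{
    return static_cast<double>(bytes) / (bandwidthGBps * 1.0e9) * 1.0e3;
}
```

Under these assumptions, reading 1 GB of data takes about 2 ms at 500 GB/s but about 100 ms at 10 GB/s, which is a 50x difference for the same workload.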

delphifirst commented 1 year ago

Hello, I wonder why CPU write performance for SysMem is lower than the 56 GB/s theoretical upper limit?

BTurkelson commented 5 months ago

Microsoft has added a feature called GPU Upload Heaps in the Agility SDK that allows you to allocate CPU-visible VRAM in a manner similar to the vendor-specific extension that you mention.

BTurkelson commented 5 months ago

(Hah, and of course I just noticed the update at the bottom of the post)