KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Requirements for DXVK support.. #203

Open oscarbg opened 6 years ago

oscarbg commented 6 years ago

Hi, this is just a report of things missing in MoltenVK to support DXVK, now that Wine supports Vulkan on macOS via MoltenVK (I use v1.0.12, included in the Vulkan macOS SDK 1.0.77). I tested DXVK 0.6.3 on Wine 3.13 with Vulkan support enabled on macOS, and simple programs like the included d3d11-compute or d3d11-triangle failed due to unsupported extensions. I checked the logs, and it seems it needs these extensions (as of 0.6.3):

From https://github.com/doitsujin/dxvk/blob/master/src/dxvk/dxvk_extensions.h we see the requirements for DxvkInstanceExtensions: VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2. The other two are already emulated by Wine.

For DxvkDeviceExtensions it needs:
VK_KHR_DEDICATED_ALLOCATION_EXTENSION_NAME,
VK_KHR_DESCRIPTOR_UPDATE_TEMPLATE_EXTENSION_NAME,
VK_KHR_GET_MEMORY_REQUIREMENTS_2_EXTENSION_NAME,
VK_KHR_IMAGE_FORMAT_LIST_EXTENSION_NAME,
VK_KHR_MAINTENANCE1_EXTENSION_NAME,
VK_KHR_MAINTENANCE2_EXTENSION_NAME,
VK_KHR_SHADER_DRAW_PARAMETERS

I changed these extensions from DxvkExtMode::Required to DxvkExtMode::Optional in the code, but then it complains that it can't create a D3D device with support for any feature level from D3D_FEATURE_LEVEL_9_1 to D3D_FEATURE_LEVEL_11_1.
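The effect of switching an extension from Required to Optional can be sketched roughly as follows. This is only an illustration under assumptions, not DXVK's actual code: `ExtMode`, `Extension` and `enableExtensions` are hypothetical names, and the list of supported extensions is passed in as plain strings instead of being queried from the driver.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for DXVK's extension bookkeeping (not its real types).
enum class ExtMode { Optional, Required };

struct Extension {
  std::string name;
  ExtMode     mode;
  bool        enabled = false;
};

// Marks every requested extension that the driver reports as enabled.
// Returns false if a Required extension is missing; a missing Optional
// extension is simply left disabled.
bool enableExtensions(std::vector<Extension>& wanted,
                      const std::set<std::string>& supported) {
  bool ok = true;
  for (auto& ext : wanted) {
    ext.enabled = supported.count(ext.name) != 0;
    if (!ext.enabled && ext.mode == ExtMode::Required)
      ok = false;
  }
  return ok;
}
```

Making an extension Optional only gets past this check. Code that later relies on the extension still has to test `enabled` itself, which may be one reason the apps fail further down the line.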

That's because it needs features missing in MoltenVK (as of 1.0.12 in Vulkan SDK 1.0.77). The code checking the feature level is here: https://github.com/doitsujin/dxvk/blob/master/src/d3d11/d3d11_device.cpp `

if (featureLevel >= D3D_FEATURE_LEVEL_9_1) {
  enabled.depthClamp                            = VK_TRUE;
  enabled.depthBiasClamp                        = VK_TRUE;
  enabled.fillModeNonSolid                      = VK_TRUE;
  enabled.pipelineStatisticsQuery               = supported.pipelineStatisticsQuery;
  enabled.sampleRateShading                     = VK_TRUE;
  enabled.samplerAnisotropy                     = VK_TRUE;
  enabled.shaderClipDistance                    = VK_TRUE;
  enabled.shaderCullDistance                    = VK_TRUE;
  enabled.robustBufferAccess                    = VK_TRUE;
}

if (featureLevel >= D3D_FEATURE_LEVEL_9_2) {
  enabled.occlusionQueryPrecise                 = VK_TRUE;
}

if (featureLevel >= D3D_FEATURE_LEVEL_9_3) {
  enabled.multiViewport                         = VK_TRUE;
  enabled.independentBlend                      = VK_TRUE;
}

if (featureLevel >= D3D_FEATURE_LEVEL_10_0) {
  enabled.fullDrawIndexUint32                   = VK_TRUE;
  enabled.fragmentStoresAndAtomics              = VK_TRUE;
  enabled.geometryShader                        = VK_TRUE;
  enabled.logicOp                               = supported.logicOp;
  enabled.shaderImageGatherExtended             = VK_TRUE;
  enabled.textureCompressionBC                  = VK_TRUE;
}

if (featureLevel >= D3D_FEATURE_LEVEL_10_1) {
  enabled.dualSrcBlend                          = VK_TRUE;
  enabled.imageCubeArray                        = VK_TRUE;
}

if (featureLevel >= D3D_FEATURE_LEVEL_11_0) {
  enabled.shaderFloat64                         = supported.shaderFloat64;
  enabled.shaderInt64                           = supported.shaderInt64;
  enabled.tessellationShader                    = VK_TRUE;
  // TODO enable unconditionally once RADV gains support
  enabled.shaderStorageImageMultisample         = supported.shaderStorageImageMultisample;
  enabled.shaderStorageImageReadWithoutFormat   = supported.shaderStorageImageReadWithoutFormat;
  enabled.shaderStorageImageWriteWithoutFormat  = VK_TRUE;
}

if (featureLevel >= D3D_FEATURE_LEVEL_11_1) {
  enabled.logicOp                               = VK_TRUE;
  enabled.vertexPipelineStoresAndAtomics        = VK_TRUE;
}

`

I had to comment out these lines (due to missing MoltenVK support):
for D3D_FEATURE_LEVEL_9_1: enabled.sampleRateShading = VK_TRUE; enabled.shaderCullDistance = VK_TRUE; enabled.robustBufferAccess = VK_TRUE;
for D3D_FEATURE_LEVEL_9_3: enabled.multiViewport = VK_TRUE;
for D3D_FEATURE_LEVEL_10_0: enabled.fullDrawIndexUint32 = VK_TRUE; enabled.geometryShader = VK_TRUE; enabled.logicOp = supported.logicOp;
for D3D_FEATURE_LEVEL_11_0: enabled.shaderFloat64 = supported.shaderFloat64; enabled.shaderInt64 = supported.shaderInt64; enabled.tessellationShader = VK_TRUE; enabled.shaderStorageImageWriteWithoutFormat = VK_TRUE;
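To see which feature level a given driver could reach without commenting anything out, the checks in the listing can be turned into a small table. This is a sketch under assumptions: `LevelRequirements`, `kLevels` and `highestFeatureLevel` are hypothetical names, features are represented as strings rather than `VkPhysicalDeviceFeatures` fields, and only the features the quoted code forces to VK_TRUE are listed (the `supported.x` entries are optional and so left out).

```cpp
#include <set>
#include <string>
#include <vector>

// One D3D feature level and the Vulkan features the quoted code sets to
// VK_TRUE when that level is requested.
struct LevelRequirements {
  std::string              level;
  std::vector<std::string> features;
};

// Requirements are cumulative: each level also needs everything listed above it.
const std::vector<LevelRequirements> kLevels = {
  { "9_1",  { "depthClamp", "depthBiasClamp", "fillModeNonSolid",
              "sampleRateShading", "samplerAnisotropy", "shaderClipDistance",
              "shaderCullDistance", "robustBufferAccess" } },
  { "9_2",  { "occlusionQueryPrecise" } },
  { "9_3",  { "multiViewport", "independentBlend" } },
  { "10_0", { "fullDrawIndexUint32", "fragmentStoresAndAtomics",
              "geometryShader", "shaderImageGatherExtended",
              "textureCompressionBC" } },
  { "10_1", { "dualSrcBlend", "imageCubeArray" } },
  { "11_0", { "tessellationShader", "shaderStorageImageWriteWithoutFormat" } },
  { "11_1", { "logicOp", "vertexPipelineStoresAndAtomics" } },
};

// Returns the highest level whose cumulative requirements are all supported,
// or an empty string if even 9_1 cannot be satisfied.
std::string highestFeatureLevel(const std::set<std::string>& supported) {
  std::string best;
  for (const auto& lvl : kLevels) {
    for (const auto& feature : lvl.features)
      if (supported.count(feature) == 0)
        return best;
    best = lvl.level;
  }
  return best;
}
```

With sampleRateShading, shaderCullDistance or robustBufferAccess missing, the function returns an empty string, which matches the report that no level from 9_1 to 11_1 could be created.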

With all these "hacks" it no longer complains, but it crashes when executing simple D3D11 apps, possibly because they use missing features.

Of course, supporting DXVK in MoltenVK might be a herculean effort, but DXVK has shown it's able to run lots of modern DX11 games, and this would be awesome on Mac. This is just a report of what is needed (as a means of guiding priorities for what to implement).

Hope this is useful for someone.

Gcenx commented 5 years ago

@Rastafabisch your log says Wine-Staging-4.7, by chance is that from my MEGA folder? If so, you should know that all of those cross-compiles were built using the 10.11 SDK, since osxcross only recently added support for newer SDK versions. That would explain the Metal 1.2 issue from earlier.

@oscarbg & @cdavis5e I wanted to ask: is the Metal level now detected correctly for the running system, or is it still locked to the SDK that's used at wine compile time?

doitsujin commented 5 years ago

Under Metal, vertex attribute offsets must not exceed the vertex buffer stride.

To clarify, this is mostly needed to emulate "null" vertex buffers properly. When a D3D11 application binds no vertex buffer to a given slot, DXVK will instead bind a small buffer which contains nothing but zeroes, and set the stride to 0 to avoid out-of-bounds access.

I don't know any better way to emulate this that would work with this restriction. Some D3D11 apps may also do something similar themselves.
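The conflict between the stride-0 "null" buffer and the Metal rule can be sketched as a simple validation pass. This is an illustration only: `VertexAttribute`, `VertexBinding` and `findStrideViolations` are hypothetical names, and the check implements the rule exactly as it is worded in the quote above; the condition Metal actually enforces may also take the attribute's size into account.

```cpp
#include <cstdint>
#include <vector>

struct VertexAttribute {
  std::uint32_t binding;  // vertex buffer slot the attribute reads from
  std::uint32_t offset;   // byte offset of the attribute within one vertex
};

struct VertexBinding {
  std::uint32_t binding;
  std::uint32_t stride;   // bytes between consecutive vertices; 0 for the "null" buffer
};

// Returns the attributes that break the rule as quoted:
// an attribute's offset must not exceed the stride of its vertex buffer.
std::vector<VertexAttribute> findStrideViolations(
    const std::vector<VertexAttribute>& attributes,
    const std::vector<VertexBinding>&   bindings) {
  std::vector<VertexAttribute> bad;
  for (const auto& attr : attributes)
    for (const auto& bind : bindings)
      if (bind.binding == attr.binding && attr.offset > bind.stride)
        bad.push_back(attr);
  return bad;
}
```

With a stride of 0, every attribute at a non-zero offset in that slot is reported, which is why an input layout that is valid for a real buffer becomes invalid once the zero-filled buffer is substituted.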

vskllee commented 5 years ago

Wow!!!

Rastafabisch commented 5 years ago

@Rastafabisch your log says Wine-Staging-4.7, by chance is that from my MEGA folder?

You got me. I can send you my (minor) edits, if you like? (It might be worth implementing a way to add environment variables, basically expanding the wrapper's existing WINEDEBUG variable GUI.)

If so, you should know that all of those cross-compiles were built using the 10.11 SDK, since osxcross only recently added support for newer SDK versions. That would explain the Metal 1.2 issue from earlier.

@oscarbg & @cdavis5e I wanted to ask: is the Metal level now detected correctly for the running system, or is it still locked to the SDK that's used at wine compile time?

True, but this got fixed, as I recall.

Under Metal, vertex attribute offsets must not exceed the vertex buffer stride.

To clarify, this is mostly needed to emulate "null" vertex buffers properly. When a D3D11 application binds no vertex buffer to a given slot, DXVK will instead bind a small buffer […]

So, if possible, the easiest way might be to ignore the error and continue, as long as this does not break subsequent functions.

Gcenx commented 5 years ago

@Rastafabisch I was only concerned that being compiled with the 10.11 SDK could have caused false positives. I always use the current Vulkan SDK, so that shouldn't cause an issue.

As for the other thing, sure, just open an issue here, as it's not really related to MoltenVK directly. And I might forget if it's not an open issue ;)

kristofferR commented 5 years ago

Here's the issue for adding the necessary Vulkan Events support: https://github.com/KhronosGroup/MoltenVK/issues/192

Quenz commented 5 years ago

#192 is fixed and closed; support was added in #708.

ryao commented 5 years ago

@Rastafabisch Would you retest to see if things work better now?

ryao commented 5 years ago

By the way, D9VK forked DXVK to implement Direct3D 9. Of the three features stated to be what DXVK still needs, D9VK only needs cull distance emulation. May I suggest that whoever is working on the remaining three features implement cull distance emulation first? Getting D9VK working would be a nice milestone on the way to getting DXVK working.

ovvldc commented 5 years ago

We had this list of things that DXVK support requires, and VkEvents are now supported. So the infrastructure is there. Many thanks for all that hard work, progress has been impressive.

So how are things on geometry shader emulation, cull distance emulation, and VK_EXT_transform_feedback? I understand that it is fairly easy to fool DXVK or any app into thinking that these features are present, but would that not lead to performance loss and/or horrible graphics glitches and/or crashes?

Degerz commented 5 years ago

@ovvldc It's impossible to fully emulate transform feedback using just GPU compute shaders on Metal. If geometry shaders or tessellation produce an unknown number of primitives, it's nearly impossible for GPUs to match the output data, in order, to the input data ...

Metal would need to expose features specific to AMD's Navi/RDNA GPUs, such as the NGG pipeline (primitive shaders) and global ordered append (DS_Ordered_Count), to be able to properly emulate transform feedback ...

ovvldc commented 5 years ago

Thanks for the explanation. That is a pity. Is there any other way of pulling out the buffers? Some games depend on transform feedback, though not all.

Degerz commented 5 years ago

@ovvldc No, it's unfeasible to patch out the rendering logic in games. Apple should just expose geometry shaders and transform feedbacks even if it's solely just for their Mac products instead of wasting everyone's time. If they desire parity so badly either they implement some of the hardware features from AMD's Navi GPUs or implement Nvidia's mesh shaders (I wonder if this is enough to emulate transform feedbacks) because hardware features do actually matter in this instance.

You could only get away with partially implementing transform feedback if you knew that input/output data were statically generated with an N:N ratio, such as when there is only a vertex shader. Variably generated input/output data with an N:M ratio, as produced by geometry shaders or tessellation (which are known for data amplification), is impossible to capture properly with emulated transform feedback unless you have certain hardware features that make it far easier.
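The N:N versus N:M distinction can be made concrete with a CPU-side sketch. It assumes nothing about Metal or any real driver; `fixedOffset` and `orderedOffsets` are hypothetical helper names that only show where each input's output would have to be written in the capture buffer.

```cpp
#include <cstddef>
#include <vector>

// N:N case: every input emits exactly `perInput` outputs, so each invocation
// can compute its own write offset without knowing anything about the others.
std::size_t fixedOffset(std::size_t inputIndex, std::size_t perInput) {
  return inputIndex * perInput;
}

// N:M case: the offset of input i is the sum of the output counts of inputs
// 0..i-1 (an exclusive prefix sum), so it depends on earlier inputs' results.
std::vector<std::size_t> orderedOffsets(const std::vector<std::size_t>& counts) {
  std::vector<std::size_t> offsets(counts.size());
  std::size_t running = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    offsets[i] = running;
    running += counts[i];
  }
  return offsets;
}
```

On a CPU the loop is trivial. On a GPU the counts are produced by invocations running in parallel, so obtaining these offsets in input order is the part that needs either an extra pass or an ordered-append style hardware feature like the one mentioned earlier in the thread.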

ryao commented 5 years ago

Many games worked with DXVK before transform feedback support was added to Vulkan. A large minority of games had issues such as crashing (a significant number of Unity games), black screens (e.g. Job Simulator 2050), missing models (e.g. The Witcher 3), missing effects (e.g. Overwatch) or other misrenderings (e.g. Waltz of the Wizard), but just getting to the point where transform feedback is all that is needed would be a huge step forward.

You would also have D9VK working, which would be a huge step forward in itself.

doitsujin commented 5 years ago

Variably generated input/output data with an N:M ratio, as produced by geometry shaders or tessellation (which are known for data amplification), is impossible to capture properly with emulated transform feedback unless you have certain hardware features that make it far easier.

You could technically do geometry shaders since you know the maximum number of output vertices that are going to be generated, but you'd have to compact the range afterwards (which will hurt performance). It also doesn't work with tessellation or indirect draws.

So yeah, this is a mess, and ultimately the main reason why I insisted on getting a Vulkan extension rather than adding weird hacks that only cover half the possible use cases. Note that most games would be happy with those restrictions, but RenderDoc, which uses VK_EXT_transform_feedback for its mesh view, requires proper support for this.
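The approach described above for geometry shaders (reserve space for the declared maximum per input, then compact the range) can be sketched on the CPU like this. The names `Slot` and `compact` are hypothetical, and `int` stands in for real vertex data; a GPU implementation would need a parallel scan instead of this sequential loop.

```cpp
#include <cstddef>
#include <vector>

// Output of one emulated geometry shader invocation: a slot sized for the
// declared maximum, of which only the first `used` vertices are valid.
struct Slot {
  std::vector<int> vertices;  // always holds the maximum number of entries
  std::size_t      used;      // how many entries were actually emitted
};

// Packs the valid vertices of all slots back to back, preserving input order.
std::vector<int> compact(const std::vector<Slot>& slots) {
  std::vector<int> packed;
  for (const auto& slot : slots)
    packed.insert(packed.end(), slot.vertices.begin(),
                  slot.vertices.begin() + static_cast<std::ptrdiff_t>(slot.used));
  return packed;
}
```

The compaction is an extra pass over the whole padded buffer, which is the performance cost mentioned above. It also only works because the maximum is known up front, which is the stated reason it does not carry over to tessellation or indirect draws.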

Gcenx commented 5 years ago

You would also have D9VK working, which would be a huge step forward in itself.

@ryao The thing is, how many 64-bit games use DirectX 9? Remember that wine can't currently make use of MoltenVK; only wine64 can, as Metal is a 64-bit API.

kode54 commented 5 years ago

Well, yeah, but with the 32/64 shim and thunk dance Wine will have to start doing on Catalina to support 32 bit code, it may as well also shim 32 bit processes to be using 64 bit Metal. The real problem then becomes exchanging data between that low 32 bit address space and whatever the heck the system libraries want to map their stuff to.

ovvldc commented 5 years ago

Let's hope that Apple start moving all their laptops to 8 core CPUs ASAP. Otherwise, we won't have the performance needed to run anything.

ryao commented 5 years ago

@ovvldc A Hat In Time is the only Direct3D 9 game that I know to be 64-bit. There are likely others. @kode54’s point about the thunking does make it a moot point though.

By the way, the lack of geometry shader emulation, cull distance emulation, and VK_EXT_transform_feedback should be added to the known limitations list for MoltenVK.

K0bin commented 5 years ago

64 Bit D3D9 games:

By the way, the lack of geometry shader emulation, cull distance emulation, and VK_EXT_transform_feedback should be added to the known limitations list for MoltenVK.

I disagree. The known issues with MoltenVK are in regard to the Vulkan CTS. Unlike in D3D11, geometry and tessellation shaders are not mandatory for compliant Vulkan implementations. Vulkan lets you query whether geometry/tessellation shaders are supported. Same for the extension, obviously. Most Android drivers don't support those either.

ovvldc commented 5 years ago

Depends on who is the target for the documentation. If for developers who are up to speed on the entire Vulkan ecosystem, you can refer to compliance with CTS. If for people who are not deeply into Vulkan and want to know if MoltenVK can cover the features of some application, then having a section of limitations that go beyond the CTS would be good. Just as a bit of friendly guidance to the wider ecosystem.

ryao commented 5 years ago

He might have been pulling my leg, but gamedev1909 in the yuzu discord claims that renderdoc works with Witcher III and hairworks enabled in Parallels Desktop 15 on Mac OS X. He sent me this screenshot (which sadly does not appear to show renderdoc):

He declined to comment here despite multiple requests. Parallels Desktop 15 implements Direct3D 11 using Metal. If his claim is accurate, then there could be an undocumented API extension in Metal for getting transform feedback that Parallels Desktop 15 uses.

ryao commented 5 years ago

Is it just me seeing things incorrectly, or are the limitations of Metal that MoltenVK is hitting in supporting DXVK all in areas unfriendly to tiling GPU architectures like the one used in the iPhone?

K0bin commented 5 years ago

No, from what I know geometry shaders and especially transform feedback are generally unfriendly to modern desktop GPUs as well.

Metal supports a lot of stuff that's not supported on any of their mobile GPUs. https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

Kreyren commented 5 years ago

Referencing https://github.com/Winetricks/winetricks/pull/1318

Any news on MacOS support for DXVK?

Degerz commented 5 years ago

https://www.khronos.org/registry/OpenGL/extensions/NV/NV_mesh_shader.txt

(7) Do we support transform feedback with mesh shaders?

 RESOLVED:  No.  In the initial implementation of this extension, the hardware doesn't support it.

:/

Turing's mesh shaders are not totally capable of emulating the traditional geometry pipeline like I had hoped as it is with AMD's Primitive Shaders with global ordered append ... (supersede vs superset)

If D3D standardizes this behaviour then there is very little hope for Metal to be able to realistically provide full transform feedback functionality ...

ryao commented 4 years ago

Most Mac OS X systems are using either AMD or Intel graphics, and Vulkan extensions are not available from Metal. Why is the Vulkan extension for Turing relevant? Is there an equivalent function exposed in Metal that you are using the Vulkan documentation to understand?

Degerz commented 4 years ago

@ryao Mesh shaders are getting standardized in D3D and with imposed limitations on all 3 desktop hardware graphics vendors. If Nvidia can't support transform feedbacks with mesh shaders then D3D likely won't as well ...

Apple still cares about portability with Metal even with Nvidia GPUs because they don't want to be tied down to either AMD or Intel on macOS for the foreseeable future so if Apple decides to include mesh shaders as well in Metal then it'll come with the same limitations or with even more limitations judging by Apple's history to water down features ...

Besides, global ordered append is way too specific to AMD HW so it's not a portable mechanism to expose for relying on emulating transform feedbacks. (Apple has no desire to expose low level access for any other vendors except maybe for themselves with their own iOS GPUs)

ovvldc commented 4 years ago

We have another release with great work done by a cool group of contributors :).

Any features needed for DXVK that we can check off in this release? It was not entirely clear to me. Recent discussion suggests the geometry shaders are certainly not done yet.

oscarbg commented 4 years ago

DXVK 1.5.2 got released needing both Vulkan 1.1 device driver and loader 1.1 support.. let's see if a hack reporting MoltenVK driver as 1.1 helps with running simple "hello world" dx11 apps..
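For reference, a "Vulkan 1.1 or newer" check on both the loader and the device can be sketched as below. The bit layout is the one used by Vulkan's `VK_MAKE_VERSION` macro; the function names are hypothetical, and in a real application the two version numbers would come from the loader and from the physical device properties rather than being passed in by hand.

```cpp
#include <cstdint>

// Same bit layout as Vulkan's VK_MAKE_VERSION: major in bits 22 and up,
// minor in bits 12-21, patch in bits 0-11.
constexpr std::uint32_t makeVersion(std::uint32_t major,
                                    std::uint32_t minor,
                                    std::uint32_t patch) {
  return (major << 22) | (minor << 12) | patch;
}

constexpr std::uint32_t versionMajor(std::uint32_t v) { return v >> 22; }
constexpr std::uint32_t versionMinor(std::uint32_t v) { return (v >> 12) & 0x3ffu; }

// Both the loader (instance version) and the device driver must report
// at least 1.1, so both numbers are checked.
constexpr bool meetsVulkan11(std::uint32_t instanceVersion,
                             std::uint32_t deviceVersion) {
  return instanceVersion >= makeVersion(1, 1, 0)
      && deviceVersion   >= makeVersion(1, 1, 0);
}
```

A hack that reports the driver as 1.1 only changes the number fed into a check like this. Whether the 1.1 functionality an application then calls is actually there is a separate question.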

cdavis5e commented 4 years ago

Vulkan 1.1 support was already on my radar, since DXVK wanted it to accelerate fragment discard. You may encounter problems with that.

kakashidinho commented 4 years ago

Hi, what is the current status of this? Even if we leave geometry shader out for now, does transform feedback without Geometry shader work yet? I came across steam proton project and noticed they dropped macos support from their plan. A couple of searches led me here.

oscarbg commented 4 years ago

Also slightly related: I noticed that two Vulkan projects were recently updated to need, or benefit from, Vulkan 1.1 support.
VKD3D DXIL branch (https://www.winehq.org/pipermail/wine-devel/2020-January/158470.html), Vulkan 1.1 for wave ops: "Vulkan 1.1 is enabled if active, because subgroup operations requires it. SM 6.0 support is only activated if subgroup operations are sufficiently supported and DXIL is enabled. There are other features required for SM 6.0 such as 16-bit arithmetic and storage, but that is left for later."
vkBasalt (since 0.3.0): "Vulkan 1.1 is now required" https://github.com/DadSchoorse/vkBasalt/releases/tag/v0.3.0

diegov12 commented 4 years ago

10.15.4 supposedly exposes transform feedback. Is this confirmed, and would it be enough for Molten?

jjeka commented 4 years ago

10.15.4 supposedly exposes transform feedback. Is this confirmed, and would it be enough for Molten?

It sounds interesting. Where have you found transform feedback mention in 10.15.4?

cdavis5e commented 4 years ago

10.15.4 supposedly exposes transform feedback. Is this confirmed, and would it be enough for Molten?

It sounds interesting. Where have you found transform feedback mention in 10.15.4?

I'm curious, too. I only saw vertex amplification (i.e. multiview) and rasterization rate maps.

Gcenx commented 4 years ago

Well it’s possible considering the changes Apple made to no32exec=0

Edit: Not related to this directly, it's just one of the undocumented changes Apple made to 10.15.4.

oscarbg commented 4 years ago

@diegov12 are you thinking about transform feedback support in OpenGL? That's irrelevant for MoltenVK. If it's about new Metal support, please share the info. @Gcenx how is no32exec=0 related to transform feedback? It seems related to the remaining 32-bit Mac support in Catalina.

eksuri commented 4 years ago

I was also interested in using D9VK / DXVK 1.5.1 with MoltenVK for a 32-bit wine game, but it crashes on launch. I wonder if that is because the game is 32-bit and the 32on64 wine stuff doesn't work well with MoltenVK, or because MoltenVK doesn't have enough to support D9VK. I'm looking into getting more logs.

It might make more sense to try to get D9VK / DXVK 1.5.1 working with D3D9 first, rather than chasing DXVK's latest release as it raises the required Vulkan version and continues to outpace MoltenVK.

What features do you all think are missing for D3D9 support?

cdavis5e commented 4 years ago

What features do you all think are missing for D3D9 support?

The only feature D9VK requires that we don't support is geometryShader, and only because geometry shaders are used for some DXVK meta operations. DXVK has versions of those meta shaders that don't need geometry shaders because they use VK_EXT_shader_viewport_index_layer, so we don't actually need geometry shader support. Other than that, there's nothing that D9VK requires that MoltenVK doesn't support.
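The choice between the two kinds of meta shaders could look roughly like the sketch below. It is not DXVK's code: `MetaPath` and `chooseMetaPath` are hypothetical names, the extension list is passed in as strings, and the preference for the extension over geometry shaders is an assumption made for the example.

```cpp
#include <set>
#include <string>

enum class MetaPath { GeometryShader, ViewportIndexLayer, Unsupported };

// Picks how a layered meta operation selects its target layer:
// from the vertex stage when VK_EXT_shader_viewport_index_layer is
// available, otherwise through a geometry shader.
MetaPath chooseMetaPath(bool geometryShaderSupported,
                        const std::set<std::string>& deviceExtensions) {
  if (deviceExtensions.count("VK_EXT_shader_viewport_index_layer") != 0)
    return MetaPath::ViewportIndexLayer;
  if (geometryShaderSupported)
    return MetaPath::GeometryShader;
  return MetaPath::Unsupported;
}
```

On a driver without geometry shaders, the result depends entirely on whether the extension is exposed, which is the situation described for MoltenVK here.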

What's left are a bunch of optional features, most of which we can't support right now because they require support from Metal:

The one optional feature that can be implemented is:

oscarbg commented 4 years ago

@cdavis5e just curious, have all these Metal needs been requested from Apple devs already? I'm also curious about names like MTLDepthClipModeClipAndClamp. I hope you are finding undocumented functionality in the Metal API or playing with prerelease builds.

cdavis5e commented 4 years ago

@cdavis5e just curious, have all these Metal needs been requested from Apple devs already?

Yes they have.

I'm also curious about names like MTLDepthClipModeClipAndClamp. I hope you are finding undocumented functionality in the Metal API or playing with prerelease builds.

They're names I made up.

sofakng commented 4 years ago

Is there any possibility of creating work-arounds for these missing Metal APIs? It's very disappointing to not have DX10/11/12 support for MacOS.

Regardless, I very greatly appreciate all of the hard work on Vulkan and MoltenVK!

johnothwolo commented 4 years ago

This is my opinion on this whole dxvk issue. I don't think Apple is going to make all those missing Metal APIs public. Apple loves control. They also work closely with Parallels, and probably so with Feral Interactive and any other big graphics app publisher, which explains AAA titles on Mac and Parallels' DX11 support.

They have a pattern of contributing only to things that somehow benefit them and their ecosystem. Will supporting DXVK bring them any revenue? Will it make Mac any more valuable? Will it improve the Apple ecosystem? How will their customers react? Despite their controlling nature, they still have to ask all these question before supporting something like DXVK. It could have larger implications than just enabling a niche of gamers. Their primary goal is to preserve the ecosystem.

I believe one other reason they moved to Metal is to control what is released on their platforms. Bill Hollings and his team were capable enough to challenge that locked, ecosystem-centric decision that Apple made. They sort of broke down the cross-platform wall Apple put up, and thanks to them developers don't have to completely rewrite code for Metal. However, that wall still has a foundation, which is the whole proprietary graphics layer... basically Apple's control.

So not to be a cynic, but unless there's a sign from Apple, I don't think Dxvk will ever exist - maybe until Macs are ARM only, and by that time Steam, wine, etc won't be possible.

itsTyrion commented 4 years ago

Will it make Mac any more valuable?

yes

Will it improve the Apple ecosystem

a little

How will their customers react?

positive to neutral, because some macs have the hardware to game.. if there is a port or if you install windows/linux

Will they pull their head out of their ass tho? Very unlikely :/ (Which is what's keeping me from using macOS/Mac)

kakashidinho commented 4 years ago

Metal is still missing some features that would make DXVK viable, such as transform feedback and geometry shaders. I believe these features are very difficult and tedious to emulate in MoltenVK. I talked to the Metal team some time ago, and they clearly had no interest in making those features available, since in their view they are legacy and not very hardware friendly. Moving to a proprietary API such as Metal might not have been a good move; however, from my understanding their GPU team is very small compared to the Khronos Group or the NVIDIA/AMD GPU teams, hence they cannot afford to follow the bloated features of an open standard API such as Vulkan and opted for their own API instead. It's also easy to map the features to their in-house GPU on mobile devices. Nevertheless, imo they should at least have allowed third-party GPU vendors on macOS to write their own drivers, so that hidden features or even vendor-specific Vulkan implementations could be made available on Mac. Maybe Apple is afraid that nobody will use Metal if they allow vendors to expose a Vulkan implementation.

That’s a bit sad since I really want to play some of my favourite windows only games on mac without installing a secondary OS.

Degerz commented 4 years ago

@kakashidinho Things are about to get even worse with their in-house Apple silicon, since it's a tile-based deferred renderer ...

No chance that we'll ever see these features being exposed on Metal, since they don't work well on a TBDR architecture. Transform feedback and geometry shaders are only considered 'legacy' by them because their GPUs don't handle them well compared to a desktop GPU. ;)

I guess the developers behind these graphics translation layers can ditch the idea of making D3D work over Metal altogether since emulating immediate mode over a TBDR GPU is too painful ...

mbarriault commented 4 years ago

I feel this thread has gotten a bit off-topic, with a bit of misinformation. The one bit I want to correct is that Nvidia has been shipping TBDR in desktops since Maxwell, and as of a few years ago (possibly no longer true) only Intel had hardware support for geometry shaders. Geometry shaders and transform feedback are indeed legacy features that are emulated in the drivers, and their use is actively discouraged since they incur such penalties.

Degerz commented 4 years ago

@mbarriault You're the one spreading misinformation ...

Nvidia GPUs don't do tiling at all. None of their GPUs store a small portion of the framebuffer state on-chip, and they don't serialize draws per tile. Vulkan renderpasses are practically meaningless on their hardware, and blend modes can be dynamic state too. Changing the blend state on a tiler involves a shader recompilation, but this is not true for Nvidia hardware. Quite simply, Nvidia hardware doesn't pay the penalties and doesn't gain the same optimizations as a tiler would, so their hardware obviously doesn't do tile-based rendering ...

Geometry shaders and transform feedback are natively supported on desktop GPUs too, so you made another incorrect statement. These are all very real features that actually map to their hardware. Even lower-level console APIs expose those features, so they wouldn't even be considered legacy features, at least on desktop GPUs ...

None of these features are 'emulated' with desktop GPUs ...

mbarriault commented 4 years ago

https://www.techpowerup.com/231129/on-nvidias-tile-based-rendering

http://jason-blog.jlekstrand.net/2018/10/transform-feedback-is-terrible-so-why.html

http://www.joshbarczak.com/blog/?p=667

These facts are well established amongst GPU programmers and have been for years, but ultimately this discussion sidetracks the thread.