CTS issues with platform-specific drivers

billhollings commented 3 years ago

With CTS testing, we are starting to uncover issues that appear on some platforms but not on others:

| Issue | Platforms Affected | Apple Bug ID | CTS Tests Affected | Status |
| --- | --- | --- | --- | --- |
| `RGBA8Unorm` Clear color normalization rounding (0.75 becomes 192 instead of expected 191) | M1, Intel | FB9118171 | `dEQP-VK.api.smoke.triangle`<br/>`dEQP-VK.api.smoke.triangle_ext_structs`<br/>`dEQP-VK.api.smoke.asm_triangle`<br/>`dEQP-VK.api.smoke.asm_triangle_no_opname`<br/>`dEQP-VK.api.smoke.unused_resolve_attachment` | |
| `RGBA8Unorm` 16K-wide texture fails copy to buffer | AMD | FB9109411 | `dEQP-VK.api.copy_and_blit.core.image_to_image.dimensions.src16384x4_dst16384x4.r8g8b8a8_unorm.r8g8b8a8_unorm.optimal_optimal` | |
| `RG16Unorm` TextureView->Texture copy rounds values `< 1e-41` to `-0.0` | Intel | FB9110497 | `dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d.r16g16_unorm.r32_sfloat.optimal_optimal` | |
| Sampler border color opaque black `(0.0, 0.0, 0.0, 1.0)` appears as transparent red `(1.0, 0.0, 0.0, 0.0)` for formats `R4G4B4A4_UNORM_PACK16` or `R5G5B5A1_UNORM_PACK16` | M1 | FB9269610 | `dEQP-VK.pipeline.sampler.view_type.*.format.r4g4b4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black`<br/>`dEQP-VK.pipeline.sampler.view_type.*.format.r5g5b5a1_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black`<br/>_(`*` is any of: 1d, 1d_unnormalized, 1d_array, 2d, 2d_unnormalized, 2darray, 3d)_<br/>`dEQP-VK.pipeline.sampler.border_swizzle.r4g4b4a4_unorm_pack16.*.opaque_black`<br/>`dEQP-VK.pipeline.sampler.border_swizzle.r5g5b5a1_unorm_pack16.*.opaque_black` | Issue #37 adds capabilities to support this |
| Incorrect pixels on border between polygons | Intel | FB9805181 | `dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d.r8g8b8a8_unorm.r8g8b8a8_snorm.optimal_optimal`<br/>`dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d.r8g8b8a8_unorm.a8b8g8r8_snorm_pack32.optimal_optimal`<br/>`dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d.r8g8b8a8_uint.r8g8b8a8_snorm.optimal_optimal`<br/>`dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.2d.r8g8b8a8_uint.a8b8g8r8_snorm_pack32.optimal_optimal`<br/>... | |
| ~~`BC7_RGBAUnorm` & `BC7_RGBAUnorm_sRGB` reserved Mode 8 (zeroed low byte) returns alpha 1.0 instead of required 0.0~~ | M1 | ~~FB9732481~~ | `dEQP-VK.texture.compressed.bc7_unorm_block_2d_pot`<br/>`dEQP-VK.texture.compressed.bc7_srgb_block_2d_pot`<br/>`dEQP-VK.texture.compressed.bc7_unorm_block_2d_npot`<br/>`dEQP-VK.texture.compressed.bc7_srgb_block_2d_npot`<br/>`dEQP-VK.texture.compressed_3D.bc7_unorm_block_3d_pot`<br/>`dEQP-VK.texture.compressed_3D.bc7_srgb_block_3d_pot`<br/>`dEQP-VK.texture.compressed_3D.bc7_unorm_block_3d_npot`<br/>`dEQP-VK.texture.compressed_3D.bc7_srgb_block_3d_npot` | GitLab 3501 workaround in CTS |
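
Two of the entries above come down to numeric details that can be checked by hand. The sketch below is only an illustration in Python, not driver or CTS code. It shows why 0.75 is expected to become 191 in an 8-bit UNORM channel, and why an `r16g16_unorm` texel reinterpreted as `r32_sfloat` can land in the subnormal float range below `1e-41`. The helper names are hypothetical, and both the "scale by 256" explanation for 192 and the link between the Intel failure and subnormal handling are assumptions rather than confirmed diagnoses.

```python
import math
import struct


def float_to_unorm8(value: float) -> int:
    """Round-to-nearest conversion of a float to an 8-bit UNORM value."""
    clamped = min(max(value, 0.0), 1.0)
    return math.floor(clamped * 255.0 + 0.5)


def rg16_as_float32(r: int, g: int) -> float:
    """Reinterpret an R16G16 texel as one little-endian 32-bit float.

    R occupies the low 16 bits and G the high 16 bits of the 32-bit word.
    """
    return struct.unpack("<f", struct.pack("<HH", r, g))[0]


# Smallest positive normal float32; anything below it is subnormal.
FLT_MIN_NORMAL = 2.0 ** -126

# 0.75 * 255 = 191.25, so round-to-nearest yields 191.
expected = float_to_unorm8(0.75)

# Assumption for illustration only: scaling by 256 instead of 255 yields 192.
scaled_by_256 = int(0.75 * 256)

# Bit pattern 0x00001234: exponent bits are all zero, so this is a subnormal.
tiny = rg16_as_float32(0x1234, 0x0000)
is_subnormal = 0.0 < tiny < FLT_MIN_NORMAL

print(expected, scaled_by_256, tiny, is_subnormal)
```

Any texel whose G component has bits 7 to 14 all zero (and a nonzero remainder) decodes to a subnormal this way, so a copy path that does not preserve subnormals bit for bit would change the data.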

cdavis5e commented 3 years ago

At least one of those is a dupe. I found the AMD wide texture issue a while back and filed FB8818522.

billhollings commented 3 years ago

Okay. Thanks. Good to hear it's repeatable.

cdavis5e commented 3 years ago

One odd thing I've noticed is that the AMD failure only happens on pixel formats where the pixel size is 4 bytes or less.

For future reference, I've also found a number of other issues which are specific to AMD devices via running the CTS:

FB8818127: AMD devices break renderable textures on a placement MTLHeap
FB8820367: AMD devices' sampling from array textures with min_lod_clamp and bias is broken
FB8820550: AMD devices' texture gather on integer textures doesn't gather
FB8820934: AMD devices generate fragments at wrong positions on stencil textures of height 1
FB8884642: AMD devices incorrectly write stencil fragments on slices relative to base texture instead of texture view
- Apple claimed to have fixed this one in 11.1, but AFAIK it's still present.
FB8885679: AMD devices incorrectly broadcast depth values to all channels when sampled (also applies to M1)
FB8886051: AMD devices randomly fail to load pieces of a Float32 multisampled texture when drawing
FB8886252: AMD devices always return true from simd_is_helper_thread()
FB8887081: AMD devices don't implement min/max resolve of Depth16 MS textures properly
FB8887136: AMD devices do not write depth correctly with inverted depth ranges
FB8887260: AMD devices have strange behavior around multiple simd_broadcast() calls

For Intel devices:

FB8889492: Intel devices don't resolve all slices in a multisampled array texture
FB8889778: Intel devices don't clear all slices in an array stencil texture
FB8890240: Intel devices use wrong border colors for integral formats
FB8891179: Intel devices sample wrong cube face with gradient operand
FB9022749: Intel devices handle negative viewport height incorrectly
- This was actually discovered by Ethan Lee; I verified that it was related to negative viewport height and that it only occurs on Intel.
FB9095683: Setting a viewport with a negative offset and flipped causes no fragments to be drawn
- I think this one is related to FB9022749. It was discovered and filed by my colleague @js6i.

There's also one that applies to both Intel and AMD, but not M1:

FB8885930: Metal samples textures incorrectly on the right edge

billhollings commented 3 years ago

@cdavis5e Thanks for posting that list. It will help me streamline what to report as I work through CTS failures, particularly on AMD.

Disappointing that their premium gaming GPU (until M1 catches on) is not behaving well.

[edit]

I think I've just hit FB8820934 with `dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.1d.s8_uint_s8_uint.optimal_optimal_nearest`.

kvark commented 3 years ago

The point here is that we seem to be in the realm of arbitrary GPU-specific edge cases, and it becomes really difficult to work around all the combinations within the context of Vulkan capabilities that we can enable or disable with flags.

I don't think this can be solved in general. A Vulkan Portability implementation is expected to take into account the known issues with low-level drivers, just like a Vulkan driver knows about the quirks of the particular GPUs it targets. Based on this, you'd adjust the set of exposed formats/features at initialization time.
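
A minimal sketch of that idea, assuming a hand-maintained quirk table keyed by GPU family. Everything here is hypothetical: the table contents, the family keys, and the `exposed_formats` helper are made up for illustration and do not reflect any real implementation's policy. Hiding a whole format is also only one possible response; a narrower capability flag may be more appropriate in practice.

```python
# Formats the implementation could expose if the driver had no known issues.
# (Hypothetical subset, named after Vulkan formats mentioned in this thread.)
BASE_FORMATS = frozenset({
    "R8G8B8A8_UNORM",
    "R4G4B4A4_UNORM_PACK16",
    "R5G5B5A1_UNORM_PACK16",
    "BC7_UNORM_BLOCK",
})

# Hypothetical quirk table: formats to hide per GPU family because of
# known driver bugs. The entries are illustrative assumptions only.
DRIVER_QUIRKS = {
    "m1": frozenset({"R4G4B4A4_UNORM_PACK16", "R5G5B5A1_UNORM_PACK16"}),
}


def exposed_formats(gpu_family: str, base: frozenset = BASE_FORMATS) -> frozenset:
    """Return the formats to advertise at initialization for this GPU family."""
    return base - DRIVER_QUIRKS.get(gpu_family, frozenset())


print(sorted(exposed_formats("m1")))
print(sorted(exposed_formats("some_other_gpu")))
```

The lookup happens once at initialization, so the rest of the implementation only ever sees the already-filtered set.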

lexaknyazev commented 2 years ago

> BC7_RGBAUnorm & BC7_RGBAUnorm_sRGB reserved Mode 8 (zeroed low byte) returns alpha 1.0 instead of required 0.0

While the specs suggest all-zero result, they do not require it.

KDFS:

> Encoding the low byte as zero is reserved and should not be used when encoding a BPTC texture; hardware decoders processing a texel block with a low byte of 0 should return 0 for all channels of all texels.

D3D11:

> Mode 8 (LSB 0x00) is reserved and should not be used by the encoder. If this mode is given to the hardware, an all 0 block will be returned.
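
For context on what "Mode 8" means in both quotes: a BC7 block is 16 bytes, and its mode is given by the position of the lowest set bit in the first byte, so a first byte of `0x00` matches none of modes 0 to 7. The sketch below illustrates that rule and how test data could be filtered to skip reserved blocks. The function names are hypothetical and this is not code from the CTS or ANGLE.

```python
def bc7_mode(block: bytes) -> int:
    """Return the BC7 mode (0-7) of a 16-byte block, or 8 if it is reserved."""
    if len(block) != 16:
        raise ValueError("a BC7 block is exactly 16 bytes")
    low_byte = block[0]
    if low_byte == 0:
        return 8  # no mode bit set in the low byte: the reserved encoding
    # Isolate the lowest set bit, then convert it to a bit index.
    return (low_byte & -low_byte).bit_length() - 1


def is_reserved_bc7_block(block: bytes) -> bool:
    """True if the block uses the reserved encoding (low byte 0x00)."""
    return bc7_mode(block) == 8


blocks = [
    bytes([0x01]) + bytes(15),  # mode 0
    bytes([0x40]) + bytes(15),  # mode 6
    bytes(16),                  # reserved, low byte 0x00
]

# Keep only the blocks whose decoded result is well defined.
testable = [b for b in blocks if not is_reserved_bc7_block(b)]
print([bc7_mode(b) for b in blocks], len(testable))
```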

billhollings commented 2 years ago

> While the specs suggest all-zero result, they do not require it.

I'm not sure I understand your point. The KDFS and D3D11 quotes you provide are actually very clear that 0 will (or should) be returned. And the Vulkan spec makes clear that it adopts the KDFS requirements.

While neither of these indicates a strict _must_ requirement, the D3D11 _will_ requirement, in particular, is pretty clear. Any GPU built for D3D11 would follow this to be compliant, and Apple GPUs really should do so as well, for cross-platform compliance.

Are you thinking that we should modify CTS to not enforce this requirement, under the argument that the language is not a strict must requirement?

lexaknyazev commented 2 years ago

Strictly speaking, neither the D3D Functional Spec nor KDFS is applicable to Metal conformance, especially wrt reserved inputs. On top of that, Apple's OpenGL drivers never exposed BC7 even on capable hardware.

We had the same issue in ANGLE and decided to simply avoid testing BC7 texture sampling with LSB 0x00 as it does not serve any practical purpose anyway.

FWIW, VK-GL-CTS already has some exceptions wrt invalid / reserved compressed payloads.

billhollings commented 2 years ago

> We had the same issue in ANGLE and decided to simply avoid testing BC7 texture sampling with LSB 0x00 as it does not serve any practical purpose anyway.
>
> FWIW, VK-GL-CTS already has some exceptions wrt invalid / reserved compressed payloads.

Added GitLab 3501 to remove testing alpha value behavior under Mode 8 under Vulkan.

KhronosGroup / Vulkan-Portability

CTS issues with platform-specific drivers #26