KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

h264 encode: `expectDyadicTemporalLayerPattern` #2316

Open colinmarc opened 8 months ago

colinmarc commented 8 months ago

I'm a little confused about the field expectDyadicTemporalLayerPattern on VkVideoEncodeH264CapabilitiesKHR. Not much is written about it:

> `expectDyadicTemporalLayerPattern` indicates that the implementation’s rate control algorithms expect the application to use a dyadic temporal layer pattern when encoding multiple temporal layers.

I haven't seen this flag set on any GPUs I've tested with. It's also not clear what its implications are, or in what context it was added.

aqnuep commented 8 months ago

The implementation's rate control algorithms are often optimized for specific GOP patterns, so they may do a better or worse job of allocating an appropriate amount of bitrate across frames depending on the pattern the application uses.

This also applies to the temporal layer pattern, often referred to as the temporal GOP.

The structure of a dyadic temporal layer pattern is described in the spec formally and the diagram shows an example with 4 temporal layers.
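To make the pattern concrete, here is a minimal sketch of how temporal layer IDs could be assigned in a dyadic pattern. It assumes the usual dyadic hierarchy: a temporal GOP of 2^(n-1) frames for n temporal layers, where the first frame gets temporal ID 0 and every other frame gets an ID derived from the lowest set bit of its position. The helper name `dyadic_temporal_id` is hypothetical and not part of the Vulkan API; check the result against the formal definition in the spec before relying on it.

```c
/* Hypothetical helper, not part of Vulkan.
 * Returns the temporal ID of the frame with running index `index`, assuming
 * a dyadic temporal GOP of 2^(layer_count - 1) frames.
 * Precondition: 1 <= layer_count <= 31. */
static unsigned dyadic_temporal_id(unsigned index, unsigned layer_count)
{
    unsigned gop_size = 1u << (layer_count - 1u);
    unsigned pos = index % gop_size;   /* position inside the temporal GOP */
    unsigned lsb = 0u;                 /* index of the lowest set bit of pos */

    if (pos == 0u)
        return 0u;                     /* first frame of the GOP: base layer */

    while (((pos >> lsb) & 1u) == 0u)
        ++lsb;

    /* Odd positions land in the highest layer; each extra trailing zero bit
     * moves the frame one layer down. */
    return layer_count - 1u - lsb;
}
```

With 4 temporal layers, positions 0 through 7 of a temporal GOP map to temporal IDs 0, 3, 2, 3, 1, 3, 2, 3, and the sequence then repeats.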

When the `expectDyadicTemporalLayerPattern` cap is set, it indicates that the implementation's rate control algorithm works best if the application follows such a dyadic temporal layer pattern. That means encoding subsequent frames in a temporal GOP structure as outlined in the spec, with the corresponding temporal layer IDs and reference patterns, and specifying the `VK_VIDEO_ENCODE_H264_RATE_CONTROL_REFERENCE_PATTERN_DYADIC_BIT_KHR` rate control hint flag.
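The reference side of the pattern can be sketched in a similar way. The assumption here, which should be verified against the spec's formal definition, is that each frame after the first in a temporal GOP references the frame whose position is its own position with the lowest set bit cleared. The first frame has no reference inside the GOP: it either references the first frame of the previous temporal GOP or, for an IDR frame, nothing at all. The name `dyadic_reference_position` is hypothetical.

```c
/* Hypothetical helper, not part of Vulkan.
 * For the frame at position `pos` inside a dyadic temporal GOP, returns the
 * position (inside the same GOP) of the frame it references, or -1 for the
 * first frame, whose reference, if any, lies outside the current GOP. */
static int dyadic_reference_position(unsigned pos)
{
    if (pos == 0u)
        return -1;

    /* Clearing the lowest set bit always yields an earlier frame in a
     * lower temporal layer. */
    return (int)(pos & (pos - 1u));
}
```

With a temporal GOP of 8 frames, positions 1 through 7 reference positions 0, 0, 2, 0, 4, 4, 6, so every frame only depends on frames in lower temporal layers.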

If the cap is not set, it indicates that the implementation's rate control algorithm is not optimized for this pattern, so using a dyadic temporal layer pattern is unlikely to result in better bitrate budget allocation than other temporal layer patterns.

The temporal layer ID of a frame is specified in StdVideoEncodeH264PictureInfo::temporal_id for H.264 encode.