Support per-segment VQA score reporting

wilaw commented 1 month ago

Visual Quality Assurance (VQA) produces numerical scores for the user-perceived quality of an encoded video segment. Several types of score exist, e.g. VMAF, PSNR and SSIM.

The proposal is for the client to be able to report per-segment VQA scores as CMCD data. The client may receive these scores from a playlist or manifest, or via CMSD headers. For symmetry, it would be convenient if the extensions being proposed for CMSD (which Akamai is submitting to the SVTA Metadata WorkGroup) could be mirrored in CMCD.

| Description | Key name | Header Name | Type & Unit | Value Definition |
|---|---|---|---|---|
| VQA type | vqat | CMSD-Static | String or Inner List of Strings | A string defining the type of VQA metric being reported. The allowed types are defined by the "Field name" column of Table 2. VQA types are case-sensitive. If multiple VQA metrics are being conveyed, they are represented as an Inner List. Inner Lists are denoted by surrounding parentheses (Unicode 0x28 and Unicode 0x29), and their values are delimited by one or more spaces (Unicode 0x20). |
| VQA score | vqas | CMSD-Static | Integer or Inner List of Integers | An integer carrying the value of the VQA metric. The metric may be reported as an aggregate over the segment, in which case a single value exists per type. The metric may also be reported per GOP (Group of Pictures), in which case multiple values are reported in a list, each value applying to one GOP. If multiple types are declared along with multiple GOPs, then the number of metrics declared per type MUST be identical and the position of the metrics in the list MUST match that of the types to which they refer. Inner Lists are denoted by surrounding parentheses (Unicode 0x28 and Unicode 0x29), and their values are delimited by one or more spaces (Unicode 0x20). |

The processing and structure of these CMSD keys are defined by [3].
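
A minimal Python sketch of how a client might serialize these two keys, assuming the scores are supplied as a mapping from field name to one score per GOP. `serialize_vqa` and its helpers are hypothetical names, not part of any CMCD/CMSD library. It follows the ordering shown in this issue's example `vqat=("VMAF" "PSNR"),vqas=(81 83 38 39)`: all GOP scores for the first type, then those for the next type.

```python
def _sf_string(value):
    """Serialize a str as an RFC 8941 String, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _sf_value(items):
    """One item is emitted bare; several become a space-delimited Inner List."""
    return items[0] if len(items) == 1 else "(" + " ".join(items) + ")"


def serialize_vqa(scores_by_type):
    """Build the vqat/vqas members from {vqa_type: [score per GOP, ...]}.

    A single-element list means one aggregate score for the whole segment.
    """
    if not scores_by_type:
        raise ValueError("at least one VQA type is required")
    counts = {len(scores) for scores in scores_by_type.values()}
    if len(counts) != 1 or 0 in counts:
        raise ValueError("every type must carry the same, non-zero number of scores")
    types = list(scores_by_type)
    # Type-major order: all GOP scores of the first type, then the next type.
    flat = [score for t in types for score in scores_by_type[t]]
    if any(isinstance(s, bool) or not isinstance(s, int) for s in flat):
        raise TypeError("VQA scores must be integers")
    vqat = _sf_value([_sf_string(t) for t in types])
    vqas = _sf_value([str(s) for s in flat])
    return f"vqat={vqat},vqas={vqas}"
```

For example, `serialize_vqa({"VMAF": [81, 83], "PSNR": [38, 39]})` yields the multi-type, multi-GOP string from this issue, and `serialize_vqa({"VMAF": [81]})` yields the bare form `vqat="VMAF",vqas=81`.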

VQA scores for multiple GOPs are listed in the ‘vqas’ value as an Inner List in accordance with RFC 8941 [5]. Parsers can determine whether they are extracting a single value or multiple values from the presence of list parentheses. Where a video segment contains multiple GOPs, the VQA score for each GOP is reported within the parentheses, for example:

vqat="VMAF",vqas=(81 83)
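
The parentheses check described above can be sketched in Python as a small parser for just the value shapes these keys use (quoted Strings, Integers, and Inner Lists of them). This is a hypothetical, lenient subset of RFC 8941: it ignores parameters, Tokens and other item types, and a production client should rely on a complete Structured Field Values parser instead.

```python
import re

# Lenient subset of RFC 8941 bare items: a quoted String or an Integer.
_ITEM = re.compile(r'"((?:[^"\\]|\\["\\])*)"|(-?\d{1,15})')


def _parse_item(text, pos):
    """Parse one String or Integer starting at pos; return (value, next_pos)."""
    m = _ITEM.match(text, pos)
    if m is None:
        raise ValueError(f"expected a string or integer at offset {pos}")
    if m.group(1) is not None:
        # Undo the \" and \\ escapes allowed inside a String.
        return re.sub(r'\\(["\\])', r"\1", m.group(1)), m.end()
    return int(m.group(2)), m.end()


def parse_member_value(text):
    """Return a scalar for a bare item, or a Python list for an Inner List."""
    text = text.strip()
    if not text.startswith("("):  # no parentheses: a single value
        value, end = _parse_item(text, 0)
        if end != len(text):
            raise ValueError("unexpected characters after item")
        return value
    items, pos = [], 1
    while True:
        while pos < len(text) and text[pos] == " ":
            pos += 1
        if pos < len(text) and text[pos] == ")":
            pos += 1
            break
        value, pos = _parse_item(text, pos)
        items.append(value)
        # Inner List items must be followed by a space or the closing ")".
        if pos < len(text) and text[pos] not in " )":
            raise ValueError(f"unexpected character at offset {pos}")
    if pos != len(text):
        raise ValueError("unexpected characters after inner list")
    return items


def _split_members(header):
    """Split a Dictionary on commas that are outside quotes and parentheses."""
    members, depth, in_str, start, i = [], 0, False, 0, 0
    while i < len(header):
        c = header[i]
        if in_str:
            if c == "\\":
                i += 1  # skip the escaped character
            elif c == '"':
                in_str = False
        elif c == '"':
            in_str = True
        elif c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        elif c == "," and depth == 0:
            members.append(header[start:i])
            start = i + 1
        i += 1
    members.append(header[start:])
    return members


def parse_vqa_header(header):
    """Parse e.g. 'vqat="VMAF",vqas=(81 83)' into a dict of Python values."""
    result = {}
    for member in _split_members(header):
        key, sep, value = member.strip().partition("=")
        # Per RFC 8941, a key without "=" is a Boolean true.
        result[key] = parse_member_value(value) if sep else True
    return result
```

With this sketch, `parse_vqa_header('vqat="VMAF",vqas=(81 83)')` returns `{"vqat": "VMAF", "vqas": [81, 83]}`, so the caller can branch on whether a value is a list.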

VQA scores for multiple types also result in the ‘vqas’ value being an Inner List, in which each list item references the ‘vqat’ type with the same ordinal list position. In this example, only a single score is reported for each type.

vqat=("VMAF" "PSNR"),vqas=(81 38)

Multiple types along with multiple GOPs may also be reported. In this case, Inner Lists are used for both the ‘vqat’ and ‘vqas’ fields. Each type must carry the same number of GOP scores, and the scores are grouped by type in the order in which the types are listed: all GOP scores for the first type, followed by all GOP scores for the second type, and so on.

vqat=("VMAF" "PSNR"),vqas=(81 83 38 39)
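
Going the other way, a receiver that has already parsed ‘vqat’ and ‘vqas’ into Python values can regroup the flat score list per type. `group_scores` is a hypothetical helper that assumes the type-major ordering of the example above.

```python
def group_scores(vqat, vqas):
    """Map each VQA type to its list of per-GOP scores.

    vqat is a str or list of str; vqas is an int or list of int.
    Scores are assumed to be grouped by type (type-major order).
    """
    types = [vqat] if isinstance(vqat, str) else list(vqat)
    scores = [vqas] if isinstance(vqas, int) else list(vqas)
    if not types or len(set(types)) != len(types):
        raise ValueError("vqat must list each type exactly once")
    if not scores or len(scores) % len(types) != 0:
        raise ValueError("every type must have the same number of GOP scores")
    per_type = len(scores) // len(types)
    return {t: scores[i * per_type:(i + 1) * per_type] for i, t in enumerate(types)}
```

Note that the flat list cannot prove each type really had the same number of scores: three VMAF scores plus one PSNR score would still split evenly into two pairs. The receiver can only catch totals that are not a multiple of the type count, which is why the equal-count rule has to be a MUST on the sender side.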

Table 2 contains a list of all the field values that can be used with the ‘vqat’ CMSD key. Field values defined in this table are case-sensitive. All scores are normalized to integers, in the range [0..100] (or [0..60] for PSNR), to avoid precision ambiguities.

Table 2. VQA Types

| S.No. | Field name | Type & Range | Description |
|---|---|---|---|
| 1 | VMAF | Integer [0..100] | The VMAF score for a standard profile, as defined by [6]. |
| 2 | VMAFMobile | Integer [0..100] | The VMAF score for a mobile profile, as defined by [7]. |
| 3 | VMAFUHD | Integer [0..100] | The VMAF score for a UHD profile, as defined by [8]. |
| 4 | VMAFHD | Integer [0..100] | The VMAF score for an HD profile, as defined by [7]. |
| 5 | PSNR | Integer [0..60] | The PSNR score for a standard profile, as defined by [9]. |
| 6 | SSIM | Integer [0..100] | The SSIM score [10] multiplied by 100 and rounded to the nearest integer. |
| 7 | ATEME VQA Metrics | Integer [0..100] | ATEME proprietary VQA model |
| 8 | EQMnr | Integer [0..100] | AWS Elemental proprietary VQA model - EQM NR |
| 9 | EQMfr | Integer [0..100] | AWS Elemental proprietary VQA model - EQM FR |
| 10 | Bitmovin VQA Metrics | Integer [0..100] | Bitmovin proprietary VQA model |
| 11 | VMAFDRE | Integer [0..100] | Harmonic proprietary VQA model using DRE |
| 12 | XVSnr | Integer [0..100] | IMAX proprietary VQA model - NR XVS |
| 13 | XVSfr | Integer [0..100] | IMAX proprietary VQA model - FR XVS using EPS |
| 14 | XVSeps | Integer [0..100] | IMAX proprietary VQA model - XEPS |
| 15 | XVSbs | Integer [0..100] | IMAX proprietary VQA model - Banding Score |
| 16 | XVScvs | Integer [0..100] | IMAX proprietary VQA model - CVS |
| 17 | pVMAF | Integer [0..100] | Synamedia proprietary VQA model pVMAF |
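
A sketch of sender-side checks against Table 2, in Python. The range table is transcribed from the rows above; `validate_score` and `ssim_to_score` are hypothetical helpers. Table 2 does not say how ties are rounded for SSIM, so rounding halves up is an assumption here (Python's built-in `round()` would round halves to even instead).

```python
import math

# Table 2 field names (case-sensitive) and their allowed integer ranges.
SCORE_RANGES = {name: (0, 100) for name in (
    "VMAF", "VMAFMobile", "VMAFUHD", "VMAFHD", "SSIM", "ATEME VQA Metrics",
    "EQMnr", "EQMfr", "Bitmovin VQA Metrics", "VMAFDRE", "XVSnr", "XVSfr",
    "XVSeps", "XVSbs", "XVScvs", "pVMAF",
)}
SCORE_RANGES["PSNR"] = (0, 60)


def validate_score(vqa_type, score):
    """Raise if vqa_type is not in Table 2 or score is outside its range."""
    if vqa_type not in SCORE_RANGES:  # exact match: field names are case-sensitive
        raise ValueError(f"unknown VQA type {vqa_type!r}")
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("VQA scores must be integers")
    low, high = SCORE_RANGES[vqa_type]
    if not low <= score <= high:
        raise ValueError(f"{vqa_type} score {score} outside [{low}..{high}]")


def ssim_to_score(ssim):
    """SSIM multiplied by 100 and rounded to the nearest integer (halves round up)."""
    return math.floor(ssim * 100 + 0.5)
```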

gwendalsimon commented 1 month ago

Good idea! It can help the CDN server (or the endpoint of a CMCD log/beacon) to build an even better estimation of the QoE at the client side.

slhck commented 1 month ago

This is a good idea.

May I ask where this list of metrics comes from? There are some reference markers in the text, but I don't see the actual references themselves.

This approach is similar to what was proposed at VQEG a while ago (video quality metadata carriage, see this doc), but unfortunately it was never followed up on.

A challenge is that these types of video quality metrics are usually calculated per frame and not per GOP, so a proper aggregation would have to be found (and communicated as metadata!). Also, most of these metrics seem to be full-reference metrics, which have to be calculated against a known reference at a given resolution for a particular (assumed) viewing distance. These are not always set to 1080p and 1.5 H (display heights), so a metric score without these data is inconclusive. At least for third parties consuming such CMCD messages, that context would be missing. (For a provider using their own metrics, it can be assumed they know how they calculated them. But then, the metrics will probably already be known at encoding time and don't need to be round-tripped to the client and back?)
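
To illustrate the aggregation problem raised above: the same per-frame scores yield different per-GOP values depending on the pooling method, so the method would indeed have to be signalled. The frame values are made up, and the three pooling functions are common choices assumed for illustration; the proposal does not specify one.

```python
from statistics import fmean, harmonic_mean

# Example pooling choices; the proposal does not say which one encoders use.
POOLING = {"mean": fmean, "harmonic_mean": harmonic_mean, "min": min}


def per_gop_scores(frame_scores, gop_length, method="mean"):
    """Pool per-frame scores into one integer score per GOP of gop_length frames."""
    if gop_length <= 0:
        raise ValueError("gop_length must be positive")
    pool = POOLING[method]
    return [
        round(pool(frame_scores[i:i + gop_length]))
        for i in range(0, len(frame_scores), gop_length)
    ]


# Made-up per-frame VMAF values for two 5-frame GOPs; frame 4 is a quality dip.
frames = [95, 94, 96, 40, 93, 92, 91, 90, 93, 94]
for method in POOLING:
    print(method, per_gop_scores(frames, 5, method))
# Prints: mean [84, 92] / harmonic_mean [74, 92] / min [40, 90]
```

The first GOP comes out as 84, 74 or 40 depending only on the pooling choice, which supports the point that a bare score is ambiguous without this metadata.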

ushi-vqa commented 1 month ago

Hello @slhck, we are working with the codec companies on their proprietary VQ metrics and updating the list.

The requirement here is for the streaming encoder to output the average VQA score of each segment. The scoring system cannot be standardized across the encoding industry, since most vendors have proprietary models. The idea is to measure the video quality experienced by a viewer in an individual session. After receiving the per-segment VQA scores of what the player actually played out, a 'Client-VQA' SaaS (attached below) or third-party analytics can apply its knowledge of how to interpret the different VQA models and present analytics that the user/customer understands. Tracking how video quality varies during a session due to ABR streaming, ad insertion and personalization would enable client-side analytics for video quality assurance.

This is documented in detail in the Google Doc "Proposal for CMSD Client-VQA keys", which Will and I will be presenting at the SVTA soon (public access: https://docs.google.com/document/d/1ncNNitIY-1mpVoWASM_BVe8LL1WUAGU0lUqN2latKpw/edit).

Client-VQA 1 pager.pdf

slhck commented 1 month ago

Thanks for the background info!

I believe that more metadata would be required for such metrics to make sense, especially for third parties that do not know how the metrics were calculated. At least in the case of VMAF, you'd have to know the reference resolution (4K, 1080p?), the model's feature settings (e.g. NEG on or off), etc. Of course, you could compare quality in relative terms that way, from within the same system. Looking forward to reading more about the proposal!

ushi-vqa commented 1 month ago

Thanks for your query @slhck

In the case of VMAF, the profile named in the field name denotes the reference resolution, e.g. VMAFUHD, VMAFHD. This information becomes particularly important when evaluating an encoder or a compression algorithm.

However, to understand a viewer's QoE through VQA, which is influenced by the ABR player, ad insertion and personalization, information from the client side is more valuable. Most of the video segment details, such as resolution, codec and bitrate, can be extracted from the manifest file and linked to the session ID and the per-segment VQA score headers by the Client-VQA SaaS.
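
A rough sketch of the kind of session-level summary a Client-VQA service could compute once per-segment scores are joined with manifest data and the session ID. `SegmentReport` and `summarize_sessions` are hypothetical; weighting by segment duration is an assumption, and scores of different VQA types are deliberately kept apart because their scales are not comparable.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SegmentReport:
    """One hypothetical per-segment record joined with manifest data."""
    session_id: str
    duration_s: float  # segment duration, e.g. taken from the manifest
    vqa_type: str
    score: int


def summarize_sessions(reports):
    """Duration-weighted mean and minimum score per (session, VQA type)."""
    weighted = defaultdict(float)
    durations = defaultdict(float)
    minimum = {}
    for r in reports:
        if r.duration_s <= 0:
            raise ValueError("segment duration must be positive")
        key = (r.session_id, r.vqa_type)  # never pool different metric types
        weighted[key] += r.score * r.duration_s
        durations[key] += r.duration_s
        minimum[key] = min(minimum.get(key, r.score), r.score)
    return {
        key: {"weighted_mean": weighted[key] / durations[key], "min": minimum[key]}
        for key in weighted
    }
```

Reporting the minimum alongside the mean keeps short quality dips (e.g. after an ABR down-switch or around an ad break) visible, which a session average alone would smooth away.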

It is important to note that for VQA, the reference resolution matches the output resolution, which is noted in the manifest file for each segment. This holds irrespective of the original/raw video resolution, since the reference video would have been upscaled or downscaled to the output resolution.

cta-wave / common-media-client-data

Support per-segment VQA score reporting #131