MPEGGroup / DASHSchema

The XML schema and example XML files for DASH (ISO/IEC 23009-1)

MPD@profiles string fails with 4th edition #46

Closed eceozturk closed 4 years ago

eceozturk commented 4 years ago

MPD@profiles string in most of the DASH test vectors that are available at the link fails when tested with the 4th edition DASH schema. Further information is provided below.

Error message:

“Line:Col[8:20]:cvc-pattern-valid: Value 'urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264' is not facet-valid with respect to pattern '(([A-Za-z0-9\-\._~?-?]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&'()*+;=:@]))+(,(([A-Za-z0-9\-\._~?-?]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&'()*+;=:@]))*)*' for type 'ListOfProfilesType'.
Line:Col[8:20]:cvc-attribute.3: The value 'urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264' of attribute 'profiles' on element 'MPD' is not valid with respect to its type, 'ListOfProfilesType'.”

NOTE that the value of the MPD@profiles string given in the error message is just an example. It changes according to the tested MPD.

Tested DASH schema location (4th edition): https://raw.githubusercontent.com/MPEGGroup/DASHSchema/21e8bf2c973c20ad02db3df9f61bbd8759bb16f1/DASH-MPD.xsd

Testing URL: http://54.72.87.160/conformance/current/Conformance-Frontend/Conformancetest.php?schema=https://raw.githubusercontent.com/MPEGGroup/DASHSchema/21e8bf2c973c20ad02db3df9f61bbd8759bb16f1/DASH-MPD.xsd

Some example test vectors with fail status:

paulhiggs commented 4 years ago

The regular expression (([A-Za-z0-9\-\._~?-?]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&'()*+;=:@]))+(,(([A-Za-z0-9\-\._~?-?]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&'()*+;=:@]))*)* does not allow the slash ('/') in a profile name
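A minimal sketch of this observation in Python, assuming that the `~?-?` fragment in the rendered pattern stands for `~` followed by the range U+00A0 to U+00FF. XSD patterns are implicitly anchored, so `re.fullmatch` is used; the helper names `is_valid_profiles` and `rejected_chars` are hypothetical.

```python
import re

# Python transcription of the 4th edition ListOfProfilesType pattern.
# Assumption: "~?-?" in the rendered text is "~" plus the range U+00A0-U+00FF.
CHAR = r"(?:[A-Za-z0-9\-._~\u00A0-\u00FF]|%[0-9A-Fa-f]{2}|[!$&'()*+;=:@])"
PROFILES_4TH_ED = re.compile(CHAR + "+(?:," + CHAR + "*)*")

# Every single character the pattern can accept, plus "," and "%".
ALLOWED = re.compile(r"[A-Za-z0-9\-._~\u00A0-\u00FF!$&'()*+;=:@,%]")


def is_valid_profiles(value):
    """True if the whole value matches the (implicitly anchored) pattern."""
    return PROFILES_4TH_ED.fullmatch(value) is not None


def rejected_chars(value):
    """Characters the pattern can never accept (ignores %XX well-formedness)."""
    return sorted({c for c in value if not ALLOWED.fullmatch(c)})


value = "urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264"
print(is_valid_profiles(value))   # False
print(rejected_chars(value))      # ['/']
```

The URN on its own passes; only the URL entry, because of its slashes, makes the whole attribute value fail.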

paulhiggs commented 4 years ago

On another note, the profile name `http://dashif.org/guidelines/dash264` does not comply with the 4th edition definition of a URL profile identifier, which states:

When a URL is used, it should also contain a month-date in the form mmyyyy;

mikedo commented 4 years ago

Hi Paul, thanks for digging into this. Are you saying that the regex does not permit the required slash, or something else? Also, the date is a "should" and thus not conformance testable. I believe that should be removed from the regex.

paulhiggs commented 4 years ago

@mikedo correct that the regex does not permit the slash needed for a URL.

Just add the necessary slashes ('/') to the regex: '(([A-Za-z0-9\-\._~?-?]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&/'()*+;=:@]))+(,(([A-Za-z0-9\-\._~?-?]|(%[0-9A-Fa-f][0-9A-Fa-f])|[!$&'/()*+;=:@]))*)*'

That said, I am not sure what this part ~?-? of the regex is supposed to match.

paulhiggs commented 4 years ago

OK, ~?-? is just a formatting anomaly of ~&#xA0;-&#xFF; (that is, "~" followed by the character range U+00A0 to U+00FF).

mikedo commented 4 years ago

Stepping back, either the regex conforms to the 4th Ed or not. If it does, then the example test vectors are non-conformant and DASH-IF would have to decide how to resolve this (change test content or propose an amendment to 4th Ed). If the regex does not conform to the 4th Ed, then it first needs to be fixed.

If @profiles were a space-separated list of profiles, then a reasonable (although not perfectly constrained) data type could be simply formed from a "list of xs:anyURI", but alas....

Below is a summary of the relevant 4th Ed provisions:

Clause 5 of the 4th Ed clearly constrains them to URL syntax only, yet clause 8 clearly says they are either URLs or URNs. Since the 4th Ed defines specific URN profiles, one might assume the intent was per clause 8 and that the normative statements in clause 5 apply to the URL syntax only. Clause 8 also adds the comma-separated list syntax. Under that interpretation, the value is a comma-separated list of either URLs as constrained in clause 5 or URNs as constrained in clause 8.
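As a rough illustration of that reading, the sketch below splits an MPD@profiles value on commas and labels each identifier. The function name `split_profiles` is hypothetical, and treating everything that does not start with `urn:` as a URL is a simplifying assumption, not something the specification states.

```python
def split_profiles(value):
    """Split an MPD@profiles value on commas and label each identifier.

    Assumption: anything not starting with "urn:" is treated as a URL.
    """
    labelled = []
    for ident in value.split(","):
        kind = "URN" if ident.lower().startswith("urn:") else "URL"
        labelled.append((kind, ident))
    return labelled


for kind, ident in split_profiles(
    "urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264"
):
    print(kind, ident)
```

Each labelled entry could then be checked against the clause that applies to its kind.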

From 4th Ed, clause 5.3.1.2 for MPD@profiles:

The contents of this attribute shall conform to either the pro-simple or pro-fancy productions of IETF RFC 6381:2011, subclause 4.5, without the enclosing DQUOTE characters, i.e. including only the unencodedv or encodedv elements respectively. As profile identifier, the URI defined for the conforming Media Presentation profiles as described in Clause 8 shall be used.

RFC 6381, clause 4.5:

  pro-simple := "profiles" "=" unencodedv
  pro-fancy  := "profiles*" "=" encodedv

RFC 6381, clause 3.2:

The BNF syntax is as follows:

  codecs      := cod-simple / cod-fancy
  cod-simple  := "codecs" "=" unencodedv
  unencodedv  := id-simple / simp-list
  simp-list   := DQUOTE id-simple *( "," id-simple ) DQUOTE
  id-simple   := element
              ; "." reserved as hierarchy delimiter
  element     := 1*octet-sim
  octet-sim   := <any TOKEN character>

              ; Within a 'codecs' parameter value, "." is reserved
              ; as a hierarchy delimiter
  cod-fancy   := "codecs*" "=" encodedv
  encodedv    := fancy-sing / fancy-list
  fancy-sing  := [charset] "'" [language] "'" id-encoded
              ; Parsers MAY ignore <language>
              ; Parsers MAY support only US-ASCII and UTF-8
  fancy-list  := DQUOTE [charset] "'" [language] "'" id-list DQUOTE
              ; Parsers MAY ignore <language>
              ; Parsers MAY support only US-ASCII and UTF-8
  id-list     := id-encoded *( "," id-encoded )
  id-encoded  := encoded-elm *( "." encoded-elm )
              ; "." reserved as hierarchy delimiter
  encoded-elm := 1*octet-fancy
  octet-fancy := ext-octet / attribute-char

  DQUOTE      := %x22 ; " (double quote)

4th Ed, clause 8:

A profile has an identifier, which is a URI. The profiles with which an MPD complies are indicated in the MPD@profiles attribute as a comma‐separated list of profile identifiers. Profile identifiers defined in this document are URNs and shall conform to IETF RFC 8141.

RFC 8141, clause 2:

  namestring    = assigned-name
                  [ rq-components ]
                  [ "#" f-component ]
  assigned-name = "urn" ":" NID ":" NSS
  NID           = (alphanum) 0*30(ldh) (alphanum)
  ldh           = alphanum / "-"
  NSS           = pchar *(pchar / "/")
  rq-components = [ "?+" r-component ]
                  [ "?=" q-component ]
  r-component   = pchar *( pchar / "/" / "?" )
  q-component   = pchar *( pchar / "/" / "?" )
  f-component   = fragment

The question mark character "?" can be used without percent-encoding inside r-components, q-components, and f-components. Other than inside those components, a "?" that is not immediately followed by "=" or "+" is not defined for URNs and SHOULD be treated as a syntax error by URN-specific parsers and other processors.

Note that RFC 8141, clause 2 has more constraints on the above.
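A minimal sketch of the `assigned-name` production as a Python regular expression, assuming `pchar` as defined in RFC 3986 (unreserved, pct-encoded, sub-delims, ":" and "@"). It deliberately ignores the r-, q- and f-components and the further constraints mentioned above; the name `is_assigned_name` is hypothetical.

```python
import re

ALNUM = r"[A-Za-z0-9]"
# pchar per RFC 3986: unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2}|[!$&'()*+,;=:@])"

# assigned-name = "urn" ":" NID ":" NSS
#   NID = (alphanum) 0*30(ldh) (alphanum)
#   NSS = pchar *(pchar / "/")
ASSIGNED_NAME = re.compile(
    r"[Uu][Rr][Nn]:"
    + ALNUM + r"[A-Za-z0-9\-]{0,30}" + ALNUM
    + ":"
    + PCHAR + "(?:" + PCHAR + "|/)*"
)


def is_assigned_name(value):
    """True if value matches the assigned-name production (no r/q/f parts)."""
    return ASSIGNED_NAME.fullmatch(value) is not None


print(is_assigned_name("urn:mpeg:dash:profile:isoff-live:2011"))  # True
print(is_assigned_name("http://dashif.org/guidelines/dash264"))   # False
```

Note that `pchar` includes the comma, which the DASH attribute uses as its list separator, so a URN containing a comma could not be carried unambiguously in MPD@profiles.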

paulhiggs commented 4 years ago

So, I read it that the schema is incorrect.

The profile name is a URI, and MPEG-defined profiles use URNs to satisfy this:

"Profile identifiers defined in this document are URNs and shall conform to IETF RFC 8141."

The schema, as published, supports this. But clause 8 also says:

"Externally defined profiles may use profile identifiers that are URNs or URLs"

The schema, as published, does not support this.

Looking at the 3rd -> 4th edition diff, there do not seem to be any spec text changes, but the schema changed from <xs:attribute name="profiles" type="xs:string" use="required"/> to

<xs:simpleType name="ListOfProfilesType">
    <xs:restriction base="xs:string">
        <xs:pattern value="(([A-Za-z0-9\-\._~&#xA0;-&#xFF;]|(&#37;[0-9A-Fa-f][0-9A-Fa-f])|[!$&amp;'()*+;=:@]))+(,(([A-Za-z0-9\-\._~&#xA0;-&#xFF;]|(&#37;[0-9A-Fa-f][0-9A-Fa-f])|[!$&amp;'()*+;=:@]))*)*"/>
    </xs:restriction>
</xs:simpleType>

mikedo commented 4 years ago

I concur. There are several issues with this I think. I'm tempted to restore the 3rd Ed data type, at least temporarily so that users (e.g. DASH-IF) can otherwise exercise the schema against old test assets and author new MPDs right. Any concerns about me committing a PR to dev branch while we sort this out?

eceozturk commented 4 years ago

As an additional note, apart from the forward slash character (/), the number sign (#) is also missing from the profile string restriction regex.

waqarz commented 4 years ago

author new MPDs right

@mikedo what do you mean by DASH-IF authoring new MPDs right? What is wrong with the MPDs? I am totally lost since there are observations in the above thread that the schema is incorrect...

paulhiggs commented 4 years ago

If @profiles were a space-separated list of profiles, then a reasonable (although not perfectly constrained) data type could be simply formed from a "list of xs:anyURI", but alas....

Agree - can't we deprecate the use of comma-separated lists for items that do not contain spaces (i.e. identifiers)? It seems that spaces in identifiers are no longer allowed (or at least frowned upon).

sdp198 commented 4 years ago

You could deprecate it from being allowed in specifications, but I don't think you could modify it in the profiles attribute in DASH - space separated isn't even a valid option there at the moment.

mikedo commented 4 years ago

@waqarz As Paul and I have (also) concluded, there are several errors in the profiles regex, making the 4th Ed schema unusable for validating MPDs with URL profiles syntax. This forces ISO users that want URL profiles syntax (including DASH-IF and DASH-IF users) to either create their own schemas or revert to the 3rd Ed schema, neither of which is a good idea, since both would allow new, potentially non-conformant MPDs to be created. I did not say DASH-IF had non-conformant MPDs. Sorry for any misunderstanding.

@eceozturk Thanks for the revised schema. Let us know when everyone is happy with it.

mikedo commented 4 years ago

From @eceozturk in m52459, there are missing '/' and '#' chars. A fix for the regex (at least for DASH-IF test assets) is in the private commit here: https://github.com/eceozturk/DASHSchema/commit/3e411a603950c6ee01acb20ef2ad13583e978f2c
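The effect of such a change can be sketched in Python. This assumes the fix simply adds '/' and '#' to the last character class of the published pattern; the actual commit may differ. The helper `build_pattern` and the `example.com` URL are hypothetical.

```python
import re

UNRESERVED = r"[A-Za-z0-9\-._~\u00A0-\u00FF]"
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"


def build_pattern(extra=""):
    """Build the profiles pattern, optionally widening the last character class."""
    ch = "(?:" + UNRESERVED + "|" + PCT_ENCODED + "|[!$&'()*+;=:@" + extra + "])"
    return re.compile(ch + "+(?:," + ch + "*)*")


ORIGINAL = build_pattern()      # 4th edition pattern as published
WIDENED = build_pattern("/#")   # assumed fix: also allow '/' and '#'

value = "urn:mpeg:dash:profile:isoff-live:2011,http://dashif.org/guidelines/dash264"
print(ORIGINAL.fullmatch(value) is not None)  # False
print(WIDENED.fullmatch(value) is not None)   # True
```

Like the published pattern, this still accepts many strings that are not valid URIs, so it is a permissive character filter rather than a full URI check.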