Closed Dclipsham closed 1 year ago
Taking a look at the original request, the sample files we have, and the documentation, you are correct: the signature could be tighter. I have put in what you suggested and will upload a signature for you to test later today. The issue with the relationship was a copying error; thank you for letting us know! If the new signature looks good, I'll close this issue.
Hi David, do the changes in v.114 mean we can now close this request?
I believe this has now been resolved. If not, we can re-open at a later date.
The attached files are SVG 1.1, but are getting dual identification with fmt/1776 - XML 1.1. They are not XML 1.1, they are XML 1.0, but the liberal signature for XML 1.1 is picking up the 'svg version=1.1' string and interpreting that as part of the XML version identifier. The files are not additionally getting identification as fmt/101 XML 1.0 because fmt/92 SVG 1.1 has priority over fmt/101.
unavail.svg up.svg wait.svg
Currently the XML 1.1 signature is 3C3F786D6C(20|09){0-30}76657273696F6E{0-30}3D{0-30}(22|27)312E31(22|27). This translates as:
<?xml([space]|[horizontal tab]){0-30}version{0-30}={0-30}("|')1.1("|')
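To illustrate the false positive, here is a minimal Python sketch that approximates the current byte sequence as a regular expression and runs it against an SVG-style header. The regex is only a rough stand-in for the signature (it is not DROID's matching engine), and `svg_header` is a made-up sample, not one of the attached files.

```python
import re

# Rough regex equivalent of the current fmt/1776 byte sequence (an approximation).
# {0-30} is modelled as "any 0 to 30 bytes"; re.DOTALL lets "." span newlines.
LIBERAL_XML11 = re.compile(
    rb'<\?xml[ \t].{0,30}version.{0,30}=.{0,30}["\']1\.1["\']',
    re.DOTALL,
)

# Hypothetical SVG 1.1 header: an XML 1.0 declaration followed by an
# svg element that carries version="1.1".
svg_header = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<svg version="1.1" xmlns="http://www.w3.org/2000/svg"/>'
)

# match() anchors at the start of the data, like a beginning-of-file sequence.
false_hit = LIBERAL_XML11.match(svg_header)
print(false_hit.group(0) if false_hit else None)
```

The wildcards let the match run from the XML declaration's 'version' keyword, past an unrelated '=' sign, and on to the "1.1" that belongs to the svg element.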
Regarding the three {0-30} wildcard blocks: The XML 1.1 specification, section 2.8 (https://www.w3.org/TR/2006/REC-xml11-20060816/#sec-prolog-dtd) states "XML 1.1 documents must begin with an XML declaration which specifies the version of XML being used"
The specification also states that the VersionInfo part of the XML declaration of the prolog is as follows:
VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
and that 'Eq' can have whitespace characters either side of the equals character:
Eq ::= S? '=' S?
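For comparison, the grammar productions above can be written out directly as a regular expression. This is a sketch assuming the S production from the same specification, S ::= (#x20 | #x9 | #xD | #xA)+, and a VersionNum fixed at '1.1'; the variable names are my own.

```python
import re

S = rb"[\x20\x09\x0D\x0A]+"                    # S ::= (#x20 | #x9 | #xD | #xA)+
EQ = rb"(?:" + S + rb")?=(?:" + S + rb")?"     # Eq ::= S? '=' S?
# VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
VERSION_INFO = S + rb"version" + EQ + rb"(?:'1\.1'|\"1\.1\")"

# Start of an XML 1.1 declaration: '<?xml' followed by VersionInfo.
XML11_DECL_START = re.compile(rb"<\?xml" + VERSION_INFO)

print(bool(XML11_DECL_START.match(b'<?xml version="1.1"?>')))
print(bool(XML11_DECL_START.match(b'<?xml version="1.0"?>\n<svg version="1.1"/>')))
```

Only whitespace is permitted between the tokens, so arbitrary wildcard bytes are never needed. The grammar also requires the opening and closing quotes to match, which a (22|27) ... (22|27) sequence does not enforce.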
Although white space in the XML 1.1 specification is allowed to consist of one or more space/tab characters, in practice, for a string like the XML declaration, you're very unlikely to see something like
<?xml version = "1.1"
The main use for liberal whitespacing in XML is to deal with indentation (e.g. pretty-print styles). You're also unlikely to see a tab character between 'xml' and 'version', but I'm more relaxed about that one. The prolog does allow for miscellaneous comments, but these appear after the XML declaration, not within or before it.
If there are example files available where the version declaration does appear later on, I'm curious to see whether this is because of a specific subtype/interpretation of XML 1.1 that doesn't adhere to the formal standard.
Otherwise, I would recommend the sequence is amended to the following: 3C3F786D6C(20|09)76657273696F6E{0-1}3D{0-1}(22|27)312E31(22|27)
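As a quick sanity check of the proposed sequence, here is a Python sketch that approximates it as a regular expression and tries it on a few headers. The regex is an approximation rather than DROID's matching engine, and the sample headers are invented for illustration.

```python
import re

# Rough regex equivalent of the proposed sequence (an approximation).
# {0-1} is modelled as "any 0 or 1 bytes" either side of the equals sign.
TIGHT_XML11 = re.compile(
    rb'<\?xml[ \t]version.{0,1}=.{0,1}["\']1\.1["\']',
    re.DOTALL,
)

# Hypothetical file headers, not taken from the attached samples.
headers = {
    "xml 1.1": b'<?xml version="1.1" encoding="UTF-8"?>',
    "xml 1.1, spaced Eq": b"<?xml version = '1.1'?>",
    "svg 1.1 on xml 1.0": b'<?xml version="1.0"?>\n<svg version="1.1"/>',
}

# match() anchors at the start of the data, like a beginning-of-file sequence.
results = {name: bool(TIGHT_XML11.match(data)) for name, data in headers.items()}
print(results)
```

The trade-off is that a declaration with two or more whitespace characters in any one position would be legal XML 1.1 but would not match this tighter sequence.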
Further, fmt/1776 has been given a subtype relationship against fmt/102 Extensible Hypertext Markup Language 1.0. This is incorrect, and it is not a subtype of fmt/101 XML 1.0 either. I believe the intended relationship here should be 'Is subsequent version of [fmt/101] Extensible Markup Language 1.0'.
Many thanks, David