Pittsburgh-NEH-Institute / pr-app

eXist-db app development
MIT License

Create XHTML output from eXist-db #11

Closed djbpitt closed 3 years ago

djbpitt commented 3 years ago

Your to-do

Create XHTML output from eXist-db with XML declaration, doctype declaration, namespace, and media-type


djbpitt commented 3 years ago

Declare HTML namespace on root element and add:

declare option exist:serialize "method=xhtml5 indent=yes html-version=5.0 media-type=application/xhtml+xml";

Not documented officially, but see https://markmail.org/message/4ubfxqyeq2rp3tdw for discussion.
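For context, a complete main module using this legacy option might look like the sketch below. The page content (title and paragraph) is a made-up placeholder; only the option string comes from the comment above. The sketch assumes that the exist prefix is predeclared by eXist-db, and because the xhtml5 method value is undocumented, its behavior may differ between versions.

```xquery
xquery version "3.1";

(: Legacy eXist-db serialization option, copied from the comment above.
   Assumption: the "exist" prefix is predeclared by eXist-db. :)
declare option exist:serialize "method=xhtml5 indent=yes html-version=5.0 media-type=application/xhtml+xml";

(: Placeholder page; the HTML namespace is declared on the root element. :)
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Serialization test</title>
    </head>
    <body>
        <p>Hello from eXist-db</p>
    </body>
</html>
```

Requesting the query in a browser and inspecting the response in the network tools is one way to check the media type and the serialized prolog.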

djbpitt commented 3 years ago

David's Slack posting in the eXist-db workshop on 2021-10-28, with responses

(Using the 2021-10-22 nightly build of 5.4)

If I have understood the correspondence at https://markmail.org/message/4ubfxqyeq2rp3tdw correctly, the only way to ask eXist-db to serve 1) XHTML5 with XML syntax, 2) a doctype declaration that looks like <!DOCTYPE html>, 3) the application/xhtml+xml mime type, 4) an XML declaration, and 5) the HTML namespace is to declare the namespace on the root <html> element (this much is expected) and use a legacy declaration:

declare option exist:serialize "method=xhtml5 indent=yes html-version=5.0 media-type=application/xhtml+xml omit-xml-declaration=no";

I do not need to save the result into the database; I just need to access the XQuery and return the result to a browser. As far as I’ve been able to tell from the available documentation, it is not possible to return results like this with the non-legacy method of declaring options, although the correspondence I cite above seems to suggest that it should be possible by specifying the method as xhtml (not xhtml5) together with an html-version of 5.0 and no public or system doctype. Perhaps more importantly, I cannot find any documentation (I did a full-text search for xhtml5 at http://exist-db.org/exist/apps/doc/search.html?q=xhtml5) for how to obtain this result using the legacy method. I think the expected behavior is that the non-legacy method can produce output with the five features listed above, so its absence might count as a feature request. But if I am correct in thinking that the legacy method is available and not scheduled for removal, but also not documented, that would seem to be a documentation error. Is there something constructive that I can do (I’m not a Java programmer) to help clear up the confusion? Or is this my misunderstanding, rather than a real issue?




line0 1 day ago

I am sorry @David, can you cite something that xhtml5 is a thing?


David 1 day ago

@line0 I don’t think xhtml5 is a thing. What I want is HTML5 (which is a thing) with XML syntax (also a thing) and the other features I described (doctype declaration, mime type, XML declaration—all things), but the only way I’ve been able to find to get that combination of features is to ask for a method called xhtml5. As far as I can tell, there isn’t supposed to be such a method and I should be able to get the combination of features I describe by specifying xhtml (which is a documented method) and an html-version of 5.0. But that doesn’t seem to work. Have I misunderstood?


line0 9 hours ago

@David I just talked to @Joern Turner and we both agreed that while you can have well-formed html5 you are likely not allowed to add an XML declaration at the beginning. And application/xhtml+xml is also not compliant (actually never was interpreted correctly by any client I know of).


Tom Hillman 8 hours ago

Sorry to contradict, @line0, but you can certainly have both the XML declaration and the HTML5 doctype.


Tom Hillman 8 hours ago

Although I notice that the W3C validator erroneously identifies these as 'XML processing instructions' e.g. https://validator.w3.org/nu/?doc=https%3A%2F%2Fyamahito.github.io%2FSyrinscall%2Fdarkly.html


Tom Hillman 8 hours ago

I also note that the validator in question complains about some features necessary for XML compatibility such as "stray end tags"


David 7 hours ago

@line0 @Joern Turner Thank you; this is very helpful information. If you don’t mind, I’d be grateful if we could please explore some of the features that I mentioned in more detail:

  1. Mime type: You write that “And application/xhtml+xml is also not compliant (actually never was interpreted correctly by any client I know of).” Meanwhile, according to https://html.spec.whatwg.org/#html-vs-xhtml, “The second concrete syntax is XML. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is treated as an XML document by web browsers, to be parsed by an XML processor.” If I am correct in thinking that the spec that I am quoting is authoritative, I don’t know how to understand what you mean when you write that it is “not compliant”. I also don’t understand what you mean when you write that this mime type was never interpreted correctly by any client that you know of. When I use the serialization declaration that I mention at the beginning of this thread and examine, in the network view of the Chrome debugging tools, the mime type of what eXist-db serves to my installation of Chrome, the mime type identified by the browser matches the type I declare in the XQuery. It is also among the mime types that the browser lists in the Accept request header. I think this combination should mean both that the mime type is conformant with the spec and that at least the current version of Chrome (under macOS) interprets it correctly.
  2. XML declaration: According to https://html.spec.whatwg.org/#charset, the XML declaration is the correct way to declare a character set when using XML syntax. If I have understood that part of the spec correctly, that means that the XML declaration is allowed when serving HTML5 with XML syntax. I see that Tom wrote, as I was composing this message, that he also thought that including the XML declaration when serving HTML5 with XML syntax was conformant with the spec.
  3. Doctype declaration: According to https://html.spec.whatwg.org/#writing-xhtml-documents, “XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification.” I think this tells me that the doctype declaration is not incorrect. According to https://developer.mozilla.org/en-US/docs/Web/HTML/Quirks_Mode_and_Standards_Mode, serving HTML5 with XML syntax does not require a doctype declaration as long as the mime type is application/xhtml+xml. The MDN page is not a spec, to be sure, but I don’t see anything anywhere in the spec or elsewhere that suggests that using the doctype declaration with XML syntax is incorrect or deprecated.
  4. Namespace: You don’t mention this explicitly, but I think we are in agreement that HTML5 with XML syntax should be served with the correct HTML namespace. When I omit the namespace but include the XML declaration, Chrome and Firefox do not recognize the document as HTML. This is the behavior I expect, since the XML syntax is namespace aware.

My original question was “how do I do X?”, which tacitly presupposes that it is reasonable for me to want to do X, so a response that “X is incorrect and you shouldn’t want to do it” is constructive and helpful. It is for that reason that I am now trying to unpack my original question (and its underlying, tacit assumptions) into its constituent pieces, so that if I’ve misunderstood either what I should want to do or how to do it, we’ll be able to identify in a more granular way where I’ve gone astray:

  1. Am I doing something incorrect in wanting the doctype declaration, specified mime type, namespace, and XML declaration? You seem to write that I am, while, as I read the spec, I am not, but I have less experience with reading specs than you do, so if I’ve misunderstood, I’d be grateful for correction about the part of the spec that I’ve misunderstood. I’ve tried to help with this part of the conversation by citing the spec where I can, so that we don’t wind up disagreeing about whether something is conformant without providing documentation to support our understanding, whatever that may be.
  2. If I am not incorrect in wanting those features, my next question is whether eXist-db can support that serialization without my specifying a serialization method value of xhtml5. I did not mean to suggest that xhtml5 is a valid value for the serialization method; I found it mentioned in the correspondence (now several years old) between Martin and Wolfgang that I cite above, which is why I thought it would work (and it does), but one point of my question was whether there was an alternative way to get the serialization I think I want, an alternative that does not use an apparently non-standard method value.
  3. The third part of my question involved the legacy vs newer syntax for serialization declarations. I shared an example of something that works, and that I thought Wolfgang was recommending in his correspondence with Martin, that uses the legacy syntax. Since the method value does not appear to be conformant, I would not expect it to work with the newer serialization syntax. But if I am correct in thinking that the serialization that I want is legitimate, I would expect it to be supported with a valid method value using the newer serialization syntax, and not just the legacy syntax.
  4. Finally, if the serialization that I think I want is legitimate, I would expect the eXist-db documentation to document it—ideally how to do it with the newer serialization declaration syntax.

David 5 hours ago

One more reference: the only method values listed in https://www.w3.org/TR/xslt-xquery-serialization-31/#xml-output seem to be xml, xhtml, html, text, json, and adaptive. That matches the methods supported by <xsl:output> in Saxon (https://www.saxonica.com/documentation11/index.html#!xsl-elements/output).

djbpitt commented 3 years ago

The following serialization strategy, combined with declaring the HTML namespace on the root <html> element, meets all of our requirements:

declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xhtml";
declare option output:media-type "application/xhtml+xml";
declare option output:omit-xml-declaration "no";
declare option output:html-version "5.0";

Note that the method must be xhtml, and not html. With html, empty elements are serialized as start-tags with no matching end-tag (e.g., <br>), instead of as self-closing empty-element tags (e.g., <br />), so the output is not well-formed XML. Specifying a method of xhtml ensures an XML-compliant representation of empty elements.
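Put together, a minimal main module using this strategy might look like the sketch below. The serialization declarations are the ones listed above; the page content (title, paragraph, and the <br/> used to exercise empty-element serialization) is a made-up placeholder.

```xquery
xquery version "3.1";

(: Standard serialization options, as listed in the comment above. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "xhtml";
declare option output:media-type "application/xhtml+xml";
declare option output:omit-xml-declaration "no";
declare option output:html-version "5.0";

(: Placeholder page; the HTML namespace is declared on the root element.
   The br element is included only to check how empty elements are serialized. :)
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Serialization test</title>
    </head>
    <body>
        <p>First line<br/>second line</p>
    </body>
</html>
```

To confirm the result, one option is to request the query in a browser and check the Content-Type response header in the network tools, then view the source to see whether the XML declaration, the doctype declaration, and the self-closing <br /> are present.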