TIBCOSoftware / genxdm

GenXDM: XQuery/XPath Data Model API, bridges, and processors for tree-model neutral access to XML.
http://www.genxdm.org/

XML output should be potentially consistent with XSLT2/XQuery Serialization spec #20

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
http://www.w3.org/TR/xslt-xquery-serialization/

There are two sides of this:

1) we need to be able to support the XML Output Method (section 5)
2) we need to support round-tripping XML, which the XML Output Method doesn't 
permit

For (1), the spec points out that: "Host languages MAY allow users to specify 
any or all of these parameters, but they are not REQUIRED to be able to do so. 
However, the host language specification MUST specify how the value of all 
applicable parameters is to be determined."

Applicable parameters for the XML output method include:

byte-order-mark, cdata-section-elements, doctype-public, doctype-system, 
encoding, indent, media-type, method, normalization-form, omit-xml-declaration, 
standalone, undeclare-prefixes, use-character-maps, version

"Applicable" does not mean that these parameters are necessarily surfaced in 
the API.  For instance, we can specify that "doctype-public isn't settable", or 
"encoding is specified implicitly when an output stream is supplied".  
Normalization might be accomplished through a separate API (conformant with the 
serialization specification).  And so on.

Speaking very roughly, our current API effectively supports doctype-system, 
byte-order-mark (implicit/external), and encoding (implicit/external), and 
specifies fixed values for doctype-public (none), omit-xml-declaration (no), 
standalone (omit), version (1.0), undeclare-prefixes (no; it's a 1.1 facility), 
and use-character-maps (not allowed).

What's missing: method, cdata-section-elements, indent, media-type, 
normalization-form.  Note that it would be better to permit almost all of the 
"fixed-value" parameters to be specified in *some* fashion.

As it stands, however, our output method is custom, though it does not clearly 
say so.
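
To make the rough inventory above easier to check, here is a small sketch that 
records each applicable parameter with its status. The class name 
ParameterStatus and the three status labels are made up for illustration; they 
are not part of the GenXDM API, and the classification simply restates the 
"speaking very roughly" summary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: rough status of each XML output method parameter. */
class ParameterStatus {
    enum Status { SUPPORTED, FIXED, MISSING }

    /** Maps each applicable parameter name to how the current API treats it. */
    static Map<String, Status> current() {
        Map<String, Status> m = new LinkedHashMap<>();
        // Settable, either directly or implicitly via the output stream.
        m.put("doctype-system", Status.SUPPORTED);
        m.put("byte-order-mark", Status.SUPPORTED);
        m.put("encoding", Status.SUPPORTED);
        // Fixed values: none, no, omit, 1.0, no, not allowed.
        m.put("doctype-public", Status.FIXED);
        m.put("omit-xml-declaration", Status.FIXED);
        m.put("standalone", Status.FIXED);
        m.put("version", Status.FIXED);
        m.put("undeclare-prefixes", Status.FIXED);
        m.put("use-character-maps", Status.FIXED);
        // Not addressed by the current API.
        m.put("method", Status.MISSING);
        m.put("cdata-section-elements", Status.MISSING);
        m.put("indent", Status.MISSING);
        m.put("media-type", Status.MISSING);
        m.put("normalization-form", Status.MISSING);
        return m;
    }

    /** Counts the parameters that currently have the given status. */
    static long count(Status status) {
        return current().values().stream().filter(v -> v == status).count();
    }

    public static void main(String[] args) {
        current().forEach((name, status) -> System.out.println(name + ": " + status));
    }
}
```

The three groups together cover all fourteen parameters the specification lists 
as applicable to the XML output method.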

This brings us to (2). The XML output method prevents round-tripping in 
particular by refusing to permit any internal subset to be output.  This means, 
for instance, that any document containing an internal subset that specifies ID 
or IDREF attributes will be read as having ID and IDREF attributes, but written 
such that those attributes are no longer of type ID and IDREF, and consequently 
read a second time without those types.  Remarkably enough, the serialization 
specification calls these trees "equivalent."  It is somewhat unlikely that 
consumers of such a document would agree.
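
The loss of ID typing can be reproduced outside GenXDM with nothing but the JDK. 
The sketch below is an illustration, not our code: it uses the JDK's DOM parser 
and its identity Transformer, which follows the older XSLT 1.0 output rules 
rather than the 2.0 serialization spec, but which likewise writes no internal 
subset. The class name IdRoundTrip and the sample document are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Illustrative only: an ID attribute declared in the internal subset does not survive a round trip. */
class IdRoundTrip {
    /** The internal subset declares item/@key to be of type ID. */
    static final String SOURCE =
        "<!DOCTYPE doc [<!ATTLIST item key ID #IMPLIED>]>"
        + "<doc><item key=\"a1\"/></doc>";

    /** Parses XML text with the default (non-validating) JDK DOM parser. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes with the identity transform; no internal subset is written. */
    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the document knows an element by this ID value. */
    static boolean hasId(Document doc, String id) {
        return doc.getElementById(id) != null;
    }

    public static void main(String[] args) {
        Document first = parse(SOURCE);
        Document second = parse(serialize(first));
        System.out.println("first read knows a1:  " + hasId(first, "a1"));
        System.out.println("second read knows a1: " + hasId(second, "a1"));
    }
}
```

On the first read the parser types key as ID from the ATTLIST declaration; once 
the declaration is gone from the serialized form, the second read has only an 
untyped attribute, so ID lookups stop working.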

We broke with the serialization specification (without completely knowing what 
we were doing) when we enabled round-tripping of this information (which 
happens to be rather significant to the XML Security library, for instance).  
We need to recover compatibility, but without ending up in the somewhat bizarro 
world of the XML output method.

In short: we must permit users to specify that they want to enter 
XML-output-method-bizarro-world, but we should:

a) default to a different, more useful output method that permits round 
trips; and
b) do so in a way that identifies it as a variation

At the same time, we should enable customization of various serialization 
parameters.  The question: what's the best way of doing this?  I'm tempted to 
suggest a "SerializationParameters" interface, or, more simply, a Properties or 
Map argument passed to the document writer.

Recommended:

1) define serialization parameters as a Map<QName, Object>. This permits the 
parameters from the serialization specification, and allows other sorts of 
parameters to be specified by placing them in a different namespace.

2) define our default output method using namespace http://org.gxml.output/, 
localname "default" or "int-subset".

3) add setDefaultParameters(Map<QName, Object> parameters) to 
DocumentHandlerFactory

4) add DocumentHandler newDocumentHandler(Map<QName, Object> parameters) to 
DocumentHandlerFactory, or add this argument to the current two-argument 
newDocumentHandler overload.

5) (in a different defect) make corresponding adjustments to the input-output 
processor.
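
A rough sketch of how recommendations 1 to 4 might fit together. Everything 
here is hypothetical: HandlerFactory is a stand-in for DocumentHandlerFactory 
that models only the parameter handling, effectiveParameters is an invented 
helper, the choice of "int-subset" over "default" as the local name is 
arbitrary, and having per-call parameters overlay the factory defaults is an 
assumption, not something decided in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

/** Illustrative only: serialization parameters keyed by QName. */
class SerializationSketch {
    /** Namespace proposed in this issue for our own output methods. */
    static final String GXML_OUTPUT_NS = "http://org.gxml.output/";

    /** Parameters from the serialization spec are in no namespace. */
    static final QName METHOD = new QName("method");
    static final QName INDENT = new QName("indent");

    /** Proposed default method that keeps the internal subset (local name is one of two candidates). */
    static final QName INT_SUBSET_METHOD = new QName(GXML_OUTPUT_NS, "int-subset");

    /** Hypothetical stand-in for DocumentHandlerFactory. */
    static class HandlerFactory {
        private final Map<QName, Object> defaults = new HashMap<>();

        HandlerFactory() {
            // Default to the round-trip-friendly method, clearly marked as non-standard.
            defaults.put(METHOD, INT_SUBSET_METHOD);
        }

        /** Recommendation 3: overlay new factory-wide defaults (assumed merge semantics). */
        void setDefaultParameters(Map<QName, Object> parameters) {
            defaults.putAll(parameters);
        }

        /** The parameters a handler from recommendation 4 would see: defaults overlaid with per-call values. */
        Map<QName, Object> effectiveParameters(Map<QName, Object> parameters) {
            Map<QName, Object> merged = new HashMap<>(defaults);
            merged.putAll(parameters);
            return merged;
        }
    }

    public static void main(String[] args) {
        HandlerFactory factory = new HandlerFactory();
        Map<QName, Object> perCall = new HashMap<>();
        perCall.put(INDENT, Boolean.TRUE);
        System.out.println(factory.effectiveParameters(perCall));
    }
}
```

A user who does want the spec's XML output method would pass the plain, 
no-namespace value for "method"; anything in the http://org.gxml.output/ 
namespace is visibly ours.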

Original issue reported on code.google.com by aale...@gmail.com on 12 Oct 2010 at 4:36

GoogleCodeExporter commented 9 years ago
I've started some work on this, although it's not really up to par.

At the moment, a first attempt at handling serialization parameters has been 
added, though it's not yet extensible (I can do that) and it isn't used 
internally.  But it's a step forward, so marking this accepted.

Original comment by aale...@gmail.com on 4 Apr 2011 at 6:27

GoogleCodeExporter commented 9 years ago
Accepted. We *want* this in 1.0, but it may not happen. I'm gonna stick it 
there, and it may slip.

Original comment by aale...@gmail.com on 26 Jul 2012 at 7:42

GoogleCodeExporter commented 9 years ago
Defer.

Original comment by aale...@gmail.com on 24 Oct 2013 at 5:24