TIBCOSoftware / genxdm

GenXDM: XQuery/XPath Data Model API, bridges, and processors for tree-model neutral access to XML.
http://www.genxdm.org/

Enable associating schema type information with existing <N> tree or subtree via (re)validation #101

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 9 years ago
It would be nice to have cursor-based validation (particularly if we follow 
through on the cursor changes that Eric's been working on). Cursor-oriented 
traversal of a document inherently ties the source of each event (in 
ContentHandler or SequenceHandler) to its node of origin.

The javadoc for validate suggests that it can return the same tree supplied, 
only with type annotations and possibly typed values. In practice, it's not 
really possible to do this when attaching Model.stream() to a ContentHandler or 
SequenceHandler, because it's too difficult to find the <N>ode associated with 
each event's origin. With Cursor, state is carried--it's *easy*. And we can 
leave it up to each bridge to decide whether to modify an existing tree or 
create a new one.

Original issue reported on code.google.com by aale...@gmail.com on 20 Sep 2012 at 7:09

GoogleCodeExporter commented 9 years ago
Clarifying:

Our current validation API advertises itself as able to modify the supplied 
tree, but in fact it cannot. There are no APIs to push type names onto an 
existing tree, or to replace text values with typed values (where 
supported). The only way to do so is via SequenceBuilder, which creates a new tree.

Unfortunately, it seems that the only way to enable this is to provide a 
'typed-mutable' extension of some sort. This would provide "setType(QName)" for 
element and attribute nodes, and "setTypedValue(List<? extends A>)" for 
attribute and text nodes. This is unfortunate because bridges that support 
typing do not necessarily support typed values, and because users are unlikely 
to understand this. Call "setTypedValue(atomlist)", then getTypedValue(), and 
get back a string instead: instant bug report.
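
To make the failure mode concrete, here is a minimal sketch of what such a 'typed-mutable' extension might look like. Everything in it is hypothetical: TypedMutable, ToyNode and NameOnlyBridge are illustration names, not GenXDM types, and passing the node as an argument is an assumption about how the methods would be shaped. The toy bridge accepts type names but has nowhere to store atoms, so a typed value comes back as a string.

```java
import java.util.Collections;
import java.util.List;
import javax.xml.namespace.QName;

// Hypothetical sketch: none of these types exist in GenXDM.
final class TypedMutableSketch {

    /** The proposed 'typed-mutable' extension, for node handle N and atom type A. */
    interface TypedMutable<N, A> {
        void setType(N node, QName typeName);                 // element and attribute nodes
        void setTypedValue(N node, List<? extends A> atoms);  // attribute and text nodes
        List<? extends A> getTypedValue(N node);
    }

    /** Toy node that can hold a type name but only a string value. */
    static final class ToyNode {
        QName type;
        String value = "";
    }

    /** Toy bridge that supports type names but not typed values. */
    static final class NameOnlyBridge implements TypedMutable<ToyNode, Object> {
        @Override public void setType(ToyNode node, QName typeName) {
            node.type = typeName;
        }

        @Override public void setTypedValue(ToyNode node, List<? extends Object> atoms) {
            // No typed storage: the atoms collapse to a space-separated string.
            StringBuilder sb = new StringBuilder();
            for (Object atom : atoms) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(atom);
            }
            node.value = sb.toString();
        }

        @Override public List<? extends Object> getTypedValue(ToyNode node) {
            return Collections.singletonList(node.value);
        }
    }

    public static void main(String[] args) {
        NameOnlyBridge bridge = new NameOnlyBridge();
        ToyNode attr = new ToyNode();
        bridge.setType(attr, new QName("http://www.w3.org/2001/XMLSchema", "integer"));
        bridge.setTypedValue(attr, Collections.singletonList(42));
        Object first = bridge.getTypedValue(attr).get(0);
        System.out.println(first.getClass().getSimpleName()); // prints String
    }
}
```

The caller hands in an Integer and reads back a String, which is exactly the surprise described above.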

The only alternative is to push validation logic into the bridges, where 
mutation of these type names and typed values could be controlled. This is not 
really much different from the dependency that most bridges have on the 
input-output processor. But this also has issues; an external processor still 
must have an API for writing to the nodes; only the bridge can supply this.

A possible solution: a 'validator' interface that is really more of a 
typed-query interface: "here is a node, what is its type?" "here is a node, 
what is its value?" I do not think that this is particularly robust, though.
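
A rough sketch of that typed-query shape, assuming the answers live in a side table keyed by node identity so the tree itself is never written to. TypedQuery, SideTable and record are hypothetical names for illustration only, not GenXDM APIs.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical sketch: none of these types exist in GenXDM.
final class TypedQuerySketch {

    /** "Here is a node, what is its type?" / "here is a node, what is its value?" */
    interface TypedQuery<N, A> {
        QName typeOf(N node);
        List<? extends A> valueOf(N node);
    }

    /** Side-table implementation keyed by node identity; the tree is never modified. */
    static final class SideTable<N, A> implements TypedQuery<N, A> {
        private final Map<N, QName> types = new IdentityHashMap<>();
        private final Map<N, List<A>> values = new IdentityHashMap<>();

        /** Called by whatever performs validation (assumed, not shown here). */
        void record(N node, QName type, List<A> value) {
            types.put(node, type);
            values.put(node, value);
        }

        @Override public QName typeOf(N node) {
            return types.get(node); // null means "never validated"
        }

        @Override public List<? extends A> valueOf(N node) {
            List<A> v = values.get(node);
            return v == null ? Collections.<A>emptyList() : v;
        }
    }

    public static void main(String[] args) {
        Object qtyNode = new Object(); // stand-in for a bridge's node handle
        SideTable<Object, Object> table = new SideTable<>();
        table.record(qtyNode,
                new QName("http://www.w3.org/2001/XMLSchema", "integer"),
                Collections.<Object>singletonList(42));
        System.out.println(table.typeOf(qtyNode) + " " + table.valueOf(qtyNode));
    }
}
```

One way this could go wrong, as a guess at the robustness concern: nothing ties the table to the tree, so a later edit to a node leaves a stale answer behind.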

Original comment by aale...@gmail.com on 17 Oct 2012 at 8:46

GoogleCodeExporter commented 9 years ago
Updated summary to more accurately reflect intent, based on email discussion.

Having pondered this further, I see two possible ways to solve the problem in 
the context of "nodes" (not considering the cursor problem in this comment).

1) As Amy mentions, a "typed-mutable" extension. It occurs to me that a slight 
variation on this might make sense - a "type-value assignment" interface. That 
way, a client that really only cares about the "type" information of one or two 
nodes can provide its own implementation (rather than having the bridge augment 
the entire tree), and capture the type information outside of the existing 
GenXDM APIs. Since we also want the bridges to support augmentation, the 
bridges could either implement this interface in the right place, or provide an 
accessor to get it, perhaps via the ProcessingContext.

2) An alternative to the "ContentHandler" interface that specifically returns 
type in some fashion. For example, an "endElement" equivalent that returns type 
information (instead of void), as well as possibly including value information. 
Also include analogous modifications for other methods like "text" and 
"attribute".

Original comment by eric%tib...@gtempaccount.com on 19 Oct 2012 at 5:28
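
The two options in the comment above could be sketched roughly as follows. All names are hypothetical (TypeAssignment, SelectiveCapture, TypeReportingHandler and ToyValidator are not GenXDM types), strings stand in for node handles, and the "validator" is a toy that calls integer-looking content xs:integer and everything else xs:string.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.namespace.QName;

// Hypothetical sketch: none of these types exist in GenXDM.
final class TypeCaptureSketch {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Option 1: sink that a validator would call for each node it assigns a type to. */
    interface TypeAssignment<N> {
        void assignType(N node, QName typeName);
    }

    /** Client implementation that keeps types only for the nodes it asked about. */
    static final class SelectiveCapture<N> implements TypeAssignment<N> {
        private final Set<N> wanted;
        final Map<N, QName> captured = new HashMap<>();

        SelectiveCapture(Set<N> wanted) { this.wanted = wanted; }

        @Override public void assignType(N node, QName typeName) {
            if (wanted.contains(node)) captured.put(node, typeName);
        }
    }

    /** Option 2: handler whose endElement returns the type instead of void. */
    interface TypeReportingHandler {
        void startElement(String name);
        void text(String data);
        QName endElement();
    }

    /** Toy stand-in for a validator: integer-looking content is xs:integer, else xs:string. */
    static final class ToyValidator implements TypeReportingHandler {
        private final Deque<StringBuilder> open = new ArrayDeque<>();

        @Override public void startElement(String name) { open.push(new StringBuilder()); }

        @Override public void text(String data) { open.peek().append(data); }

        @Override public QName endElement() {
            String content = open.pop().toString().trim();
            return new QName(XS, content.matches("-?\\d+") ? "integer" : "string");
        }
    }

    public static void main(String[] args) {
        // The client only cares about "qty"; "note" is typed but not captured.
        SelectiveCapture<String> capture = new SelectiveCapture<>(Collections.singleton("qty"));
        ToyValidator validator = new ToyValidator();

        validator.startElement("qty");
        validator.text("42");
        capture.assignType("qty", validator.endElement());

        validator.startElement("note");
        validator.text("rush order");
        capture.assignType("note", validator.endElement());

        System.out.println(capture.captured);
    }
}
```

Here the two options are chained for brevity: the caller feeds the type returned by endElement into the assignment sink. In practice a bridge could implement TypeAssignment itself to augment the whole tree, as option 1 suggests.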

GoogleCodeExporter commented 9 years ago
Potentially large task. Deferred.

Original comment by aale...@gmail.com on 24 Oct 2013 at 5:26

aalewis-tibco commented 8 years ago

Further information:

We can do in-tree validation using the current ContentGenerator-based validation method, without requiring that we drop the use of SequenceHandler. We simply have to have the same object implement both ContentGenerator and SequenceHandler (or SequenceBuilder, which is probably the validator's signature).

To do this, we need a new Feature, probably something-something-validation-in-tree (probably a sub-pattern of typed support). When a bridge advertises this feature as supported, a new method on TypedContext, something like newTreeValidator(N), would supply a pre-positioned ContentGenerator that also acts as a SequenceHandler/Builder.

In addition to the TreeValidator, a bridge supporting the new functionality would also support new APIs, or would use internal functionality allowing it to set the annotation and the typed value for each node.

Or something like that.
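
A toy sketch of the dual-role idea, under loud assumptions: Handler, TypedHandler and Generator below are drastically simplified stand-ins, not the real ContentHandler, SequenceHandler and ContentGenerator signatures, and ToyValidator just looks element names up in a map and echoes each event synchronously. Because the same object both emits the events and receives the typed ones, it always knows which node is in flight and can annotate it in place.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical sketch: simplified stand-ins, not the GenXDM interfaces.
final class InTreeValidationSketch {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        QName type; // annotation; null until validated
        Node(String name) { this.name = name; }
    }

    interface Handler { void startElement(String name); void endElement(); }
    interface TypedHandler { void startElement(String name, QName type); void endElement(); }
    interface Generator { void write(Handler out); }

    /** Toy validator: looks each element name up in a map and forwards a typed event at once. */
    static final class ToyValidator implements Handler {
        private final Map<String, QName> schema;
        private final TypedHandler out;

        ToyValidator(Map<String, QName> schema, TypedHandler out) {
            this.schema = schema;
            this.out = out;
        }

        @Override public void startElement(String name) { out.startElement(name, schema.get(name)); }
        @Override public void endElement() { out.endElement(); }
    }

    /** Emits events for a subtree and receives the typed events for the same subtree. */
    static final class TreeValidator implements Generator, TypedHandler {
        private final Node start;                             // pre-positioned starting node
        private final Deque<Node> path = new ArrayDeque<>();  // nodes currently open

        TreeValidator(Node start) { this.start = start; }

        @Override public void write(Handler out) { walk(start, out); }

        private void walk(Node node, Handler out) {
            path.push(node);
            out.startElement(node.name); // the validator calls back before this returns
            for (Node child : node.children) walk(child, out);
            out.endElement();
            path.pop();
        }

        @Override public void startElement(String name, QName type) {
            path.peek().type = type; // the node of origin is the one on top of the path
        }

        @Override public void endElement() { }
    }

    public static void main(String[] args) {
        Node order = new Node("order");
        Node qty = new Node("qty");
        order.children.add(qty);

        Map<String, QName> schema = new HashMap<>();
        schema.put("order", new QName("urn:example", "OrderType"));
        schema.put("qty", new QName("http://www.w3.org/2001/XMLSchema", "integer"));

        // Partial-tree validation: start at qty, so order stays unannotated.
        TreeValidator treeValidator = new TreeValidator(qty);
        treeValidator.write(new ToyValidator(schema, treeValidator));
        System.out.println(qty.type + " / " + order.type);
    }
}
```

The sketch relies on the validator answering each event before the next one is sent. A validator that buffers events would need a different way to match typed events back to nodes.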

aalewis-tibco commented 8 years ago

Oops. This is done, as of 1.2.2 (I think?), when we provided in-tree validation. That didn't change any interfaces (making it a really cool trick, you know), so I overlooked revisiting and closing this.

If a bridge supports in-tree validation via the cursor-argument validate() method, it also supports partial-tree validation.