Open eyelidlessness opened 2 months ago
One thing worth considering as a followup of Option 1: it might be nice to move all of the XForms-specific XPath behavior from @getodk/xpath
into @getodk/xforms-engine
. That is way more churn than I'd want to deal with for external secondary instances, and I'd want to put more design thought into it. But I think it makes a lot of sense to reconsider that boundary afterward.
My main motivation for that is we can make far more XForms-specific assumptions in the xf
/jr
function implementations if we can also make assumptions about the adapter implementation. This occurred to me while updating the README in #204, which got me wondering about jr:choice-name
and how we should support it. It just really makes sense to be implemented in the engine, with knowledge about the engine's nodes and their state. There are other things that make more sense that way too. Since jr:choice-name
mentions active language, that's state that makes much more sense if it comes from the engine rather than being the awkward dance between the engine and xpath
that it is now. There are aspects of #195 that we couldn't reasonably do without more engine knowledge (i.e. reject node-set arguments that don't reference actual repeats).
I'm sure I could go on and on if I spend more time thinking about doors this change would open.
Capturing discussions from yesterday: we've decided now is a good time to move forward with Option 1! 🎉
There was also some discussion of:
Potentially breaking up some of the work. One of the prerequisites for Option 1—separate trees for secondary instances, itext, primary instance—also implies the possibility of mixing adapters, where e.g. the primary instance may be evaluated against the engine's node representation(s), while secondary instances and itext could (at least temporarily) continue to be evaluated with the WHATWG DOM adapter introduced in Option 1. I think this is at least a compelling approach to taming scope, of a change which will already be a pretty large refactor.
Whether we may benefit from leveraging browsers' native XPath evaluator for those secondary instance/itext cases. This was raised yesterday by @lognaturel, citing significant performance improvement in Enketo (enketo/enketo#1342).
On (2), I am skeptical that we'd benefit as much from this as Enketo does. Specifically, I'm skeptical that we'd benefit as much as we would from the kinds of optimizations we may achieve with a static node representation (or possibly even multiple representations) of our own design.
However! Since this question also gets to our current most significant performance issue (itemsets/choice filters), it is definitely worth thinking about and maybe a spike. To make it work, we'd also need a solution for substituting values of primary instance nodes where they're referenced in instance()
expressions.
My main hesitation on (1) is rooted in that same skepticism. We tend to be conservative about taking on refactors. This is a good thing for project stability and progress! However, I worry that by splitting this one, which has clear buy-in and obvious potential far reaching for long term benefits, we may defer realizing many of those benefits indefinitely.
This design issue is part of broader support for external secondary instances:
jr:
URLsThis issue is mostly focused on engine internals—including aspects of both
@getodk/xforms-engine
itself, and its integration with@getodk/xpath
.We may consider minor, optional changes to the engine/client API as a part of one of the approaches described here. While optional, there may be some compelling reasons to consider them anyway (possibly overlapping with design around error conditions).
New concepts
Supporting external secondary instances introduces two concepts that don't currently exist in the Engine/XPath portion of the Web Forms stack. Rather, two facets of the same concept:
The concept unifying both is that the data can be referenced by a form's XPath computations, and is treated by those computations as if it...
Once we've addressed the I/O and error condition aspects of this functionality, the main requirement of supporting external secondary instances is to satisfy those facets.
Option 0: negate the external and non-XML facets of the problem
This is, more or less, the Enketo option:
Parses and processes supported non-XML formats into an XML/DOM representation suitable for expected XPath evaluation
Inserts that representation into the form as if it were an internal secondary instance
This is a cromulent, naive, solution! However, I have some concerns with it.
First and foremost, it is likely to exacerbate the worst performance challenges we currently face. Second, it would squander an excellent opportunity to begin addressing that problem with a solution frequently at the center of many other goals.
Option 1: evolve
@getodk/xpath
to support these conceptsThis option is... to decouple
@getodk/xpath
from the browser/XML DOM (#39)!I've prototyped a solution to this that I think is actionable, suitable for the feature, and likely to deliver on other benefits I'd hoped for.
In very broad strokes, the solution introduces an adapter model for XPath evaluation of arbitrary document/node implementations. The implementation an adapter handles is opaque to the
@getodk/xpath
package: all interaction with the implementation goes through calls to functions defined by the provided adapter.Below, I'll go over a representative subset of the adapter interface I have in mind.
Quick terminology switcheroo
I've frequently used the terms "browser DOM", "XML DOM" or even "the DOM" somewhat loosely and interchangeably. These are all consistent with our colloquially understanding of the thing being referenced, but using them places some limitations on how we can discuss other aspects of this design proposal.
XPath semantically operates on a document object model, without specifying anything about its underlying API or implementation. Anything we do to decouple from "the [browser and/or XML] DOM" will still necessarily be implemented against a document object model—in this case, one of our own evolving design.
I think it will be clarifying for the rest of this writeup to refer to "the [browser and/or XML] DOM" as the WHATWG DOM—directly referencing that specific DOM standards body and specification. (Just as we tend to reference ODK XForms for clarity.)
If we move forward with this option, it's also likely we'll want to use this clarifying terminology in a WHATWG DOM adapter.
Opaque node types
At the core of the adapter design is a set of opaque node types, which can be associated with the set of semantic node kinds that XPath operates on.
Note that these types aren't entirely opaque: each node kind associates a single symbol property with the node's XPath semantic kind. This is intended to preserve type safety where we currently distinguish nodes of different kinds based on much more information. (In actual fact, it improves type safety significantly in a number of areas where the standard DOM types are difficult to work with!)
With the adapter design, adapter call sites in
@getodk/xpath
will never directly access thisXPathContextNodeKind
property—or any property!—of any node, from any implementation[^1]. The property exists only to act as a tagged union discriminator, in order to preserve valuable type checks currently used in@getodk/xpath
, as well as to help provide similar type safety to adapter implementations where specific kinds of nodes are concerned.[^1]: It would be nice to be able to demonstrate the "never directly access" claim at the type level as well, but I think we may have to settle for the otherwise-opaque nature of the types to help emphasize the idea that only the adapter really knows anything about the node implementations themselves.
Everything else in the adapter interface operates on a generic type parameter extending
XPathNode
(or some subset thereof). An adapter, as a whole, operates on the same set ofXPathNode
implementations by supplying the same union of node kind implementations to that type parameter.Before moving on to how the opaque node types are used, one more note: the node kinds described here reflect XPath semantics, which already differ from the WHATWG DOM's semantics in a couple of ways:
Attr
in the WHATWG DOMCDATASection
fromText
, while they are distinct node types in the WHATWG DOMBoth of these are cases where the adapter design improves safety around some existing WHATWG DOM operations. And in any adapter implementation, handling XPath-specific operations with XPath semantics makes certain operations much clearer and easier to understand at a glance.
High level adapter structure
(Note: some of the types below are simplified to be somewhat easier to read here.)
The adapter interface is derived from DOM operations currently used by
@getodk/xpath
, broken down into categories:I could probably be convinced that the single function in the
XPathValueAdapter
interface could instead go inXPathTraversalAdapter
. I could also be convinced the adapter interface would be better as a flat bag of functions, i.e. ...Flattened
```ts interface XPathDOMAdapterFuture, optional
XPathOptimizationsAdapter
In my prototyping, there were some DOM access patterns for which a given adapter may have optimization opportunities specific to its underlying implementation. In some cases, those optimizations were specific to the WHATWG DOM (and it'd be a shame to deoptimize that adapter).
There are of course tons of opportunities to provide optimizations specific to
@getodk/xforms-engine
. Some of these adapter-specific optimizations may be brought into scope on a per case basis, as they'll be in focus as we refactor. One likely case will be called out below as I discuss the engine adapter.Prerequisite: a WHATWG DOM adapater
We have a wealth of
@getodk/xpath
tests which will help guide this effort. To benefit from these tests, it will be necessary to build a WHATWG DOM adapter. And as mentioned earlier, current WHATWG DOM implementation details effectively guided the adapter design. Building this adapter essentially comes "for free"—it is effectively the same work as refactoring to support adapters in the first place.And while we may phase out our use of the WHATWG DOM for XPath purposes, it's still the most likely way other users would want to use the
@getodk/xpath
package.The WHATWG DOM adapter would probably be the default adapter, at least initially and for the immediate future. Any hypothetical users other than us would likely experience this change as totally transparent. It is only a breaking change for our own usage.
A few other
@getodk/xpath
implementation odds and endsThere are a few areas in
@getodk/xpath
that don't fit the adapter design well, which we'd want to refactor as a prerequisite step. These are all very narrow in scope. They mostly involve:Eliminating any remaining usage of the
TreeWalker
DOM API. (This is ready to be cherry picked from my prototype branch. The only remaining impact would be a little more review overhead.)Relaxing or eliminating certain assumptions about XForm-specific document structure, which never really felt like the best assumptions in the first place. Specifically: instead of
instance()
andjr:itext()
operating on same-document subtrees outside of the primary instance, theXFormsXPathEvaluator
entrypoint will need to accept explicit access to these subtrees—more or less as if they were analogous toDocumentFragment
s. (This isn't quite cherry pick-ready, but it's also fully working in my prototype.)Engine usage: an XForms adapter
Bringing it all together, for the engine to use this adapter design, we'll need to build an adapter to the engine's notion of a node tree. This work is trivial, relative to the underlying refactors to support an adapter in the first place.
We will, however, need to introduce and utilize a few new node types in the engine.
PrimaryInstance
(asXPathDocument
)While we already have a single root node (
Root
/RootNode
), that node is necessarily an element in XPath semantics (XPathElement
). XPathAbsoluteLocationPath
expressions expect that root element to have a parent document.We can make this node's introduction as transparent to clients as we like:
If we prefer, we can choose not to expose the node to clients at all; or
We can also take the opportunity to let it become the natural home for some of the awkwardness of the current
RootNode
interface (i.e.RootNode
becomes more like any other element, and form-global stuff gets an obvious home on a new client-facingPrimaryInstanceNode
).Further, we can consider this node a potential future mechanism to convey some error conditions which we consider partial/recoverable. We can also expand on that concept to convey details which are definitely not error conditions but have similar informational benefits for some/all clients (form design warnings, informational notices of any sort).
Representation of secondary instances, itext translations (naming TBD, let's bikeshed!)
We will need a representation (or multiple representations?) for:
<instance>
elements, theirid
attributes, and their subtrees<itext>
and its subtree<instance>
elements, theirid
attributes, and their resolved resources⭐ = this is why we're here now!
I have already been able to rapidly prototype the first two of these, enough to have high confidence in the adapter design/approach, and to have confidence that it will be a great fit to the external secondary instance use case.
In my prototyping, I was also able to confirm that this approach unlocks optimizations directly focused on these subtrees—and can do so without introducing undue complexity or undue coupling within the
@getodk/xpath
package and its implementation details.Secondary instances in particular are already, by far, our greatest performance liability, and our most obvious optimization target. The external secondary instances feature is, in my opinion, a great opportunity to set ourselves up for making these optimizations approachable and sensible.
"Free" engine complexity reduction, performance wins
My prototyping of this adapter approach has also helped to confirm certain other benefits I've been anticipating.
Most notably, I can confirm that an XPath adapter implemented to directly operate on our
@getodk/xforms-engine
node implementations eliminates most if not all of our need for complex dependency analysis and resolution.Computations update as expected without needing to traverse an instance for
SubscribableDependency
references, or a need tosubscribe()
to them. I haven't tested this yet, but I have reasonably high confidence that issues like #178 will either be solved directly by adopting the adapter approach, or be a one-time solution applicable to all node types (current or future).This will not only reduce incidental complexity, it'll also improve performance. Currently, we deliberately over-subscribe to unrelated dependencies—performing many unnecessary and, on slower forms, costly recomputations—as a sacrifice to confidence in correctness. Eliminating this aspect of the engine's internal reactive implementation will be a performance improvement for many large/complex forms, especially over the time a user spends on a form.
Options 1b/1c: other approaches to decoupling from the WHATWG DOM
I want to mention some other approaches to XPath evaluation of non-WHATWG DOM that I've either prototyped or considered. These are certainly viable options in the abstract, and may seem more obvious in the abstract. My goal in this section is to briefly capture them and explain why I felt the specific adapter approach described above is more suitable for our needs.
Identify a subset of WHATWG DOM in use by
@getodk/xpath
, implement that subset directly on@getodk/xforms-engine
nodes.Define an
XPathNode
interface from a clean slate, implement that interface either directly—on nodes we control, i.e.@getodk/xforms-engine
—or as a wrapper/box type around nodes we don't control, i.e. the WHATWG DOM.[^2]The first alternative is compelling because it minimizes changes to
@getodk/xpath
, both its API and internals.The second alternative is compelling because it allows us to define an ideal API for abstract DOM interaction and say "this is what it means to be an XPath node of this kind, with all of its XPath semantics built into the node itself".
Both have a downside of imposing a fairly large surface area onto the implementation of nodes themselves. That surface area makes it hard to reason about where satisfying XPath semantics ends and where the nodes other concerns begin. It also has a very high likelihood of conflicting with existing node APIs with different meaning/implications (both in the WHATWG DOM and on
@getodk/xforms-engine
nodes).The first has these additional downsides:
Option 1 as proposed above basically basically applies inversion of control to the second option: we define the APIs we want specifically to suit XPath semantics, and then isolate those semantics to functions operating on arbitrary nodes which themselves don't need to know about those XPath semantics (unless it would also be beneficial for them to do so).
[^2]: A variant of this which I explored and ruled out is a
Proxy
-based approach to wrapping WHATWG DOM nodes. This was infeasible becauseProxy
and WHATWG DOM implementations just have fundamental frictions, and in some cases just don't work for our needs at all.