getodk / web-forms

ODK Web Forms enables form filling and submission editing of ODK forms in a web browser. It's coming soon! ✨
https://getodk.org
Apache License 2.0
10 stars 9 forks source link

External secondary instances: Engine representation, XPath support #203

Open eyelidlessness opened 2 months ago

eyelidlessness commented 2 months ago

This design issue is part of broader support for external secondary instances:

This issue is mostly focused on engine internals—including aspects of both @getodk/xforms-engine itself, and its integration with @getodk/xpath.

We may consider minor, optional changes to the engine/client API as a part of one of the approaches described here. While optional, there may be some compelling reasons to consider them anyway (possibly overlapping with design around error conditions).

New concepts

Supporting external secondary instances introduces two concepts that don't currently exist in the Engine/XPath portion of the Web Forms stack. Rather, two facets of the same concept:

  1. Data outside of a form definition
  2. Data formats beyond XML

The concept unifying both is that the data can be referenced by a form's XPath computations, and is treated by those computations as if it...

  1. ... were part of the form defintion
  2. ... had the same XML semantics as the form definition

Once we've addressed the I/O and error condition aspects of this functionality, the main requirement of supporting external secondary instances is to satisfy those facets.

Option 0: negate the external and non-XML facets of the problem

This is, more or less, the Enketo option:

  1. Parses and processes supported non-XML formats into an XML/DOM representation suitable for expected XPath evaluation

  2. Inserts that representation into the form as if it were an internal secondary instance

This is a cromulent, naive, solution! However, I have some concerns with it.

First and foremost, it is likely to exacerbate the worst performance challenges we currently face. Second, it would squander an excellent opportunity to begin addressing that problem with a solution frequently at the center of many other goals.

Option 1: evolve @getodk/xpath to support these concepts

This option is... to decouple @getodk/xpath from the browser/XML DOM (#39)!

I've prototyped a solution to this that I think is actionable, suitable for the feature, and likely to deliver on other benefits I'd hoped for.

In very broad strokes, the solution introduces an adapter model for XPath evaluation of arbitrary document/node implementations. The implementation an adapter handles is opaque to the @getodk/xpath package: all interaction with the implementation goes through calls to functions defined by the provided adapter.

Below, I'll go over a representative subset of the adapter interface I have in mind.

Quick terminology switcheroo

I've frequently used the terms "browser DOM", "XML DOM" or even "the DOM" somewhat loosely and interchangeably. These are all consistent with our colloquially understanding of the thing being referenced, but using them places some limitations on how we can discuss other aspects of this design proposal.

XPath semantically operates on a document object model, without specifying anything about its underlying API or implementation. Anything we do to decouple from "the [browser and/or XML] DOM" will still necessarily be implemented against a document object model—in this case, one of our own evolving design.

I think it will be clarifying for the rest of this writeup to refer to "the [browser and/or XML] DOM" as the WHATWG DOM—directly referencing that specific DOM standards body and specification. (Just as we tend to reference ODK XForms for clarity.)

If we move forward with this option, it's also likely we'll want to use this clarifying terminology in a WHATWG DOM adapter.

Opaque node types

At the core of the adapter design is a set of opaque node types, which can be associated with the set of semantic node kinds that XPath operates on.

type XPathNodeKind =
  | 'document'
  | 'doctype'
  | 'element'
  | 'namespace-declaration'
  | 'attribute'
  | 'text'
  | 'comment'
  | 'processing-instruction';

declare const XPathContextNodeKind: unique symbol;

interface XPathBaseNode<Kind extends XPathNodeKind> {
  readonly [XPathContextNodeKind]: Kind;
}

export type XPathDocument = XPathBaseNode<'document'>;
export type XPathDoctype = XPathBaseNode<'doctype'>;
export type XPathElement = XPathBaseNode<'element'>;
export type XPathNamespaceDeclaration = XPathBaseNode<'namespace-declaration'>;
export type XPathAttribute = XPathBaseNode<'attribute'>;
export type XPathText = XPathBaseNode<'text'>;
export type XPathComment = XPathBaseNode<'comment'>;
export type XPathProcessingInstruction = XPathBaseNode<'processing-instruction'>;

export type XPathNode =
  | XPathDocument
  | XPathDoctype
  | XPathElement
  | XPathNamespaceDeclaration
  | XPathAttribute
  | XPathText
  | XPathComment
  | XPathProcessingInstruction;

Note that these types aren't entirely opaque: each node kind associates a single symbol property with the node's XPath semantic kind. This is intended to preserve type safety where we currently distinguish nodes of different kinds based on much more information. (In actual fact, it improves type safety significantly in a number of areas where the standard DOM types are difficult to work with!)

With the adapter design, adapter call sites in @getodk/xpath will never directly access this XPathContextNodeKind property—or any property!—of any node, from any implementation[^1]. The property exists only to act as a tagged union discriminator, in order to preserve valuable type checks currently used in @getodk/xpath, as well as to help provide similar type safety to adapter implementations where specific kinds of nodes are concerned.

[^1]: It would be nice to be able to demonstrate the "never directly access" claim at the type level as well, but I think we may have to settle for the otherwise-opaque nature of the types to help emphasize the idea that only the adapter really knows anything about the node implementations themselves.

Everything else in the adapter interface operates on a generic type parameter extending XPathNode (or some subset thereof). An adapter, as a whole, operates on the same set of XPathNode implementations by supplying the same union of node kind implementations to that type parameter.

Before moving on to how the opaque node types are used, one more note: the node kinds described here reflect XPath semantics, which already differ from the WHATWG DOM's semantics in a couple of ways:

Both of these are cases where the adapter design improves safety around some existing WHATWG DOM operations. And in any adapter implementation, handling XPath-specific operations with XPath semantics makes certain operations much clearer and easier to understand at a glance.

High level adapter structure

(Note: some of the types below are simplified to be somewhat easier to read here.)

The adapter interface is derived from DOM operations currently used by @getodk/xpath, broken down into categories:

interface XPathDOMAdapter<T extends XPathNode> {
  /**
   * Implements predicates for each semantic node kind.
   */
  kind: XPathNodeKindAdapter<T>;

  /**
   * Implements functions to resolve node names and namespaces.
   */
  name: XPathNameAdapter<T>;

  /**
   * Implements functions to traverse documents under evaluation.
   */
  traversal: XPathTraversalAdapter<T>;

  /**
   * Implements functions to access values associated with a given node.
   */
  value: XPathValueAdapter<T>;
}

interface XPathNodeKindAdapter<T extends XPathNode> {
  /**
   * Checks whether a given value of unknown type is _any kind of node the
   * adapter handles_.
   */
  isContextNode: (value: unknown) => value is T;

  // The rest are predicates for specific node kinds...

  isDocument: (value: T) => value is Extract<T, XPathDocument>;

  // ...etc ...
}

interface XPathNameAdapter<T extends XPathNode> {
  getNamespacePrefix: (node: T & XPathNamespaceDeclaration) => string | null;
  getNamespaceURI: (node: T & (XPathAttribute | XPathElement)) => string | null;
  getPrefixedName: (node: T & (XPathAttribute | XPathElement)) => string;
  getLocalName: (node: T & (XPathAttribute | XPathElement)) => string;
  getProcessingInstructionName: (node: T & XPathProcessingInstruction) => string;
  resolveNamespaceURI: (node: T, prefix: string | null) => string | null;
}

interface XPathTraversalAdapter<T extends XPathNode> {
  getContainingDocument: (node: T) => T & XPathDocument;
  getParentNode: (node: T & XPathChildNode) => T & XPathParentNode;
  getAttributes: (node: T & XPathElement) => Iterable<T & XPathAttribute>;

  // ... and so on ... if you can express a hierarchical relationship between
  // nodes in XPath, it's either defined as a function here, or composed from
  // one or more functions defined here ...
}

interface XPathValueAdapter<T extends XPathNode> {
  getNodeValue: (node: T) => string;
}

type XPathParentNode = XPathDocument | XPathElement;

type XPathChildNode =
  | XPathDoctype
  | XPathElement
  | XPathText
  | XPathComment
  | XPathProcessingInstruction;

I could probably be convinced that the single function in the XPathValueAdapter interface could instead go in XPathTraversalAdapter. I could also be convinced the adapter interface would be better as a flat bag of functions, i.e. ...

Flattened ```ts interface XPathDOMAdapter extends XPathNodeKindAdapter, XPathNameAdapter, XPathTraversalAdapter, XPathValueAdapter {} ```

Future, optional XPathOptimizationsAdapter

In my prototyping, there were some DOM access patterns for which a given adapter may have optimization opportunities specific to its underlying implementation. In some cases, those optimizations were specific to the WHATWG DOM (and it'd be a shame to deoptimize that adapter).

There are of course tons of opportunities to provide optimizations specific to @getodk/xforms-engine. Some of these adapter-specific optimizations may be brought into scope on a per case basis, as they'll be in focus as we refactor. One likely case will be called out below as I discuss the engine adapter.

Prerequisite: a WHATWG DOM adapater

We have a wealth of @getodk/xpath tests which will help guide this effort. To benefit from these tests, it will be necessary to build a WHATWG DOM adapter. And as mentioned earlier, current WHATWG DOM implementation details effectively guided the adapter design. Building this adapter essentially comes "for free"—it is effectively the same work as refactoring to support adapters in the first place.

And while we may phase out our use of the WHATWG DOM for XPath purposes, it's still the most likely way other users would want to use the @getodk/xpath package.

The WHATWG DOM adapter would probably be the default adapter, at least initially and for the immediate future. Any hypothetical users other than us would likely experience this change as totally transparent. It is only a breaking change for our own usage.

A few other @getodk/xpath implementation odds and ends

There are a few areas in @getodk/xpath that don't fit the adapter design well, which we'd want to refactor as a prerequisite step. These are all very narrow in scope. They mostly involve:

  1. Eliminating any remaining usage of the TreeWalker DOM API. (This is ready to be cherry picked from my prototype branch. The only remaining impact would be a little more review overhead.)

  2. Relaxing or eliminating certain assumptions about XForm-specific document structure, which never really felt like the best assumptions in the first place. Specifically: instead of instance() and jr:itext() operating on same-document subtrees outside of the primary instance, the XFormsXPathEvaluator entrypoint will need to accept explicit access to these subtrees—more or less as if they were analogous to DocumentFragments. (This isn't quite cherry pick-ready, but it's also fully working in my prototype.)

Engine usage: an XForms adapter

Bringing it all together, for the engine to use this adapter design, we'll need to build an adapter to the engine's notion of a node tree. This work is trivial, relative to the underlying refactors to support an adapter in the first place.

We will, however, need to introduce and utilize a few new node types in the engine.

PrimaryInstance (as XPathDocument)

While we already have a single root node (Root/RootNode), that node is necessarily an element in XPath semantics (XPathElement). XPath AbsoluteLocationPath expressions expect that root element to have a parent document.

We can make this node's introduction as transparent to clients as we like:

Further, we can consider this node a potential future mechanism to convey some error conditions which we consider partial/recoverable. We can also expand on that concept to convey details which are definitely not error conditions but have similar informational benefits for some/all clients (form design warnings, informational notices of any sort).

Representation of secondary instances, itext translations (naming TBD, let's bikeshed!)

We will need a representation (or multiple representations?) for:

⭐ = this is why we're here now!

I have already been able to rapidly prototype the first two of these, enough to have high confidence in the adapter design/approach, and to have confidence that it will be a great fit to the external secondary instance use case.

In my prototyping, I was also able to confirm that this approach unlocks optimizations directly focused on these subtrees—and can do so without introducing undue complexity or undue coupling within the @getodk/xpath package and its implementation details.

Secondary instances in particular are already, by far, our greatest performance liability, and our most obvious optimization target. The external secondary instances feature is, in my opinion, a great opportunity to set ourselves up for making these optimizations approachable and sensible.

"Free" engine complexity reduction, performance wins

My prototyping of this adapter approach has also helped to confirm certain other benefits I've been anticipating.

Most notably, I can confirm that an XPath adapter implemented to directly operate on our @getodk/xforms-engine node implementations eliminates most if not all of our need for complex dependency analysis and resolution.

Computations update as expected without needing to traverse an instance for SubscribableDependency references, or a need to subscribe() to them. I haven't tested this yet, but I have reasonably high confidence that issues like #178 will either be solved directly by adopting the adapter approach, or be a one-time solution applicable to all node types (current or future).

This will not only reduce incidental complexity, it'll also improve performance. Currently, we deliberately over-subscribe to unrelated dependencies—performing many unnecessary and, on slower forms, costly recomputations—as a sacrifice to confidence in correctness. Eliminating this aspect of the engine's internal reactive implementation will be a performance improvement for many large/complex forms, especially over the time a user spends on a form.

Options 1b/1c: other approaches to decoupling from the WHATWG DOM

I want to mention some other approaches to XPath evaluation of non-WHATWG DOM that I've either prototyped or considered. These are certainly viable options in the abstract, and may seem more obvious in the abstract. My goal in this section is to briefly capture them and explain why I felt the specific adapter approach described above is more suitable for our needs.

The first alternative is compelling because it minimizes changes to @getodk/xpath, both its API and internals.

The second alternative is compelling because it allows us to define an ideal API for abstract DOM interaction and say "this is what it means to be an XPath node of this kind, with all of its XPath semantics built into the node itself".

Both have a downside of imposing a fairly large surface area onto the implementation of nodes themselves. That surface area makes it hard to reason about where satisfying XPath semantics ends and where the nodes other concerns begin. It also has a very high likelihood of conflicting with existing node APIs with different meaning/implications (both in the WHATWG DOM and on @getodk/xforms-engine nodes).

The first has these additional downsides:

Option 1 as proposed above basically basically applies inversion of control to the second option: we define the APIs we want specifically to suit XPath semantics, and then isolate those semantics to functions operating on arbitrary nodes which themselves don't need to know about those XPath semantics (unless it would also be beneficial for them to do so).

[^2]: A variant of this which I explored and ruled out is a Proxy-based approach to wrapping WHATWG DOM nodes. This was infeasible because Proxy and WHATWG DOM implementations just have fundamental frictions, and in some cases just don't work for our needs at all.

eyelidlessness commented 2 months ago

One thing worth considering as a followup of Option 1: it might be nice to move all of the XForms-specific XPath behavior from @getodk/xpath into @getodk/xforms-engine. That is way more churn than I'd want to deal with for external secondary instances, and I'd want to put more design thought into it. But I think it makes a lot of sense to reconsider that boundary afterward.

My main motivation for that is we can make far more XForms-specific assumptions in the xf/jr function implementations if we can also make assumptions about the adapter implementation. This occurred to me while updating the README in #204, which got me wondering about jr:choice-name and how we should support it. It just really makes sense to be implemented in the engine, with knowledge about the engine's nodes and their state. There are other things that make more sense that way too. Since jr:choice-name mentions active language, that's state that makes much more sense if it comes from the engine rather than being the awkward dance between the engine and xpath that it is now. There are aspects of #195 that we couldn't reasonably do without more engine knowledge (i.e. reject node-set arguments that don't reference actual repeats).

I'm sure I could go on and on if I spend more time thinking about doors this change would open.

eyelidlessness commented 2 months ago

Capturing discussions from yesterday: we've decided now is a good time to move forward with Option 1! 🎉

There was also some discussion of:

  1. Potentially breaking up some of the work. One of the prerequisites for Option 1—separate trees for secondary instances, itext, primary instance—also implies the possibility of mixing adapters, where e.g. the primary instance may be evaluated against the engine's node representation(s), while secondary instances and itext could (at least temporarily) continue to be evaluated with the WHATWG DOM adapter introduced in Option 1. I think this is at least a compelling approach to taming scope, of a change which will already be a pretty large refactor.

  2. Whether we may benefit from leveraging browsers' native XPath evaluator for those secondary instance/itext cases. This was raised yesterday by @lognaturel, citing significant performance improvement in Enketo (enketo/enketo#1342).

On (2), I am skeptical that we'd benefit as much from this as Enketo does. Specifically, I'm skeptical that we'd benefit as much as we would from the kinds of optimizations we may achieve with a static node representation (or possibly even multiple representations) of our own design.

However! Since this question also gets to our current most significant performance issue (itemsets/choice filters), it is definitely worth thinking about and maybe a spike. To make it work, we'd also need a solution for substituting values of primary instance nodes where they're referenced in instance() expressions.

My main hesitation on (1) is rooted in that same skepticism. We tend to be conservative about taking on refactors. This is a good thing for project stability and progress! However, I worry that by splitting this one, which has clear buy-in and obvious potential far reaching for long term benefits, we may defer realizing many of those benefits indefinitely.