AMWA-TV / is-04

AMWA IS-04 NMOS Discovery and Registration Specification (Stable)
https://specs.amwa.tv/is-04
Apache License 2.0

Query language enhancements targeting minimal clients #132

Open andrewbonney opened 4 years ago

andrewbonney commented 4 years ago

I'm recording a few items here which I've previously discussed with @garethsb-sony so that they're noted somewhere more public. These are just ideas at present and have not yet been tested.

Paging limit for WebSockets

At present pagination is not allowed for Query API WebSocket subscriptions. By permitting the paging.limit query parameter, minimal clients could restrict the maximum message size a Query API could send. Pairing this with the existing max_update_rate_ms parameter would provide greater guarantees over message rates.
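As a rough illustration of the idea, the sketch below builds a subscription request body that carries a paging limit alongside max_update_rate_ms. The helper name subscription_request is hypothetical, and passing paging.limit in params (and passing it as a string) is an assumption about how the proposal might look, not something the current specification permits.

```python
import json


def subscription_request(resource_path, limit, max_update_rate_ms=100):
    """Build a Query API subscription request body with a proposed paging limit."""
    return {
        "max_update_rate_ms": max_update_rate_ms,
        "resource_path": resource_path,
        # Assumption: 'paging.limit' is the proposed extension discussed here;
        # the current spec does not allow paging for WebSocket subscriptions.
        "params": {"paging.limit": str(limit)},
        "persist": False,
    }


body = subscription_request("/senders", limit=10)
print(json.dumps(body, indent=2))
```

A minimal client would then expect each WebSocket message to describe at most the requested number of resource changes.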

RQL queries for related records

When writing a client such as a connection manager you can easily filter a single resource type such as Senders. Finding the Flows which relate to these filtered Senders is much harder and typically requires the Flows to be addressed individually, or the entire Flow collection to be consumed. By adding a rel (related) RQL query parameter this could be assisted as follows:

/flows?query.rql=rel(senders,matches(transport,urn%3Ax-nmos%3Atransport%3Artp))

The query string above would return only the Flows where the related Senders match a particular query. This could be used via the REST API and via WebSocket subscriptions.
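A client could assemble such a query string along these lines. This is only a sketch: the helper names rql_value and rel_query are hypothetical, and rel itself is the proposed operator rather than an existing part of the Query API.

```python
from urllib.parse import quote


def rql_value(value):
    # Percent-encode everything outside the unreserved set, so ':' becomes %3A.
    return quote(value, safe="")


def rel_query(resource_path, relation, operator, prop, value):
    """Build a query using the proposed rel(<relation>, <call-operator>) form."""
    return (
        f"{resource_path}?query.rql="
        f"rel({relation},{operator}({prop},{rql_value(value)}))"
    )


url = rel_query("/flows", "senders", "matches", "transport",
                "urn:x-nmos:transport:rtp")
print(url)
```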

Restarting WebSocket subscriptions after a disconnection

If a WebSocket connection is interrupted, the client must create a new WebSocket subscription and consume the initial 'sync' message containing all data matching its query. Whilst this works fine, an optimisation would be to make use of the Query API's paging cursors at subscription creation time. By doing this the Query API could be informed of the most recent change the client is aware of and only pass on changes from that point, avoiding the larger initial sync message.

garethsb commented 4 years ago

Paging limit for WebSocket using paging.limit is implemented as an extension in the OSS nmos-cpp-registry and has been used successfully to enable Query WebSocket clients whose WebSocket engines don't support configuration of the maximum received message size.

garethsb commented 4 years ago

RQL queries for related records using rel (which is minimally described in some of the RQL specs) seems like a good idea. Using the related resource type name as the relation reads well in most cases such as the example given (and could be nested, e.g. to get the Sources associated with Senders and Flows matching some criteria), but I wonder if we need to be explicit about the resource type and property that encode the relation?

E.g. in the example given it is the Senders' flow_id that encodes the relation. Can we make rel queries using the same relation from the other direction? How should Flow parents be requested? Can we request Flow children? How about general Flow ancestor/descendant queries?

garethsb commented 4 years ago

Restarting WebSocket subscriptions after a disconnection using paging.since has been demonstrated to be highly beneficial especially when restarting connections on huge Registries (10,000+ resources). After a brief disconnection, zero or one message may logically be enough to rejoin, whereas the current spec requires ~10 MB of data to be transmitted.

However, implementing this relies on the Client knowing the appropriate value, i.e. that each Query WebSocket message should have a value equivalent to an X-Paging-Until response header. This has proven a little difficult to specify, especially in interaction with the paging.limit extension also described above.
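On the client side, the bookkeeping might look roughly like the sketch below. It assumes that each message somehow yields a value equivalent to X-Paging-Until (exactly how is the open question above), and that paging.since would be accepted when creating the replacement subscription. The class SubscriptionState and its methods are hypothetical names.

```python
def parse_timestamp(ts):
    """Split a '<seconds>:<nanoseconds>' timestamp into a pair of integers."""
    seconds, nanoseconds = ts.split(":")
    return int(seconds), int(nanoseconds)


class SubscriptionState:
    """Tracks the most recent 'until' value seen on a subscription."""

    def __init__(self):
        self.last_until = None

    def observe(self, until):
        # Compare as integer pairs rather than relying on string ordering.
        if (self.last_until is None
                or parse_timestamp(until) > parse_timestamp(self.last_until)):
            self.last_until = until

    def resume_params(self):
        # Assumption: 'paging.since' is accepted at subscription creation.
        if self.last_until is None:
            return {}
        return {"paging.since": self.last_until}


state = SubscriptionState()
for until in ["1441974485:5", "1441974485:10", "1441974485:7"]:
    state.observe(until)
print(state.resume_params())
```

With no messages observed, resume_params returns an empty object, so the client falls back to a normal subscription and the full initial sync.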

garethsb commented 4 years ago

RQL queries for related records

Copying in the thoughts we have had on rel syntax and semantics...

My goal is to not define the supported relations for each resource type out-of-band, but to define relations using the existing JSON property definitions.

Syntax

Basic syntax: rel(<relation>, <call-operator>)

Result: the same result type as the <call-operator>, i.e. usually bool.

  1. Forward references
    i.e. where <relation> is a <property> of the queried resource type that is equal to an id of a related resource type

    Example:

    senders
     ?query.rql=
       and(
         or(
           eq(transport,urn%3Ax-nmos%3Atransport%3Artp),
           eq(transport,urn%3Ax-nmos%3Atransport%3Artp.mcast)
         ),
         rel(flow_id,
           eq(format,urn%3Ax-nmos%3Aformat%3Avideo)
         )
       )
  2. Backward references
    i.e. where <relation> is from a property of another resource type that is equal to an id of the resource type being queried

    Example:

    flows?query.rql=rel(senders%3Fflow_id,eq(transport,urn%3Ax-nmos%3Atransport%3Artp))
    • initially I proposed to represent the <relation> by the string <resource>?<property> but ? must unfortunately be percent-encoded because it isn't directly allowed by the nchar production used in RQL
    • . would be confused with nested property syntax; : isn't allowed because it's used to distinguish the typed-value production; all of /, $ and @ also require percent-encoding; in fact the only punctuation chars besides . that are allowed unencoded are *+-_~ which all seem awkward
    • another alternative would be to use the RQL array production here, i.e. (<resource>, <property>), or simply a three-argument call-operator 'overload', i.e. rel(<relation-resource>, <relation-property>, <call-operator>)?
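To compare the candidate spellings of a backward reference side by side, here is a small sketch that renders all three. The function backward_rel and its style names are hypothetical; none of these forms is settled syntax.

```python
from urllib.parse import quote


def backward_rel(resource, prop, call_operator, style="encoded"):
    """Render a backward-reference rel(...) in one of the discussed forms."""
    if style == "encoded":
        # <resource>?<property>, with '?' percent-encoded as %3F
        relation = quote(f"{resource}?{prop}", safe="")
        return f"rel({relation},{call_operator})"
    if style == "array":
        # RQL array production: (<resource>,<property>)
        return f"rel(({resource},{prop}),{call_operator})"
    if style == "three-arg":
        # Three-argument 'overload'
        return f"rel({resource},{prop},{call_operator})"
    raise ValueError(f"unknown style: {style}")


op = "eq(transport,urn%3Ax-nmos%3Atransport%3Artp)"
for style in ("encoded", "array", "three-arg"):
    print("flows?query.rql=" + backward_rel("senders", "flow_id", op, style))
```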

This so far only accounts for references via id properties. Maybe we want to support references via other identifier properties such as between Source clock_name and Node clocks.name. Another example where this could be very useful is in the relation between Sender or Receiver interface_bindings and Node interfaces.name.

However, this is not as simple as it seems since those identifiers are only unique within the same Node, and select a sub-object of the Node resource. An expression using clock_name deep within rel(device_id,rel(node_id,<call-operator>)) might be possible, but would require the <call-operator> to be able to accomplish comparison of the clock_name from the outer 'scope' with the clocks.name found in the inner 'scope'.

Therefore this is currently not supported in this proposal, and could be considered as a reason for defining the supported relations independently of the existing JSON property definitions instead.

Semantics

One way of describing how relations behave is by transforming them to sub-queries.

In general, the rel call-operator in a query like {resourceType}?query.rql=rel(<relation>,<call-operator>) may be transformed into a new query {relatedType}?query.rql=and(<related-property-call-operator>, <call-operator>), where the {relatedType} and the <related-property-call-operator> are determined from the <relation>.

Examples:

  1. Backward references

    In the query:

    flows?query.rql=rel(senders%3Fflow_id,eq(transport,urn%3Ax-nmos%3Atransport%3Artp))

    for each flows/{flowId}, the rel call-operator is effectively equivalent to evaluating a sub-query:

    senders?query.rql=and(eq(flow_id,{flowId}),eq(transport,urn%3Ax-nmos%3Atransport%3Artp))

    The result of that query is naturally an array which may contain zero or more senders. The result of the rel call-operator is false if that array is empty, true otherwise.

  2. Forward references

    Similarly in the query:

    senders?query.rql=rel(flow_id,eq(format,urn%3Ax-nmos%3Aformat%3Avideo))

    for each senders/{senderId}, with "flow_id": "{flowId}", the rel call-operator is effectively equivalent to evaluating a sub-query:

    flows?query.rql=and(eq(id,{flowId}),eq(format,urn%3Ax-nmos%3Aformat%3Avideo))

    The result of that query is naturally an array which may contain either exactly one flow or no flows. The result of the rel call-operator is false if the array is empty, true otherwise.

    For array-valued forward references, like Device senders or receivers (both deprecated), and Source or Flow parents, a sub-query would effectively be equivalent to the in operator, so queries like this:

    • devices?query.rql=rel(senders,<call-operator>)
    • sources?query.rql=rel(parents,<call-operator>)

    would involve constructing sub-queries like so:

    • senders?query.rql=and(in(id,({...senders})),<call-operator>)
    • sources?query.rql=and(in(id,({...parents})),<call-operator>)
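The sub-query semantics above can be mimicked over in-memory resources as follows. This is a sketch of the described behaviour only, not of any registry implementation; the function names, the sample ids and the use of Python callables in place of RQL call-operators are all assumptions for illustration.

```python
def rel_forward(resource, prop, related, predicate):
    """Forward reference: resource[prop] holds one id, or an array of ids."""
    ref = resource.get(prop)
    ids = ref if isinstance(ref, list) else [ref]
    # True if the equivalent sub-query would return a non-empty array.
    return any(r["id"] in ids and predicate(r) for r in related)


def rel_backward(resource, related, prop, predicate):
    """Backward reference: related[prop] equals the id of the queried resource."""
    return any(r.get(prop) == resource["id"] and predicate(r) for r in related)


RTP = "urn:x-nmos:transport:rtp"
VIDEO = "urn:x-nmos:format:video"

flows = [{"id": "f1", "format": VIDEO},
         {"id": "f2", "format": "urn:x-nmos:format:audio"}]
senders = [{"id": "s1", "flow_id": "f1", "transport": RTP},
           {"id": "s2", "flow_id": "f2", "transport": "urn:x-nmos:transport:mqtt"},
           {"id": "s3", "flow_id": None, "transport": RTP}]
sources = [{"id": "a", "format": VIDEO, "parents": []},
           {"id": "b", "format": VIDEO, "parents": ["a"]}]

# flows?query.rql=rel(senders%3Fflow_id,eq(transport,...rtp))
backward = [f["id"] for f in flows
            if rel_backward(f, senders, "flow_id", lambda s: s["transport"] == RTP)]

# senders?query.rql=rel(flow_id,eq(format,...video))
forward = [s["id"] for s in senders
           if rel_forward(s, "flow_id", flows, lambda f: f["format"] == VIDEO)]

# sources?query.rql=rel(parents,eq(format,...video))
with_parents = [s["id"] for s in sources
                if rel_forward(s, "parents", sources, lambda p: p["format"] == VIDEO)]

print(backward, forward, with_parents)
```

A Sender with a null flow_id, or a Source with an empty parents array, simply yields an empty sub-query result and so evaluates to false.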