amazon-ion / ion-schema

The Ion Schema Specification. This specification is licensed under the Apache 2.0 License.
https://amazon-ion.github.io/ion-schema/
Apache License 2.0
13 stars 10 forks source link

Consider allowing ISL to make assertions about subsequences in a list/sexp/document #41

Open popematt opened 3 years ago

popematt commented 3 years ago

The Problem

ISL has very limited support for validating any sort of relationships between elements in a collection. To illustrate, here is an example. Right now it is impossible to define the grammar of Disjunctive Normal Form (DNF) using ISL.

Here is a grammar for DNF:

disjunction ::= conjunction_sexp v disjunction
              | conjunction_sexp

conjunction_sexp ::= ( conjunction )

conjunction ::= literal ʌ conjunction
              | literal

literal ::= ~ variable
          | variable

And here are some examples of valid DNF:

(A ʌ B)
(A ʌ ~B ʌ C) v (~A ʌ D ʌ E)

The reason ISL can't express a grammar like this is because it has no way to specify that:

This is not necessarily a strong use case for subsequence assertions (because we could use a prefix grammar instead of an infix grammar to express the same thing), but it does help to illustrate some of the things that ISL cannot do right now.

Possible Solutions

It is possible that we could decide this is out of scope for ISL. However, if it is something that would be good for ISL, these are are a few (unrefined) ideas that could be used as a starting point.

Allow some sort of lookahead/behind for elements and/or ordered_elements

For example:

type::{
  name: disjunction,
  type: document,
  element: {
    one_of: [
      {
        type: conjunction,
        // `follows: nothing` would mean that it is at the beginning of the list/sexp/document
        follows: { one_of: [ nothing, { valid_values: [ v ] } ] },
        precedes: { one_of: [ nothing, { valid_values: [ v ] } ] }
      },
      {
        valid_values: [ v ],
        follows: conjunction,
        precedes: conjunction
      }
    ]
  }
}

Define a new syntax for collection members

This might need to be a new constraint altogether because it is significantly different from any of the existing constraints. In this example, I've used members, though I think it to be a slightly awkward name.

type::{
  name: disjunction,
  type: document,
  members: ordered::[
    { type: conjunction },
    {
      occurs: range::[0, max],
      members: ordered::[
        { type: { valid_values: [ v ] } },
        { type: conjunction },
      ],
    },
  ]
}

The argument to the members constraint is not a list of type references, but a list of member definitions. A member definition has an occurs (defaults to required), and either a type (with a usual type reference) or a members field (with nested member definitions).

popematt commented 2 years ago

A less invasive change (and perhaps a more intuitive syntax) would be to just allow nesting of ordered elements constructs using a subgroup annotated struct, like this:

type::{
  name: disjunction,
  type: document,
  ordered_elements: [
    { type: conjunction },
    subgroup::{
      occurs: range::[0, max],
      ordered_elements: [
        { type: { valid_values: [ v ] } },
        { type: conjunction },
      ]
    }
  ]
}
popematt commented 10 months ago

Upon further thought, if accepted, this feature would need to be designed very carefully to ensure it can be implemented in a way that is not vulnerable to catastrophic backtracking the way some Regex implementations are.