amazon-ion / ion-schema

The Ion Schema Specification. This specification is licensed under the Apache 2.0 License.
https://amazon-ion.github.io/ion-schema/
Apache License 2.0
13 stars 10 forks source link

Improve modeling of annotations in Ion Schema #51

Closed popematt closed 1 year ago

popematt commented 2 years ago

Problem

The annotations constraint has some confusing interactions between the modifiers (ordered::, closed::, required::, and optional::); and without any modifiers, the annotations constraint actually does nothing, which is a really confusing default behavior. The annotations constraint is also unable to model things like:

Possible Solution

We could model annotations (approximately) as a list of symbol tokens, and allows the annotations constraint to take a <TYPE_REF> as its argument.

For example:

type::{
  name: annotated_type_1,
  // max 10 annotations
  // annotations must start with a lower case letter and consist only of lowercase letters and numbers
  // no duplicates annotations allowed
  annotations: {
    container_length: range::[min, 10],
    element: distinct::{ // <---------- As proposed in ion-schema#43 
      regex: "[a-z][a-z0-9]*" 
    } 
  }
}

type::{
  name: annotated_type_2,
  // Has 1 of M media types followed by 1 of N encodings
  annotations: {
    ordered_elements: [
      { valid_values: [ image_jpeg, image_gif, image_png, text_plain, application_json ] },
      { valid_values: [ gzip, deflate, zstd ] },
    ]
  }
}

type::{
  name: annotated_type_3,
  // Must have exactly 1 annotation that looks like a fully-qualified java class name
  annotations: {
    container_length: 1,
    element: { regex: "([\p{L}_$][\p{L}\p{N}_$]*\.)*[\p{L}_$][\p{L}\p{N}_$]*" }
  }
}

This can be done in a backwards compatible way that preserves the existing syntax because all <TYPE_REF> are either a symbol or a struct, which can be clearly distinguished from the existing syntax that is a list of symbols.

jpschorr commented 2 years ago

In the course of modeling PartiQL test cases in ion text, one proposal for a way to model the test cases was something like:

'literal'::[
  'int'::parse:: "5",
  'null'::parse:: "null",
  // ...
  'nested empty struct'::parse:: "{'a':{}}"
]

In attempting to model this in Ion Schema, it doesn't currently seem possible to:

popematt commented 2 years ago

Proposal

The annotations constraint will be expanded to define the type and/or constraints for all annotations of a value. Annotations are symbols tokens but will be represented as a non-null list of non-null, un-annotated symbol values for the purpose of validation.

The existing annotations syntax shall remain, since it is the most concise and readable syntax for use cases such as required::closed::[one_required_annotation].

Syntax

<ANNOTATION> ::= <SYMBOL>
               | required::<SYMBOL>
               | optional::<SYMBOL>

<ANNOTATIONS_MODIFIER> ::= required::
                         | ordered::
                         | closed::

<ANNOTATIONS> ::= annotations: <ANNOTATIONS_MODIFIER>... [ <ANNOTATION>... ]
                | annotations: <TYPE_REFERENCE>

Evaluation

Sample Implementation

This is an example of one way of implementing the constraint to serve as a reference for the correct behavior. There is no requirement that it must be implemented this way. This example covers only the new logic. Actual implementations should also include the existing annotations logic.

private fun test(value: IonValue, typeRef: Type): Boolean {
    val annotationsList = value.typeAnnotations.mapTo(ionSystem.newEmptyList()) { ionSystem.newSymbol(it) }
    return typeRef.isValid(annotationsList)
}

Alternatives Considered

It would be possible to completely eliminate the existing annotations syntax since this is going into a new major version of Ion Schema and therefore is not required to be backwards compatible. However, the reasons for keeping the old syntax are twofold

  1. A common use case is to require or allow only one specific annotation. The new syntax may be (subjectively) less readable than the old syntax. For example:
Old Syntax New Syntax
annotations: closed::required::['com.fooservice.Request'] annotations: { valid_values: [['com.fooservice.Request']] } or annotations: { contains: ['com.fooservice.Request'], container_length: 1 } or annotations: { ordered_elements: { valid_values: ['com.fooservice.Request'] } }
annotations: required::['com.fooservice.Request'] annotations: { contains: ['com.fooservice.Request'] }
annotations: closed::['com.fooservice.Request'] annotations: { valid_values: [[], ['com.fooservice.Request']] } or annotations: { ordered_elements: { occurs: optional, valid_values: ['com.fooservice.Request'] } }or annotations: { element: { valid_values: ['com.fooservice.Request'] }, container_length: range::[0, 1] }
  1. Backwards compatibility will make it easier for users to upgrade from Ion Schema 1.0 to 2.0, though this could be mitigated with a schema upgrade tool.
popematt commented 2 years ago

The behavior of some of the ISL 1.0 syntax is unexpected and/or not useful, so perhaps only part of the old syntax should be retained. By removing ordered::, by requiring at least one of closed:: or required::, and by eliminating the annotation-level optional:: modifier, we can eliminate all of the potentially confusing or useless cases mentioned in https://github.com/amzn/ion-schema-kotlin/issues/150, while also preserving the succinct syntax for the simplest use cases.

popematt commented 1 year ago

Resolved by #76 in Ion Schema 2.0