amazon-ion / ion-schema

The Ion Schema Specification. This specification is licensed under the Apache 2.0 License.
https://amazon-ion.github.io/ion-schema/

Allow field names to be constrained without exact field names being specified #44

Closed popematt closed 1 year ago

popematt commented 3 years ago

Problem

Right now, the only way to constrain field names is to list them in the fields constraint. There is no way to constrain field names in general (for example, to require that all field names be lowercase) without listing every possible field name.

Possible Solutions

Here are some possible solutions. (These are just suggestions, and not intended to limit the possible solutions.)

Solution 1 – field_names constraint for validating the field names in structs

The field_names constraint would be roughly analogous to the element constraint, but instead of constraining the field values of a struct, it would constrain the field names of a struct. Field names are not true Ion values, but they are similar to symbols, and could be validated as such. The implicit schema for field names is:

type::{
  name: symbol_token,
  type: symbol,
  annotations: closed::[]
}

The field_names constraint could accept a TYPE_REFERENCE, and all field names would be validated against that type, as if each field name were an Ion symbol value (rather than just a symbol token). For example:

type::{
  name: constrained_field_names,
  type: struct,
  field_names: {
    regex: "^[a-z]+$",
    codepoint_length: range::[1, 8],
  }
}

valid::[ 
  { foo: 1 }
]

invalid::[
  { FOO: 1 },  // Uppercase not allowed
  { _foo: 1 },  // '_' not allowed
  { foobarbaz: 1 }, // Field name is too long
  { '': 1 }, // Field name is too short
]
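The checks in this example can be modeled without an Ion library. The Kotlin sketch below is only an illustration: `isValidFieldName` and `allFieldNamesValid` are hypothetical helpers, field names are modeled as plain strings rather than Ion symbols, and the pattern is anchored on the assumption that the regex constraint accepts a value when any substring matches.

```kotlin
// Hypothetical model of the `constrained_field_names` type above.
// Field names are modeled as plain Kotlin strings rather than Ion symbols.

// Anchored so that the whole name must consist of lowercase ASCII letters.
// Assumption: regex matching uses "find" semantics, so an unanchored pattern
// would also accept names such as "_foo".
private val lowercaseOnly = Regex("^[a-z]+\$")

// Mirrors `codepoint_length: range::[1, 8]`.
private val allowedLength = 1..8

fun isValidFieldName(name: String): Boolean {
    // Count Unicode code points, not UTF-16 code units.
    val codepoints = name.codePointCount(0, name.length)
    return lowercaseOnly.containsMatchIn(name) && codepoints in allowedLength
}

// A struct passes only if every one of its field names passes.
fun allFieldNamesValid(fieldNames: List<String>): Boolean =
    fieldNames.all(::isValidFieldName)

fun main() {
    println(allFieldNamesValid(listOf("foo")))        // true
    println(allFieldNamesValid(listOf("FOO")))        // false: uppercase
    println(allFieldNamesValid(listOf("_foo")))       // false: '_' not allowed
    println(allFieldNamesValid(listOf("foobarbaz")))  // false: nine code points
    println(allFieldNamesValid(listOf("")))           // false: empty name
}
```

Because the type is applied to each name independently, a single bad field name is enough to reject the whole struct.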

The field_names constraint could also allow a distinct:: annotation, which would indicate that there should be no duplicate field names in the struct. This proposed solution would also be a way to solve https://github.com/amzn/ion-schema/issues/14. For example:

type::{
  name: no_duplicate_field_names,
  type: struct,
  field_names: distinct::any  // Doesn't matter what the field names are, as long as they are not duplicated
}

Example grammar:

FIELD_NAMES ::= distinct::<TYPE_REF>
              |           <TYPE_REF>

Solution 2 – field_names regex-based constraint validating the field names in structs

This is like the regex constraint, except that it validates the field names of a struct, and it has an additional distinct:: annotation, which would indicate that there should be no duplicate field names in the struct (to solve https://github.com/amzn/ion-schema/issues/14).

Here is an example:

type::{
  name: constrained_field_names,
  type: struct,
  field_names: distinct::"^[a-z]{1,8}$",
}

valid::[ 
  { foo: 1, bar: 2 }
]

invalid::[
  { foo: 1, foo: 2 }, // Field names are not distinct
  { FOO: 1 },  // Uppercase not allowed
  { _foo: 1 },  // '_' not allowed
  { foobarbaz: 1 }, // Field name is too long
  { '': 1 }, // Field name is too short
]

While this is simpler to implement than Solution 1, it has the disadvantage of not allowing code re-use (e.g. field_names: distinct::uuid_string vs. having to copy/paste the regex for a UUID string), and it doesn't provide as much functionality as Solution 1 because there is no equivalent to field_names: { utf8_byte_length: range::[1, 16] }.

Solution 3 – Update fields constraint to operate on a bag of tuples

Update fields to allow a list of rules, each defining a valid name/value pair in a struct. For example:

type::{
  name: foo,
  type: struct,
  fields: [ 
    {
      field_name: { regex: "[a-z]+" },
      value_type: int,
    },
  ]
}

An interesting capability is that it lets us specify that certain types of fields must have certain kinds of field names, and vice versa. For example, this is how one could constrain fields whose names start with f_ to be floats and fields whose names start with i_ to be ints. (In this example it also works the other way: all ints must have a field name starting with i_.)

type::{
  name: foo,
  type: struct,
  fields: [
    // "^i_.+" --> int
    { field_name: { regex: "^i_.+" }, value_type: int },
    // "^f_.+" --> float
    { field_name: { regex: "^f_.+" }, value_type: float },
    { field_name: any, value_type: { not: int } },
  ]
}
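
The proposal does not spell out how a list of rules would be evaluated, so the Kotlin sketch below picks one possible reading: a field is accepted when at least one rule accepts both its name and its value. `FieldRule` and `structIsValid` are hypothetical names, Ion values are modeled as plain Kotlin values (Long for int, Double for float), and the name patterns are anchored at the start to express "starting with".

```kotlin
// Hypothetical model of Solution 3: `fields` as a list of name/value rules.
// Ion values are modeled as plain Kotlin values (Long = int, Double = float).

class FieldRule(
    val nameMatches: (String) -> Boolean,
    val valueMatches: (Any) -> Boolean,
)

// Assumed semantics (not stated in the proposal): a field is accepted when
// at least one rule accepts both its name and its value.
fun structIsValid(fields: List<Pair<String, Any>>, rules: List<FieldRule>): Boolean =
    fields.all { (name, value) ->
        rules.any { it.nameMatches(name) && it.valueMatches(value) }
    }

// The three rules from the example above.
val rules = listOf(
    FieldRule({ Regex("^i_.+").containsMatchIn(it) }, { it is Long }),
    FieldRule({ Regex("^f_.+").containsMatchIn(it) }, { it is Double }),
    FieldRule({ true }, { it !is Long }),  // any name, any non-int value
)

fun main() {
    println(structIsValid(listOf("i_count" to 1L), rules))   // true
    println(structIsValid(listOf("f_ratio" to 1.5), rules))  // true
    println(structIsValid(listOf("label" to "x"), rules))    // true
    println(structIsValid(listOf("count" to 1L), rules))     // false: int without an i_ name
}
```

Under this reading an int value is accepted only when its field name starts with i_, because no other rule admits ints. Other readings of the rule list are possible and would give different results.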

popematt commented 2 years ago

Proposal

The field_names constraint defines the type and/or constraints for all field names within a struct. Field names are symbols and will be represented as symbol values for the purpose of validation.

Syntax

<FIELD_NAMES> ::= field_names: distinct::<TYPE_REFERENCE>
                | field_names: <TYPE_REFERENCE>

Evaluation

Sample Implementation

This assumes that the distinct:: modifier has been implemented as in #43. This is an example of one way of implementing the constraint to serve as a reference for the correct behavior. There is no requirement that it must be implemented this way.

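private fun test(value: IonValue, typeRef: Type): Boolean {
    // Only non-null structs have field names to validate.
    if (value !is IonStruct || value.isNullValue) return false

    // Wrap each field name in a symbol value and collect the symbols into a list.
    val nameList = value.mapTo(ionSystem.newEmptyList()) { ionSystem.newSymbol(it.fieldName) }

    // Validate the list exactly as the element constraint would validate a list of symbols.
    return ElementConstraint.test(nameList, typeRef)
}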
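
The same reduction can be tried without an Ion library. The Kotlin sketch below is a hypothetical, self-contained model rather than the reference behavior: `Struct`, `elementTest`, `fieldNamesTest`, and `lowercase` are illustrative names, a type is modeled as a predicate on strings, and the distinct:: modifier is modeled as a boolean flag handled by the element-style check.

```kotlin
// Hypothetical, library-free model of the sample implementation above.
// A struct is a list of (fieldName, value) pairs so that repeated names can
// be represented; `null` stands in for a null struct or a non-struct value.
typealias Struct = List<Pair<String, Any?>>

// Stand-in for an element-style check: every item must satisfy `type`, and
// with `distinct` set, no item may repeat.
fun elementTest(items: List<String>, type: (String) -> Boolean, distinct: Boolean): Boolean {
    if (distinct && items.toSet().size != items.size) return false
    return items.all(type)
}

fun fieldNamesTest(value: Struct?, type: (String) -> Boolean, distinct: Boolean = false): Boolean {
    if (value == null) return false      // mirrors the non-struct / null guard
    val names = value.map { it.first }   // field names become a list of "symbols"
    return elementTest(names, type, distinct)
}

// Example type: one to eight lowercase ASCII letters.
val lowercase: (String) -> Boolean = { Regex("^[a-z]{1,8}\$").containsMatchIn(it) }

fun main() {
    println(fieldNamesTest(listOf("foo" to 1, "bar" to 2), lowercase, distinct = true))  // true
    println(fieldNamesTest(listOf("foo" to 1, "foo" to 2), lowercase, distinct = true))  // false
    println(fieldNamesTest(listOf("foo" to 1, "foo" to 2), lowercase))                   // true
    println(fieldNamesTest(listOf("FOO" to 1), lowercase))                               // false
    println(fieldNamesTest(null, lowercase))                                             // false
}
```

Keeping the distinctness check inside the element-style helper means field_names needs no duplicate-detection logic of its own.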

popematt commented 1 year ago

Resolved by #76