amazon-ion / ion-schema

The Ion Schema Specification. This specification is licensed under the Apache 2.0 License.
https://amazon-ion.github.io/ion-schema/
Apache License 2.0
13 stars 10 forks source link

Add support for "inheriting" fields from a parent type #113

Open popematt opened 1 year ago

popematt commented 1 year ago

With the new content closed, is it possible to restrict the fields to the union of the derived type and it's inherited ones? ... we don't want to have to copy/paste the fields of foo into bar to close on them, we still want an inheritance.

Originally posted by @didibus in https://github.com/amazon-ion/ion-schema/issues/47#issuecomment-1526090678

Ion Schema does not have any OOP-style inheritance because constraints can only subtract from the set of values that are allowed for a given type definition. That being said, it is inconvenient to have to copy/paste fields from one type definition to another, and it is a maintainability issue to have to manually keep the fields of multiple type definitions in sync.

Any solution to this problem should consider whether the solution would make it easier or harder to generate code (e.g. POJOs or Ion 1.1 macros) for a type definition.

An Idea

Here is a strawman proposal of something that could work without needing to introduce any OOP-style inheritance into Ion Schema.

type::{
  name: foo,
  fields: {
    a: int,
    b: string,
  }
}

type::{
  name: bar,
  fields: closed::union::[

    // Adds the contents of the fields constraint from type foo
    from_type::foo,

    // Adds the contents of the fields constraint from an inline-imported type
    from_type::{ id: "other_schema.isl", type: baz },

    // Adds some fields defined locally
    {
      c: int,
      d: string,
    }
  ]
}

Some open questions with this approach:

An alternate approach

Instead of using fields to create unions of field definitions, we could create a special syntax for the field_names constraint since you can close the fields using field_names. E.g.:

type::{
  name: foo,
  fields: {
    a: int,
    b: string,
  }
}

type::{
  name: bar,

  // This causes all the constraints in type `foo` to be in force for this type (bar)
  type: foo,  

  // Add some constraints on fields specific to `bar`
  fields: {
    c: int,
    d: string,
  },

  // Restrict the valid field names to those defined in `foo` and `bar` fields constraints
  field_names: fields_of::[ foo, bar ],
}

In this case, the fields_of annotation is a flag that indicates that the Ion Schema System should automatically generate an inline type based on the fields of the listed types. I.e. { valid_values: [ a, b, c, d ] }.

This seems like an elegant solution to me, especially since it avoids the question of how to combine the occurs or type definitions of each field.

However, this does seem like it might violate the tenet of having orthogonal constraints. Tenets are made to serve us, not us serve the tenets, but it is a good indicator that we need to be careful. There's also the unanswered question of how an anonymous/inline type could refer to its own fields constraint(s).

desaikd commented 1 year ago

I like the 1st approach. I wanted to share my opinion on some of the questions you raised:

Also, for the second approach you mentioned we won't be able to get the type definition but just the name of the fields.

popematt commented 1 year ago

Also, for the second approach you mentioned we won't be able to get the type definition but just the name of the fields.

It does get the type definitions for the fields, but I see now that I didn't explain that part in my original post. I've added some more comments to my original post that help explain how it works.

almann commented 1 year ago

FWIW, I found approach 1 to be very hard for me to reason about, but approach 2 reminds me of prototype-based programming to me, which I can sort of see how it works.

My questions are mostly around do we see this as only operating on fields constraints? What actually gets pulled in here from the type that we're "prototyping" against? Is this meaningful in other contexts where non-field constraints are useful? I.e., is this a concept that could/should be generalized?

popematt commented 1 year ago

My questions are mostly around do we see this as only operating on fields constraints?

That was all I was thinking, but I am open to suggestions.

What actually gets pulled in here from the type that we're "prototyping" against?

Let's use this example again:

type::{
  name: foo,
  fields: { a: int, b: string, }
}

type::{
  name: bar,
  type: foo,  
  fields: { c: int, d: string, },
  field_names: fields_of::[ foo, bar ],
}
  1. The fields_of syntax has a list of types – [ foo, bar ]
  2. For each type, get the fields constraint(s) – [ { a: int, b: string, }, { c: int, d: string, } ]
  3. For each fields constraint, get the field names defined in that struct. – [ [a, b], [c, d] ]
  4. Flatten & deduplicate the list — [ a, b, c, d ]
  5. Use it to create a list of valid field names — equivalent to field_names: { valid_values: [a, b, c, d] }

Does that explain it a little better? We're getting the field names that are defined in the fields constraint(s).

Is this meaningful in other contexts where non-field constraints are useful? I.e., is this a concept that could/should be generalized?

It might be possible to generalize this to some of the other constraints. E.g. with valid_values it would make sense to allow you to specify the set of valid values for type foo plus some arbitrary additional values. Similarly, it could work for contains. I could see something like this also being possible for all of the constraints specific to scalar types.

type::{
  name: some_values,
  valid_values: [ a, b, c ]
}

type::{
  name: more_values,
  valid_values: [ d, e, f ]
}

type::{
  name: combined_values,
  // allows a, b, c, d, e, f
  valid_values: values_from::[ some_values, more_values  ]
}

However, I don't think it's particularly useful for those constraints. It's easy enough with e.g. valid_values to do something like this:

type::{
  name: some_values,
  valid_values: [ a, b, c ]
}

type::{
  name: more_values,
  valid_values: [ d, e, f ]
}

type::{
  name: combined_values,
  any_of: [
    some_values,
    more_values,
  ]
}

I think that a similar concept could be useful for ordered_elements, but because lists/sexps/documents are "keyed" by index, it isn't a problem in quite the same way. You can't have an element number 3 unless you already have elements 0, 1, and 2; and you can guarantee that if there is no element 4, then there is no element with an index greater than 4. Which means, you can easily do something like this:


$ion_schema_2_0

// This type is functionally like a "begins with" constraint on a sequence
type::{
  name: some_ordered_elements,
  ordered_elements: [
    string,
    int, 
    { occurs: range::[0, max] }
  ]
}

// Allows the beginning of the sequence specified in `some_ordered_elements` followed by a single timestamp.
type::{
  name: more_ordered_elements,
  type: some_ordered_elements
  ordered_elements: [
    { occurs: 2, type: $any },
    timestamp.
  ]
}
``
almann commented 1 year ago
  • The fields_of syntax has a list of types – [ foo, bar ]
  • For each type, get the fields constraint(s) – [ { a: int, b: string, }, { c: int, d: string, } ]
  • For each fields constraint, get the field names defined in that struct. – [ [a, b], [c, d] ]
  • Flatten & deduplicate the list — [ a, b, c, d ]
  • Use it to create a list of valid field names — equivalent to field_names: { valid_values: [a, b, c, d] }

Thanks for the clarification, so it only creates a field_names constraint, what about the value constraints? What if there is a shared field z, such that the type constraints are incompatible like [{z: int}, {z:string}]? Is this ignored because we are just prototyping the field_names constraint?

popematt commented 1 year ago

Thanks for the clarification, so it only creates a field_names constraint, what about the value constraints?

Value constraints would need to be part of the fields constraint, or "inherited" using type, all_of, etc. This is trivially done. E.g.:

type::{
  name: foo,
  fields: { a: int, b: string, }
}

type::{
  name: bar,
  // "Inherits" all of the field value constraints from `foo`.
  type: foo, 
  // Adds some new field value constraints.
  fields: { c: int, d: string, },
  // Causes the type to be closed to any fields not specified in `foo` or `bar`.
  field_names: fields_of::[ foo, bar ],
}

type::{
  name: derived_from_closed_struct_type,
  type: bar,
  // This is useless—field names `e` and `f` are not permitted by type `bar`
  fields: { e: int, f: int },
}

Now, if foo had fields: closed::{ a: int, b: string, }, then you could not "add" fields in bar.

Another way to think of this is that closed:: is a shortcut for field_names: fields_of::[ <THIS TYPE> ].

What if there is a shared field z, such that the type constraints are incompatible like [{z: int}, {z:string}]? Is this ignored because we are just prototyping the field_names constraint?

That's correct, and the schema author can choose to handle it however they like using the existing logical constraints. E.g.:

type::{
  name: some_ints,
  fields: { z: int, x: int }
}
type::{
  name: some_strings,
  fields: { z: string, y: string }
}

type::{
  name: no_z,
  // Because of the conflicting field value definitions, z may not be present.
  all_of: [some_ints, some_strings],
  field_names: fields_of::[some_ints, some_strings],
}

type::{
  name: any_z,
  // Z may be an int or string
  any_of: [some_ints, some_strings],
  field_names: fields_of::[some_ints, some_strings],
}
almann commented 1 year ago

Okay, I think this is fine then, the "prototyping" is explicit for one kind of constraint and only that kind. We are free to add different ones in the future.

didibus commented 1 year ago

The confusing part is that the type is inherited, but what does that inheritance do?

Right now it inherits the field's constraints, but not the other map constraints, which is strange to me.

Also, it seems weird to close the type from extension. When I say it's closed, I'm thinking the map is closed. So if a subtype inherits the fields, and adds more, I'd expect it to be closed to the union of fields.

So here:

type::{
  name: derived_from_closed_struct_type,
  type: bar,
  // This is useless—field names `e` and `f` are not permitted by type `bar`
  fields: { e: int, f: int },
}

I would think that e and f are allowed along with the other inherited types, but that the type is closed from having more fields.

Our personal use-case, to put more context, is that we want all fields on our API input to be closed to catch typos as we insert things in our DB. All our types inherits from a few base types that has common fields like createdDate and all that.

Currently this is impossible, if you close the base type, all the derived type that define additional fields error. If you close the derived type, it errors because the base type adds fields as well.

What was surprising to us was that the fields constraints inherits and merge, but the content closed didn't inherit.

The fields_of makes sense to me, but what doesn't is the current inheritance, because I see it similarly to a "fields_of". When we say foo is of type bar, it means it has the fields of bar no? Or what's the distinction?

popematt commented 1 year ago

What was surprising to us was that the fields constraints inherits and merge, but the content closed didn't inherit. ... When we say foo is of type bar, it means it has the fields of bar no? Or what's the distinction?

Ion Schema has no inheritance or abstract types—only composition—and the thing we're composing is rules, rather than members of a class. When we say that foo is of type bar that means that a value must be bar (i.e. meet all of the constraints of bar) in order to be a foo.

r0b0ji commented 9 months ago

We have a usecase where we want to federate the schema creation. As a system owner, we define a base type and our multiple clients can define their custom type extending with additional fields and constraints. And the data for each clients should satisfy the constraints of their custom type as well as base type. When a new client creates a new custom type we shouldn't have to update our base type.

From my perspective, I find the alternate approach more intuitive.

popematt commented 9 months ago

Here's an example of a product listing for tires to make the federation use case more concrete.

tire_listing::{
  product_name: "performance tire",
  price: USD::188.50,
  vehicle_type: P,
  width: 225,
  aspect_ratio: 70,
  construction: R,
  rim_diameter: 16,
  load_index: 91,
  speed_rating: S,
  min_inflation_psi: 30,
  max_inflation_psi: 38,
  tread_depth_mm: 12,
}

In this case, you might have a base type (product_listing) and a derived type (tire_listing).

type::{
  name: product_listing,
  fields: {
    product_name: string,
    price: currency_amount,
    // and others...
  }
}
type::{
  name: tire_listing,
  type: product_listing,
  fields: {
    // Unfortunately, we have to repeat the fields from the base type if we want to close this struct.
    product_name: any,
    price: any,
    // New fields
    vehicle_type: symbol,
    width: int,
    aspect_ratio: int,
    construction: symbol,
    rim_diameter: int,
    load_index: int,
    speed_rating: symbol,
    min_inflation_psi: int,
    max_inflation_psi: int,
    tread_depth_mm: int,    
  }
}

It is probably cleaner to create a federated model using composition, such as this example, but in cases where the schema is being defined after a federated system is already running production workloads, you may not have the flexibility to model the data in an optimal way because of the cost of migrating existing data.

type::{
  name: federated_product_listing,
  fields: closed::{
    product_name: string,
    price: currency_amount,
    federated_details: list, // of "listing details"
  }
}
type::{
  name: tire_listing_details,
  annotations: closed::required::["tire_details"]
  fields: {
    vehicle_type: symbol,
    width: int,
    aspect_ratio: int,
    construction: symbol,
    rim_diameter: int,
    load_index: int,
    speed_rating: symbol,
    min_inflation_psi: int,
    max_inflation_psi: int,
    tread_depth_mm: int,    
  }
}
didibus commented 9 months ago

What was surprising to us was that the fields constraints inherits and merge, but the content closed didn't inherit. ... When we say foo is of type bar, it means it has the fields of bar no? Or what's the distinction?

Ion Schema has no inheritance or abstract types—only composition—and the thing we're composing is rules, rather than members of a class. When we say that foo is of type bar that means that a value must be bar (i.e. meet all of the constraints of bar) in order to be a foo.

I understand that now, so it's similar to all_of, it was confusing because it uses inheritance-like syntax.

We ended up implementing custom inheritance ourself. We simply copy rules from the parent into the child, and then the effective type becomes just the new type which has the rules of everything in the chain combined.

This doesn't fail if the child wants to close fields, because it doesn't need to be independently valid as all_of does, it simply need to remain valid when merging all rules together.

Maybe what I would propose is a union_of constraints? As opposed to all_of it first unions the constraints together, and then validates if the value is valid to their union. Where-as all_of validates one type after the other and expect the value to be valid to all of them.

type::{
  name: foo,
  type: struct,
  fields: {
    a: int,
    b: string,
  }
}

type::{
  name: bar,
  type: struct,
  fields: {
    c: int,
    d: string,
  },
  content: closed,
}

type::{
  name: foobar,
  union_of: [
    foo,
    bar
  ]
}

Where foobar here is the same as just copy/pasting their constraints together:

type::{
  name: foobar,
  fields: {
    a: int,
    b: string,
    c: int,
    d: string,
  }
  content: closed,
}

Now foobar is the union of both constraints. It's simply a more constrained type.

This isn't same as inheritance, because you don't need to handle merge conflicts, you could just duplicate. For example:

type::{
  name: foo,
  type: struct,
  fields: {
    a: int,
  }
}

type::{
  name: bar,
  type: struct,
  fields: {
    a: string
  },
  content: closed,
}

type::{
  name: foobar,
  union_of: [
    foo,
    bar
  ]
}

Would be same as:

type::{
  name: bar,
  type: struct,
  fields: {
    a: int,
    a: string,
  },
  content: closed,
}

This isn't a valid syntax I don't think, but it's for demonstration purpose of the behavior, so foobar just states that the constraints are unioned, so it has a key a that must be a valid int and a valid string, this won't ever validate obviously, but that's the difference with inheritance, so you don't need logic about ordering of precedence when merging the types.

Maybe that would satisfy our use-case of combining fields and letting the content: closed behavior close over the union of fields, but without having to introduce real inheritance?

popematt commented 9 months ago

@didibus A union_of constraint is certainly an interesting idea. I think it has potential as long as we can define what it means in a consistent way. I think that you've presented something that will work for merging fields constraints, and I can see how it might work for a constraint like valid_values, but I'm not sure how it would work with other constraints such as ordered_elements or codepoint_length.

If you're just suggesting it for fields then I think it's functionally similar to the proposal I made in this prior comment (even though the syntax may be different), and I think some variation of this approach is probably going to be the most feasible solution. If you have more ideas about how it can be generalized for all constraints, maybe you could go ahead and create a new issue for that idea?

didibus commented 9 months ago

@popematt You're right I didn't think about all constraints yet, so not sure if there is a nice way to tackle them all. I'll put some thoughts into it.

One thing maybe I missed from your proposal, can you enforce the closeness?

One thing that was important for us, is that if I close in any one of them, it should be closed, but its the union of the fields that should be closed.

popematt commented 9 months ago

In my proposal, the "closed-ness" across multiple type definitions is enforced by the field_names constraint. For example, a type with field_names: fields_of::[a, b] would be closed to only include the fields declared in type a and type b.

type::{
  name: a,
  fields: {
    a1: int,
    a2: string,
  }
}
type::{
  name: b,
  type: a,
  field_names: fields_of::[a, b],  // Values of type 'b' can only have the fields `a1`, `a2`, and `b1`.
  fields: {
    b1: int
  }
}

However, when using this approach, the fields constraints should not be closed:: because Ion Schema never removes or relaxes any constraints when creating a derived type. For example:

// Assume type `a` is the same as the last example

type::{
  name: b,
  type: a,
  field_names: fields_of::[a, b],  // Type `b` can only have fields `a1`, `a2`, and `b1`...
  fields: closed::{                // but `fields` is closed, so only `b1` is allowed.
    b1: int
  }
}