Open popematt opened 1 year ago
I like the 1st approach. I wanted to share my opinion on some of the questions you raised:
Also, for the second approach you mentioned we won't be able to get the type definition but just the name of the fields.
Also, for the second approach you mentioned we won't be able to get the type definition but just the name of the fields.
It does get the type definitions for the fields, but I see now that I didn't explain that part in my original post. I've added some more comments to my original post that help explain how it works.
FWIW, I found approach 1 to be very hard for me to reason about, but approach 2 reminds me of prototype-based programming to me, which I can sort of see how it works.
My questions are mostly around do we see this as only operating on fields
constraints? What actually gets pulled in here from the type that we're "prototyping" against? Is this meaningful in other contexts where non-field constraints are useful? I.e., is this a concept that could/should be generalized?
My questions are mostly around do we see this as only operating on fields constraints?
That was all I was thinking, but I am open to suggestions.
What actually gets pulled in here from the type that we're "prototyping" against?
Let's use this example again:
type::{
name: foo,
fields: { a: int, b: string, }
}
type::{
name: bar,
type: foo,
fields: { c: int, d: string, },
field_names: fields_of::[ foo, bar ],
}
fields_of
syntax has a list of types – [ foo, bar ]
fields
constraint(s) – [ { a: int, b: string, }, { c: int, d: string, } ]
fields
constraint, get the field names defined in that struct. – [ [a, b], [c, d] ]
[ a, b, c, d ]
field_names: { valid_values: [a, b, c, d] }
Does that explain it a little better? We're getting the field names that are defined in the fields
constraint(s).
Is this meaningful in other contexts where non-field constraints are useful? I.e., is this a concept that could/should be generalized?
It might be possible to generalize this to some of the other constraints. E.g. with valid_values
it would make sense to allow you to specify the set of valid values for type foo
plus some arbitrary additional values. Similarly, it could work for contains
. I could see something like this also being possible for all of the constraints specific to scalar types.
type::{
name: some_values,
valid_values: [ a, b, c ]
}
type::{
name: more_values,
valid_values: [ d, e, f ]
}
type::{
name: combined_values,
// allows a, b, c, d, e, f
valid_values: values_from::[ some_values, more_values ]
}
However, I don't think it's particularly useful for those constraints. It's easy enough with e.g. valid_values
to do something like this:
type::{
name: some_values,
valid_values: [ a, b, c ]
}
type::{
name: more_values,
valid_values: [ d, e, f ]
}
type::{
name: combined_values,
any_of: [
some_values,
more_values,
]
}
I think that a similar concept could be useful for ordered_elements
, but because lists/sexps/documents are "keyed" by index, it isn't a problem in quite the same way. You can't have an element number 3 unless you already have elements 0, 1, and 2; and you can guarantee that if there is no element 4, then there is no element with an index greater than 4. Which means, you can easily do something like this:
$ion_schema_2_0
// This type is functionally like a "begins with" constraint on a sequence
type::{
name: some_ordered_elements,
ordered_elements: [
string,
int,
{ occurs: range::[0, max] }
]
}
// Allows the beginning of the sequence specified in `some_ordered_elements` followed by a single timestamp.
type::{
name: more_ordered_elements,
type: some_ordered_elements
ordered_elements: [
{ occurs: 2, type: $any },
timestamp.
]
}
``
- The
fields_of
syntax has a list of types –[ foo, bar ]
- For each type, get the
fields
constraint(s) –[ { a: int, b: string, }, { c: int, d: string, } ]
- For each
fields
constraint, get the field names defined in that struct. –[ [a, b], [c, d] ]
- Flatten & deduplicate the list —
[ a, b, c, d ]
- Use it to create a list of valid field names — equivalent to
field_names: { valid_values: [a, b, c, d] }
Thanks for the clarification, so it only creates a field_names
constraint, what about the value constraints? What if there is a shared field z
, such that the type constraints are incompatible like [{z: int}, {z:string}]
? Is this ignored because we are just prototyping the field_names
constraint?
Thanks for the clarification, so it only creates a field_names constraint, what about the value constraints?
Value constraints would need to be part of the fields
constraint, or "inherited" using type
, all_of
, etc. This is trivially done. E.g.:
type::{
name: foo,
fields: { a: int, b: string, }
}
type::{
name: bar,
// "Inherits" all of the field value constraints from `foo`.
type: foo,
// Adds some new field value constraints.
fields: { c: int, d: string, },
// Causes the type to be closed to any fields not specified in `foo` or `bar`.
field_names: fields_of::[ foo, bar ],
}
type::{
name: derived_from_closed_struct_type,
type: bar,
// This is useless—field names `e` and `f` are not permitted by type `bar`
fields: { e: int, f: int },
}
Now, if foo
had fields: closed::{ a: int, b: string, }
, then you could not "add" fields in bar
.
Another way to think of this is that closed::
is a shortcut for field_names: fields_of::[ <THIS TYPE> ]
.
What if there is a shared field z, such that the type constraints are incompatible like [{z: int}, {z:string}]? Is this ignored because we are just prototyping the field_names constraint?
That's correct, and the schema author can choose to handle it however they like using the existing logical constraints. E.g.:
type::{
name: some_ints,
fields: { z: int, x: int }
}
type::{
name: some_strings,
fields: { z: string, y: string }
}
type::{
name: no_z,
// Because of the conflicting field value definitions, z may not be present.
all_of: [some_ints, some_strings],
field_names: fields_of::[some_ints, some_strings],
}
type::{
name: any_z,
// Z may be an int or string
any_of: [some_ints, some_strings],
field_names: fields_of::[some_ints, some_strings],
}
Okay, I think this is fine then, the "prototyping" is explicit for one kind of constraint and only that kind. We are free to add different ones in the future.
The confusing part is that the type is inherited, but what does that inheritance do?
Right now it inherits the field's constraints, but not the other map constraints, which is strange to me.
Also, it seems weird to close the type from extension. When I say it's closed, I'm thinking the map is closed. So if a subtype inherits the fields, and adds more, I'd expect it to be closed to the union of fields.
So here:
type::{
name: derived_from_closed_struct_type,
type: bar,
// This is useless—field names `e` and `f` are not permitted by type `bar`
fields: { e: int, f: int },
}
I would think that e and f are allowed along with the other inherited types, but that the type is closed from having more fields.
Our personal use-case, to put more context, is that we want all fields on our API input to be closed to catch typos as we insert things in our DB. All our types inherits from a few base types that has common fields like createdDate and all that.
Currently this is impossible, if you close the base type, all the derived type that define additional fields error. If you close the derived type, it errors because the base type adds fields as well.
What was surprising to us was that the fields constraints inherits and merge, but the content closed didn't inherit.
The fields_of makes sense to me, but what doesn't is the current inheritance, because I see it similarly to a "fields_of". When we say foo is of type bar, it means it has the fields of bar no? Or what's the distinction?
What was surprising to us was that the fields constraints inherits and merge, but the content closed didn't inherit. ... When we say foo is of type bar, it means it has the fields of bar no? Or what's the distinction?
Ion Schema has no inheritance or abstract types—only composition—and the thing we're composing is rules, rather than members of a class. When we say that foo
is of type bar
that means that a value must be bar
(i.e. meet all of the constraints of bar
) in order to be a foo
.
We have a usecase where we want to federate the schema creation. As a system owner, we define a base type and our multiple clients can define their custom type extending with additional fields and constraints. And the data for each clients should satisfy the constraints of their custom type as well as base type. When a new client creates a new custom type we shouldn't have to update our base type.
From my perspective, I find the alternate approach more intuitive.
Here's an example of a product listing for tires to make the federation use case more concrete.
tire_listing::{
product_name: "performance tire",
price: USD::188.50,
vehicle_type: P,
width: 225,
aspect_ratio: 70,
construction: R,
rim_diameter: 16,
load_index: 91,
speed_rating: S,
min_inflation_psi: 30,
max_inflation_psi: 38,
tread_depth_mm: 12,
}
In this case, you might have a base type (product_listing
) and a derived type (tire_listing
).
type::{
name: product_listing,
fields: {
product_name: string,
price: currency_amount,
// and others...
}
}
type::{
name: tire_listing,
type: product_listing,
fields: {
// Unfortunately, we have to repeat the fields from the base type if we want to close this struct.
product_name: any,
price: any,
// New fields
vehicle_type: symbol,
width: int,
aspect_ratio: int,
construction: symbol,
rim_diameter: int,
load_index: int,
speed_rating: symbol,
min_inflation_psi: int,
max_inflation_psi: int,
tread_depth_mm: int,
}
}
It is probably cleaner to create a federated model using composition, such as this example, but in cases where the schema is being defined after a federated system is already running production workloads, you may not have the flexibility to model the data in an optimal way because of the cost of migrating existing data.
type::{
name: federated_product_listing,
fields: closed::{
product_name: string,
price: currency_amount,
federated_details: list, // of "listing details"
}
}
type::{
name: tire_listing_details,
annotations: closed::required::["tire_details"]
fields: {
vehicle_type: symbol,
width: int,
aspect_ratio: int,
construction: symbol,
rim_diameter: int,
load_index: int,
speed_rating: symbol,
min_inflation_psi: int,
max_inflation_psi: int,
tread_depth_mm: int,
}
}
What was surprising to us was that the fields constraints inherits and merge, but the content closed didn't inherit. ... When we say foo is of type bar, it means it has the fields of bar no? Or what's the distinction?
Ion Schema has no inheritance or abstract types—only composition—and the thing we're composing is rules, rather than members of a class. When we say that
foo
is of typebar
that means that a value must bebar
(i.e. meet all of the constraints ofbar
) in order to be afoo
.
I understand that now, so it's similar to all_of, it was confusing because it uses inheritance-like syntax.
We ended up implementing custom inheritance ourself. We simply copy rules from the parent into the child, and then the effective type becomes just the new type which has the rules of everything in the chain combined.
This doesn't fail if the child wants to close fields, because it doesn't need to be independently valid as all_of does, it simply need to remain valid when merging all rules together.
Maybe what I would propose is a union_of
constraints? As opposed to all_of
it first unions the constraints together, and then validates if the value is valid to their union. Where-as all_of
validates one type after the other and expect the value to be valid to all of them.
type::{
name: foo,
type: struct,
fields: {
a: int,
b: string,
}
}
type::{
name: bar,
type: struct,
fields: {
c: int,
d: string,
},
content: closed,
}
type::{
name: foobar,
union_of: [
foo,
bar
]
}
Where foobar here is the same as just copy/pasting their constraints together:
type::{
name: foobar,
fields: {
a: int,
b: string,
c: int,
d: string,
}
content: closed,
}
Now foobar
is the union of both constraints. It's simply a more constrained type.
This isn't same as inheritance, because you don't need to handle merge conflicts, you could just duplicate. For example:
type::{
name: foo,
type: struct,
fields: {
a: int,
}
}
type::{
name: bar,
type: struct,
fields: {
a: string
},
content: closed,
}
type::{
name: foobar,
union_of: [
foo,
bar
]
}
Would be same as:
type::{
name: bar,
type: struct,
fields: {
a: int,
a: string,
},
content: closed,
}
This isn't a valid syntax I don't think, but it's for demonstration purpose of the behavior, so foobar
just states that the constraints are unioned, so it has a key a
that must be a valid int and a valid string, this won't ever validate obviously, but that's the difference with inheritance, so you don't need logic about ordering of precedence when merging the types.
Maybe that would satisfy our use-case of combining fields and letting the content: closed behavior close over the union of fields, but without having to introduce real inheritance?
@didibus A union_of
constraint is certainly an interesting idea. I think it has potential as long as we can define what it means in a consistent way. I think that you've presented something that will work for merging fields
constraints, and I can see how it might work for a constraint like valid_values
, but I'm not sure how it would work with other constraints such as ordered_elements
or codepoint_length
.
If you're just suggesting it for fields
then I think it's functionally similar to the proposal I made in this prior comment (even though the syntax may be different), and I think some variation of this approach is probably going to be the most feasible solution. If you have more ideas about how it can be generalized for all constraints, maybe you could go ahead and create a new issue for that idea?
@popematt You're right I didn't think about all constraints yet, so not sure if there is a nice way to tackle them all. I'll put some thoughts into it.
One thing maybe I missed from your proposal, can you enforce the closeness?
One thing that was important for us, is that if I close in any one of them, it should be closed, but its the union of the fields that should be closed.
In my proposal, the "closed-ness" across multiple type definitions is enforced by the field_names
constraint. For example, a type with field_names: fields_of::[a, b]
would be closed to only include the fields declared in type a
and type b
.
type::{
name: a,
fields: {
a1: int,
a2: string,
}
}
type::{
name: b,
type: a,
field_names: fields_of::[a, b], // Values of type 'b' can only have the fields `a1`, `a2`, and `b1`.
fields: {
b1: int
}
}
However, when using this approach, the fields
constraints should not be closed::
because Ion Schema never removes or relaxes any constraints when creating a derived type. For example:
// Assume type `a` is the same as the last example
type::{
name: b,
type: a,
field_names: fields_of::[a, b], // Type `b` can only have fields `a1`, `a2`, and `b1`...
fields: closed::{ // but `fields` is closed, so only `b1` is allowed.
b1: int
}
}
Originally posted by @didibus in https://github.com/amazon-ion/ion-schema/issues/47#issuecomment-1526090678
Ion Schema does not have any OOP-style inheritance because constraints can only subtract from the set of values that are allowed for a given type definition. That being said, it is inconvenient to have to copy/paste fields from one type definition to another, and it is a maintainability issue to have to manually keep the fields of multiple type definitions in sync.
Any solution to this problem should consider whether the solution would make it easier or harder to generate code (e.g. POJOs or Ion 1.1 macros) for a type definition.
An Idea
Here is a strawman proposal of something that could work without needing to introduce any OOP-style inheritance into Ion Schema.
Some open questions with this approach:
foo
isclosed::
? We could say that "closed" is a modifier, not content, of thefields
constraint, but that might be a confusing behavior if you want to "inherit" the fields and their "closed-ness". On the other hand, if you want to "inherit" from a type with closed fields, you can just usetype: some_type_with_closed_fields
.foo
has more than onefields
constraint? We would probably want to grab all of them.foo
has nofields
constraint? Would this be an error or a no-op?all_of
, but how would we handle theoccurs
?union
withoutclosed
? Maybeunion
should imply that the fields are closed, and thenclosed
andunion
would be mutually exclusive.An alternate approach
Instead of using
fields
to create unions of field definitions, we could create a special syntax for thefield_names
constraint since you can close the fields usingfield_names
. E.g.:In this case, the
fields_of
annotation is a flag that indicates that the Ion Schema System should automatically generate an inline type based on the fields of the listed types. I.e.{ valid_values: [ a, b, c, d ] }
.This seems like an elegant solution to me, especially since it avoids the question of how to combine the
occurs
or type definitions of each field.However, this does seem like it might violate the tenet of having orthogonal constraints. Tenets are made to serve us, not us serve the tenets, but it is a good indicator that we need to be careful. There's also the unanswered question of how an anonymous/inline type could refer to its own
fields
constraint(s).