Open theduke opened 1 year ago
I always felt like this was bound to come up at some point. I think you're right, we probably need an Array datatype.
I think that if a Property has the Array datatype, it should also indicate which types of elements are supported. Maybe it has a second datatype, namely innerDatatype, which refers to the shape of the items in the array (e.g. String or Integer).
This brings up an interesting modeling problem.
How do you express "array of integers" in the schema?
This is actually the more general problem of "how to refine types".
I see several solutions, all of them with downsides.
A property of type atomicdata.dev/datatypes/array could use an atomicdata.dev/properties/array-item-type property to specify the expected type of array items.
The big downside here is that it would not be apparent from the schema that this property is expected or required as a refinement of the array datatype, so that makes the schema more cryptic and implementations more complicated.
It's also more complex to "unify" and compare schema types, since libraries now need to understand the array-item-type property and convert it into an Array<T> type for processing.
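To make that downside concrete, here is a minimal sketch of what a library would have to do under this first option. Everything in it is an assumption for illustration: the flat string map standing in for a property resource, the `ResolvedType` and `ResolveError` names, and the array and array-item-type URLs, which are only proposed in this thread.

```rust
use std::collections::HashMap;

// URLs as written in this discussion. The array ones are proposals, not existing resources.
const DATATYPE: &str = "atomicdata.dev/properties/datatype";
const ARRAY_DATATYPE: &str = "atomicdata.dev/datatypes/array";
const ARRAY_ITEM_TYPE: &str = "atomicdata.dev/properties/array-item-type";

/// The type a library would actually work with after resolution.
#[derive(Debug, Clone, PartialEq)]
pub enum ResolvedType {
    /// Any non-array datatype, identified by its URL.
    Plain(String),
    /// The Array<T> form.
    Array(Box<ResolvedType>),
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    MissingDatatype,
    /// The schema cannot express this requirement, so it is only caught here.
    MissingItemType,
}

/// Resolve a property (modelled as a flat key/value map) to its type.
pub fn resolve_property_type(
    property: &HashMap<String, String>,
) -> Result<ResolvedType, ResolveError> {
    let datatype = property.get(DATATYPE).ok_or(ResolveError::MissingDatatype)?;
    if datatype.as_str() != ARRAY_DATATYPE {
        return Ok(ResolvedType::Plain(datatype.clone()));
    }
    // Special case every implementation has to know about:
    // the array datatype is incomplete without the refinement property.
    let item = property
        .get(ARRAY_ITEM_TYPE)
        .ok_or(ResolveError::MissingItemType)?;
    Ok(ResolvedType::Array(Box::new(ResolvedType::Plain(item.clone()))))
}
```

The `MissingItemType` branch is the cryptic part: nothing in the schema itself says that an array property must carry the extra property.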
Have something like a ../classes/ArrayType class, which requires the array-item-type property.
Properties can then specify their type (usually with a nested resource, probably) as an ArrayType.
The downside here is that libraries now have to understand what an ArrayType means, and need code to unify different ArrayType definitions into an Array<T> type for things like queries, filters, etc. (as above).
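A rough sketch of the unification step this second option would require. The `ArrayTypeResource` struct, the `Canonical` enum, the `canonicalize` function and the example.com URLs are all hypothetical. The sketch assumes ArrayType resources are looked up by URL and that their item types never form a cycle.

```rust
use std::collections::HashMap;

/// A resource of the hypothetical ArrayType class.
/// `item_type` is the URL of a datatype or of another ArrayType resource.
pub struct ArrayTypeResource {
    pub item_type: String,
}

/// Canonical form used for comparing types in queries, filters, etc.
#[derive(Debug, Clone, PartialEq)]
pub enum Canonical {
    Scalar(String),
    Array(Box<Canonical>),
}

/// Turn a type URL into its canonical form.
/// Assumes there are no cycles between ArrayType resources; a cycle would recurse forever.
pub fn canonicalize(
    type_url: &str,
    array_types: &HashMap<String, ArrayTypeResource>,
) -> Canonical {
    match array_types.get(type_url) {
        // An ArrayType resource: recurse into its item type.
        Some(res) => Canonical::Array(Box::new(canonicalize(&res.item_type, array_types))),
        // Anything else is treated as a plain datatype.
        None => Canonical::Scalar(type_url.to_string()),
    }
}
```

Two separately defined ArrayType resources with the same item type canonicalize to the same value, which is the comparison a library would need before it can treat them as one Array<T>.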
In my factordb implementation I went in a somewhat different direction.
I don't allow defining arbitrary datatypes. Types have to be expressed in terms of the built-in core type system.
A simplified definition of the core types in Rust looks a bit like this:
    pub enum ValueType {
        Const(Value),
        Any,
        Unit,
        Bool,
        Int {
            min: Option<i64>,
            max: Option<i64>,
        },
        UInt {
            min: Option<u64>,
            max: Option<u64>,
        },
        Float {
            min: Option<f64>,
            max: Option<f64>,
        },
        String {
            min_length: Option<u64>,
            max_length: Option<u64>,
            regex_validators: Option<Vec<String>>,
        },
        Bytes {
            min_length: Option<u64>,
            max_length: Option<u64>,
        },
        // Containers.
        List {
            item_type: Box<Self>,
            min_length: Option<u64>,
            max_length: Option<u64>,
        },
        /// A mapping from keys to values
        Map {
            key_type: Box<Self>,
            value_type: Box<Self>,
        },
        ///
        Object(ObjectType),
        /// An anonymous union of different types.
        Union(Vec<Self>),
        /// Tagged union (aka sum type / ADT)
        Variant(VariantType),
        /// Reference (aka foreign key) pointing to another entity
        Reference {
            /// Restrict the allowed entity types.
            allowed_types: Option<HashSet<Ident>>,
        },
        /// A custom data type.
        Named(Ident),
    }
Properties can either specify a concrete ValueType as their type (serialized as a nested object), or a custom datatype, but custom datatype entities essentially only provide a named definition for a specific ValueType.
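As an illustration of why named types stay understandable to every client, here is a cut-down sketch, not the actual factordb code. It keeps only a few variants, uses a plain `String` in place of `Ident`, and introduces a hypothetical `Value` enum, a `validate` function and a name registry. It also assumes string length is counted in characters and that named types do not refer to each other in a cycle.

```rust
use std::collections::HashMap;

/// Hypothetical runtime values, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// A small subset of the core types, with String standing in for Ident.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueType {
    Int { min: Option<i64>, max: Option<i64> },
    String { max_length: Option<u64> },
    List { item_type: Box<ValueType> },
    Named(String),
}

/// Check a value against a type, resolving named types through `named`.
/// Assumes named types are acyclic; a cycle would recurse forever.
pub fn validate(value: &Value, ty: &ValueType, named: &HashMap<String, ValueType>) -> bool {
    match (ty, value) {
        // A custom datatype is only an alias for a core type.
        (ValueType::Named(name), _) => match named.get(name) {
            Some(inner) => validate(value, inner, named),
            None => false,
        },
        (ValueType::Int { min, max }, Value::Int(n)) => {
            min.map_or(true, |m| *n >= m) && max.map_or(true, |m| *n <= m)
        }
        // Assumption: length is measured in characters, not bytes.
        (ValueType::String { max_length }, Value::Str(s)) => {
            max_length.map_or(true, |m| s.chars().count() as u64 <= m)
        }
        (ValueType::List { item_type }, Value::List(items)) => {
            items.iter().all(|v| validate(v, item_type, named))
        }
        // Any other combination is a type mismatch.
        _ => false,
    }
}
```

With this shape, "array of integers" needs no extra schema machinery: it is just `List { item_type: Int { .. } }`, and a custom datatype such as a made-up `PositiveInts` only gives that definition a name.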
The main advantage here is that clients will always be able to understand and work with all data.
More complex types can always be expressed in terms of this core schema, and in the worst case they can just use a bytes array or string for arbitrary serialization.
(Including things like ObjectType or Map here is probably very debatable, because they are hard to express in something like a triple/quad format and might be better expressed with something like nested resources, but I don't have that yet.)
There is currently only a resource-array datatype, which requires using nested resources if there are multiple values. Often I would want to have a property with multiple plain values though.
Reasons:
Imagine you want an array of ints or strings.
So there should be a datatype for "array of type T".
Defining the nested type would run into similar issues as #126 though.