asishallab opened 1 year ago
Request to unify the way relationships/associations are implemented.
Almost all schemas use a uniform format for associations, for example `observationVariables` from `Study.json`:
```json
{
  "$defs": {
    "Study": {
      "properties": {
        "observationVariables": {
          "description": "The list of Observation Variables being used in this study. \n\nThis list is intended to be the wishlist of variables to collect in this study. It may or may not match the set of variables used in the collected observation records. ",
          "items": {
            "$ref": "ObservationVariable.json#/$defs/ObservationVariable"
          },
          "referencedAttribute": "studies",
          "relationshipType": "many-to-many",
          "type": "array"
        }
      }
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Core/Study.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
As you can see, the association is a property in its own right, so the relationship can be converted automatically without any problem. While working on associations, I noticed that three associations are defined differently from all the others:
- `parentGermplasm` in `PedigreeNode.json`
- `progenyGermplasm` in `PedigreeNode.json`
- `siblingGermplasm` in `PedigreeNode.json`

Instead of being defined as properties in their own right, they are nested inside another property, for example `parentGermplasm`:
```json
{
  "$defs": {
    "PedigreeNode": {
      "properties": {
        "parents": {
          "description": "A list of parent germplasm references in the pedigree tree for this germplasm. These represent edges in the tree, connecting to other nodes.\n<br/> Typically, this array should only have one parent (clonal or self) or two parents (cross). In some special cases, there may be more parents, usually when the exact parent is not known. \n<br/> If the parameter 'includeParents' is set to false, then this array should be empty, null, or not present in the response.",
          "items": {
            "properties": {
              "parentGermplasm": {
                "$ref": "Germplasm.json#/$defs/Germplasm",
                "description": "The ID which uniquely identifies a parent germplasm",
                "referencedAttribute": "progenyPedigreeNodes",
                "relationshipType": "many-to-one"
              },
              "parentType": {
                "description": "The type of parent used during crossing. Accepted values for this field are 'MALE', 'FEMALE', 'SELF', 'POPULATION', and 'CLONAL'. \n\nIn a pedigree record, the 'parentType' describes each parent of a particular germplasm. \n\nIn a progeny record, the 'parentType' is used to describe how this germplasm was crossed to generate a particular progeny. \nFor example, given a record for germplasm A, having a progeny B and C. The 'parentType' field for progeny B item refers \nto the 'parentType' of A toward B. The 'parentType' field for progeny C item refers to the 'parentType' of A toward C.\nIn this way, A could be a male parent to B, but a female parent to C. ",
                "enum": ["MALE", "FEMALE", "SELF", "POPULATION", "CLONAL"],
                "type": "string"
              }
            },
            "required": ["germplasmDbId", "parentType"],
            "type": "object"
          },
          "type": ["null", "array"]
        }
      }
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/PedigreeNode.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
As you can see, the association here is defined as a nested property of the `parents` property.
The converter ignores nested properties, so this association is ignored as well.
Would it be possible to unify the format of the associations and define each of them as an individual property? This would be very helpful!
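To make the problem concrete, here is a minimal sketch, not the actual converter. It assumes that an association is any property carrying `relationshipType`, and that the converter only looks at a model's top-level properties. Under those assumptions a top-level scan finds `observationVariables` in `Study.json` but nothing in `PedigreeNode.json`, while a recursive scan also reports the nested `parents.parentGermplasm`. The function names are hypothetical.

```python
def top_level_associations(schema, model):
    """Associations declared directly on the model's properties."""
    props = schema["$defs"][model].get("properties", {})
    return [name for name, prop in props.items() if "relationshipType" in prop]


def all_associations(node, path=()):
    """Yield the dotted path of every property carrying "relationshipType",
    however deeply it is nested below "properties" / "items"."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "properties" and isinstance(value, dict):
                for name, prop in value.items():
                    if isinstance(prop, dict) and "relationshipType" in prop:
                        yield ".".join(path + (name,))
                    yield from all_associations(prop, path + (name,))
            elif isinstance(value, (dict, list)):
                yield from all_associations(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from all_associations(item, path)


# Condensed versions of the two schemas quoted above.
study = {"$defs": {"Study": {"properties": {"observationVariables": {
    "items": {"$ref": "ObservationVariable.json#/$defs/ObservationVariable"},
    "referencedAttribute": "studies",
    "relationshipType": "many-to-many",
    "type": "array"}}}}}

pedigree = {"$defs": {"PedigreeNode": {"properties": {"parents": {
    "items": {"properties": {"parentGermplasm": {
        "$ref": "Germplasm.json#/$defs/Germplasm",
        "referencedAttribute": "progenyPedigreeNodes",
        "relationshipType": "many-to-one"}}},
    "type": ["null", "array"]}}}}}

print(top_level_associations(study, "Study"))            # ['observationVariables']
print(top_level_associations(pedigree, "PedigreeNode"))  # []
print(list(all_associations(pedigree)))                  # ['parents.parentGermplasm']
```

A recursive scan like this could at least flag every nested association that still needs to be moved to the top level.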
`siblingGermplasm` is something I can fix immediately. But `parentGermplasm` and `progenyGermplasm` are a little trickier. While they reference an array of Germplasm elements, they also need the additional metadata `parentType` associated with each Germplasm. I think we need some kind of polymorphism for the Germplasm entity in this case.
I think the model proposed in this blog post might work for us: https://json-schema.org/blog/posts/modelling-inheritance
It would look something like this:
```json
{
  "$defs": {
    "PedigreeNode": {
      "properties": {
        "parents": {
          "description": "A list of parent germplasm referen...,",
          "referencedAttribute": "progenyPedigreeNodes",
          "relationshipType": "many-to-one",
          "items": {
            "type": "object",
            "$ref": "Germplasm.json#/$defs/Germplasm",
            "properties": {
              "parentType": {
                "description": "The type of parent used du... ",
                "enum": ["MALE", "FEMALE", "SELF", "POPULATION", "CLONAL"],
                "type": "string"
              }
            },
            "required": ["germplasmDbId", "parentType"]
          },
          "type": ["null", "array"]
        }
      }
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/PedigreeNode.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
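The proposal above puts a `$ref` and extra local `properties` side by side in `items`; under draft 2020-12 (the `$schema` used here), keywords next to `$ref` are evaluated together with it. Whether Zendro can map this is the open question below. As a minimal sketch, a converter could at least split such an item into the target model and the per-edge attributes. The function `split_edge` is hypothetical, and it assumes `$ref` always ends in the model name.

```python
def split_edge(prop):
    """Split an association whose items mix a "$ref" with local properties
    into (target model, {edge attribute: JSON type})."""
    items = prop["items"]
    # "Germplasm.json#/$defs/Germplasm" -> "Germplasm"
    target = items["$ref"].rsplit("/", 1)[-1]
    edge_attributes = {
        name: spec.get("type") for name, spec in items.get("properties", {}).items()
    }
    return target, edge_attributes


# The "parents" property from the proposal above, condensed.
parents = {
    "referencedAttribute": "progenyPedigreeNodes",
    "relationshipType": "many-to-one",
    "items": {
        "type": "object",
        "$ref": "Germplasm.json#/$defs/Germplasm",
        "properties": {
            "parentType": {
                "enum": ["MALE", "FEMALE", "SELF", "POPULATION", "CLONAL"],
                "type": "string",
            }
        },
        "required": ["germplasmDbId", "parentType"],
    },
    "type": ["null", "array"],
}

print(split_edge(parents))  # ('Germplasm', {'parentType': 'string'})
```

In a relational back end, one conventional home for such edge attributes is a join table holding both foreign keys plus `parentType`; whether Zendro supports something like this is not established here.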
@LzLang Will this work for Zendro? Will it be able to pick up the reference to `Germplasm` AND keep the additional property `parentType`? I don't know how polymorphism works with GraphQL...
We dub any one-to-one or one-to-many relation to objects that do not have a separate data model definition "nested". It would be helpful to stop using such nested relationships, give those objects separate JSON data model definitions, and then define the relationships as in all other cases.
Some data models explicitly list foreign keys, which should be excluded from the "standard" data model definition. @LzLang will provide us with a list of these keys so that they can be removed from the JSON model definitions.
Note that, in the current automated data warehouse generation with Zendro, we automatically create a foreign key for each association.
Since Zendro only supports single foreign keys, we could of course have one for the mother germplasm ID and another one for the father germplasm ID. This would work everywhere we know in advance how many associations there are to the same data model.
Hello @BrapiCoordinatorSelby,
we worked on the nested-properties issue and tried to separate the nested objects into their own models. We took your Cross.json schema and modified it. Could you please review the idea and let us know your opinion? Note that, for now, we have to modify the schema manually.
Cross.json now (condensed to the changed attributes):
```json
{
  "$defs": {
    "Cross": {
      "properties": {
        "crossAttributes": {
          "referencedAttribute": "cross",
          "relationshipType": "one-to-many",
          "items": {
            "$ref": "CrossAttribute.json#/$defs/CrossAttribute",
            "description": "Set of custom attributes associated with a cross"
          },
          "type": ["null", "array"]
        },
        "externalReferences": {
          "referencedAttribute": "cross",
          "relationshipType": "one-to-many",
          "items": {
            "$ref": "CrossExternalReferences.json#/$defs/CrossExternalReferences",
            "description": "An array of external reference ids. These are references to this piece of data in an external system. Could be a simple string or a URI."
          },
          "type": ["null", "array"]
        },
        "parent1": {
          "$ref": "Germplasm.json#/$defs/Germplasm",
          "description": "the unique identifier for a germplasm",
          "referencedAttribute": "parent1Childs",
          "relationshipType": "many-to-one"
        },
        "parent2": {
          "$ref": "Germplasm.json#/$defs/Germplasm",
          "description": "the unique identifier for a germplasm",
          "referencedAttribute": "parent2Childs",
          "relationshipType": "many-to-one"
        },
        "pollinationEvents": {
          "referencedAttribute": "cross",
          "relationshipType": "one-to-many",
          "items": {
            "$ref": "CrossPollinationEvent.json#/$defs/CrossPollinationEvent",
            "description": "The list of pollination events that occurred for this cross"
          },
          "type": ["null", "array"]
        }
      },
      "required": ["crossDbId"],
      "title": "Cross",
      "type": "object"
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/Cross.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
We created the following models:
CrossAttribute:
```json
{
  "$defs": {
    "CrossAttribute": {
      "properties": {
        "cross_attribute_ID": {
          "description": "the unique identifier for a cross attribute",
          "type": "string"
        },
        "crossAttributeName": {
          "description": "the human readable name of a cross attribute",
          "type": ["null", "string"]
        },
        "crossAttributeValue": {
          "description": "the value of a cross attribute",
          "type": ["null", "string"]
        },
        "cross": {
          "$ref": "Cross.json#/$defs/Cross",
          "description": "The unique identifier for a Cross",
          "referencedAttribute": "crossAttributes",
          "relationshipType": "many-to-one"
        }
      },
      "required": ["cross_attribute_ID"],
      "title": "CrossAttribute",
      "type": "object"
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/CrossAttribute.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
CrossExternalReferences:
```json
{
  "$defs": {
    "CrossExternalReferences": {
      "properties": {
        "reference_ID": {
          "description": "The external reference ID. Could be a simple string or a URI.",
          "type": ["null", "string"]
        },
        "referenceSource": {
          "description": "An identifier for the source system or database of this reference",
          "type": ["null", "string"]
        },
        "cross": {
          "$ref": "Cross.json#/$defs/Cross",
          "description": "The unique identifier for a Cross",
          "referencedAttribute": "externalReferences",
          "relationshipType": "many-to-one"
        }
      },
      "required": ["reference_ID"],
      "title": "CrossExternalReferences",
      "type": "object"
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/CrossExternalReferences.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
CrossPollinationEvent:
```json
{
  "$defs": {
    "CrossPollinationEvent": {
      "properties": {
        "pollination_ID": {
          "description": "The unique identifier for this pollination event",
          "type": ["null", "string"]
        },
        "pollinationSuccessful": {
          "description": "True if the pollination was successful",
          "type": ["null", "boolean"]
        },
        "pollinationTimeStamp": {
          "description": "The timestamp when the pollination took place",
          "format": "date-time",
          "type": ["null", "string"]
        },
        "cross": {
          "$ref": "Cross.json#/$defs/Cross",
          "description": "The unique identifier for a Cross",
          "referencedAttribute": "pollinationEvents",
          "relationshipType": "many-to-one"
        }
      },
      "required": ["pollination_ID"],
      "title": "CrossPollinationEvent",
      "type": "object"
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/CrossPollinationEvent.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
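The three models above were split out of Cross by hand. As a minimal sketch of how this could be automated, the hypothetical `extract_nested_model` moves the object nested in a property's `items` into a standalone model, adds a many-to-one back-reference named after the parent model with a lower-case first letter (`cross`, as in the manual models), and replaces the original property with a one-to-many association. It does not add a primary key such as `cross_attribute_ID` or carry over descriptions; those would still be manual. The shape of the original nested `crossAttributes` used below is assumed.

```python
import copy

SCHEMA_URI = "http://json-schema.org/draft/2020-12/schema"


def extract_nested_model(schema, model, prop_name, new_model, base_id):
    """Move schema[model].properties[prop_name].items into a standalone model
    and replace it with a one-to-many association.
    Returns (updated parent schema, new model schema); the input is not modified."""
    parent = copy.deepcopy(schema)
    props = parent["$defs"][model]["properties"]
    nested = props[prop_name]["items"]
    back_ref = model[:1].lower() + model[1:]  # "Cross" -> "cross"

    # Copy the nested properties and add the many-to-one back-reference.
    child_props = copy.deepcopy(nested.get("properties", {}))
    child_props[back_ref] = {
        "$ref": f"{model}.json#/$defs/{model}",
        "referencedAttribute": prop_name,
        "relationshipType": "many-to-one",
    }
    child_def = {"properties": child_props, "title": new_model, "type": "object"}
    if "required" in nested:
        child_def["required"] = list(nested["required"])

    child = {
        "$defs": {new_model: child_def},
        "$id": f"{base_id}/{new_model}.json",
        "$schema": SCHEMA_URI,
    }
    # The parent now only points at the new model.
    props[prop_name] = {
        "referencedAttribute": back_ref,
        "relationshipType": "one-to-many",
        "items": {"$ref": f"{new_model}.json#/$defs/{new_model}"},
        "type": ["null", "array"],
    }
    return parent, child


cross = {"$defs": {"Cross": {"properties": {"crossAttributes": {
    "items": {"type": "object", "properties": {
        "crossAttributeName": {"type": ["null", "string"]},
        "crossAttributeValue": {"type": ["null", "string"]}}},
    "type": ["null", "array"]}}}}}

new_cross, cross_attribute = extract_nested_model(
    cross, "Cross", "crossAttributes", "CrossAttribute",
    "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm")
print(cross_attribute["$id"])
```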
In the original Cross model, there were two special nested properties, "parent1" and "parent2". These properties were essentially just associations to Germplasm that link the parents. Instead of creating a separate model for these two properties, we simply created associations to Germplasm.json.
Cross:
```json
"parent1": {
  "$ref": "Germplasm.json#/$defs/Germplasm",
  "description": "the unique identifier for a germplasm",
  "referencedAttribute": "parent1Childs",
  "relationshipType": "many-to-one"
},
"parent2": {
  "$ref": "Germplasm.json#/$defs/Germplasm",
  "description": "the unique identifier for a germplasm",
  "referencedAttribute": "parent2Childs",
  "relationshipType": "many-to-one"
},
```
Germplasm.json:
```json
"parent1Childs": {
  "title": "parent1Childs",
  "description": "Childs of the germplasm",
  "referencedAttribute": "parent1",
  "relationshipType": "one-to-many",
  "items": {
    "$ref": "Cross.json#/$defs/Cross",
    "description": "Crosses"
  },
  "type": ["null", "array"]
},
"parent2Childs": {
  "title": "parent2Childs",
  "description": "Childs of the germplasm",
  "referencedAttribute": "parent2",
  "relationshipType": "one-to-many",
  "items": {
    "$ref": "Cross.json#/$defs/Cross",
    "description": "Crosses"
  },
  "type": ["null", "array"]
}
```
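Because every association now has to be declared twice (for example `Cross.parent1` and `Germplasm.parent1Childs`), the two sides can drift apart when schemas are edited by hand. A minimal consistency check, assuming the reverse side is found through `referencedAttribute` and that the relationship types must mirror each other (many-to-one on one side, one-to-many on the other); `check_reverse` is a hypothetical helper.

```python
INVERSE = {
    "one-to-one": "one-to-one",
    "one-to-many": "many-to-one",
    "many-to-one": "one-to-many",
    "many-to-many": "many-to-many",
}


def target_of(prop):
    """Model name from "$ref" or "items.$ref"."""
    ref = prop.get("$ref") or prop.get("items", {}).get("$ref", "")
    return ref.rsplit("/", 1)[-1]


def check_reverse(models):
    """models: {model name: {property name: property schema}}.
    Return readable problems where an association and its reverse disagree."""
    problems = []
    for model, props in models.items():
        for name, prop in props.items():
            if "relationshipType" not in prop:
                continue
            target, back = target_of(prop), prop.get("referencedAttribute")
            reverse = models.get(target, {}).get(back)
            if reverse is None:
                problems.append(f"{model}.{name}: reverse {target}.{back} is missing")
            elif reverse.get("referencedAttribute") != name:
                problems.append(f"{model}.{name}: {target}.{back} points back to "
                                f"{reverse.get('referencedAttribute')}")
            elif reverse.get("relationshipType") != INVERSE.get(prop["relationshipType"]):
                problems.append(f"{model}.{name}: relationship types do not mirror each other")
    return problems


def ref_to(model):
    return f"{model}.json#/$defs/{model}"


models = {
    "Cross": {
        "parent1": {"$ref": ref_to("Germplasm"), "referencedAttribute": "parent1Childs",
                    "relationshipType": "many-to-one"},
        "parent2": {"$ref": ref_to("Germplasm"), "referencedAttribute": "parent2Childs",
                    "relationshipType": "many-to-one"},
    },
    "Germplasm": {
        "parent1Childs": {"items": {"$ref": ref_to("Cross")}, "referencedAttribute": "parent1",
                          "relationshipType": "one-to-many"},
        "parent2Childs": {"items": {"$ref": ref_to("Cross")}, "referencedAttribute": "parent2",
                          "relationshipType": "one-to-many"},
    },
}
print(check_reverse(models))  # []
```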
Currently, primary and foreign keys are defined in the same way, e.g. in Cross:
```json
{
  "$defs": {
    "Cross": {
      "properties": {
        "crossDbId": {
          "description": "the unique identifier for a cross",
          "type": "string"
        },
        "parent1": {
          "properties": {
            "germplasmDbId": {
              "description": "the unique identifier for a germplasm",
              "type": ["null", "string"]
            }
          },
          "type": ["null", "object"]
        }
      },
      "required": ["crossDbId"],
      "title": "Cross",
      "type": "object"
    }
  },
  "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/Cross.json",
  "$schema": "http://json-schema.org/draft/2020-12/schema"
}
```
The primary key `crossDbId` and the foreign key `germplasmDbId` are defined the same way.
In our project, we define primary keys as `[model]_ID`, for example:
- `cross_ID`
- `allele_matrix_ID`
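The `[model]_ID` convention can be derived mechanically from the model name. A minimal sketch, assuming model names are in CamelCase; `to_snake` and `primary_key` are hypothetical helpers, not part of Zendro or BrAPI.

```python
import re


def to_snake(name):
    """Insert '_' before every upper-case letter except the first, then
    lower-case everything: 'AlleleMatrix' -> 'allele_matrix'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def primary_key(model):
    """Primary-key name following the [model]_ID convention."""
    return f"{to_snake(model)}_ID"


print(primary_key("Cross"))         # cross_ID
print(primary_key("AlleleMatrix"))  # allele_matrix_ID
```

Note that runs of capitals are split letter by letter (`DbId` becomes `db_id`), which is fine for the model names above but may need a special case for acronyms.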
For foreign keys we use a similar pattern. As an example, take `listOwnerPerson` from `List.json`:
```json
"listOwnerPerson": {
  "$ref": "Person.json#/$defs/Person",
  "description": "The unique identifier for a List Owner. (usually a user or person)",
  "referencedAttribute": "lists",
  "relationshipType": "many-to-one"
},
```
So one person can have multiple lists. In Zendro, we would define the relationship like this:
```json
"listOwnerPerson": {
  "type": "many_to_one",
  "implementation": "foreignkeys",
  "reverseAssociation": "lists",
  "target": "Person",
  "targetKey": "lists_ids",
  "sourceKey": "list_owner_person_id",
  "keysIn": "List",
  "targetStorageType": "sql"
}
```
So our foreign keys are named after the attribute, with the suffix `_id` or `_ids` depending on whether the attribute holds a single value or an array.
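The mapping from the BrAPI association to the Zendro definition above follows directly from this naming rule. A minimal sketch that reproduces the `listOwnerPerson` example; it is inferred from that single example, so it only handles many-to-one and assumes SQL storage. `zendro_association` is a hypothetical helper, not a Zendro API.

```python
import re


def to_snake(name):
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def zendro_association(model, name, prop, storage="sql"):
    """Translate a BrAPI many-to-one association into a Zendro-style
    association definition, following the List/Person example above."""
    if prop["relationshipType"] != "many-to-one":
        # Key placement for other relationship types is not covered by the example.
        raise NotImplementedError(prop["relationshipType"])
    reverse = prop["referencedAttribute"]
    return {
        "type": "many_to_one",
        "implementation": "foreignkeys",
        "reverseAssociation": reverse,
        "target": prop["$ref"].rsplit("/", 1)[-1],
        "targetKey": f"{to_snake(reverse)}_ids",  # reverse side is an array
        "sourceKey": f"{to_snake(name)}_id",      # this side holds one value
        "keysIn": model,
        "targetStorageType": storage,             # assumed: SQL storage
    }


list_owner_person = {
    "$ref": "Person.json#/$defs/Person",
    "referencedAttribute": "lists",
    "relationshipType": "many-to-one",
}
print(zendro_association("List", "listOwnerPerson", list_owner_person))
```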
Currently, BrAPI uses two different ways to define associations. X-to-many associations always have an `items` keyword that holds a description and the reference:
```json
"observationUnits": {
  "title": "observationUnits",
  "description": "observationUnits",
  "referencedAttribute": "cross",
  "relationshipType": "one-to-many",
  "items": {
    "$ref": "ObservationUnit.json#/$defs/ObservationUnit",
    "description": "ObservationUnit"
  },
  "type": ["null", "array"]
}
```
X-to-one associations (e.g. many-to-one), on the other hand, don't have this nesting:
```json
"crossingProject": {
  "$ref": "CrossingProject.json#/$defs/CrossingProject",
  "description": "the unique identifier for a crossing project",
  "referencedAttribute": "crosses",
  "relationshipType": "many-to-one"
},
```
We don't see any benefit in nesting the reference and giving it a separate description. This relationship could just as well be defined without nesting:
```json
"observationUnits": {
  "title": "observationUnits",
  "description": "observationUnits",
  "referencedAttribute": "cross",
  "relationshipType": "one-to-many",
  "$ref": "ObservationUnit.json#/$defs/ObservationUnit",
  "type": ["null", "array"]
}
```
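The proposed flattening is a purely mechanical transformation. A minimal sketch of it; `flatten_items_ref` is hypothetical, and it deliberately leaves alone any property whose `items` carry more than a `$ref` and a description (such as the `parents` proposal with its extra `parentType`), since flattening those would lose information.

```python
import copy


def flatten_items_ref(prop):
    """Return a copy of an x-to-many association with items.$ref moved up to
    the property itself, dropping the items description. Anything else is
    returned unchanged."""
    items = prop.get("items")
    if not isinstance(items, dict) or "$ref" not in items:
        return prop  # already flat (e.g. many-to-one) or not an association
    if set(items) - {"$ref", "description"}:
        return prop  # items carry extra schema; flattening would lose it
    flat = copy.deepcopy(prop)
    flat["$ref"] = flat.pop("items")["$ref"]
    return flat


observation_units = {
    "title": "observationUnits",
    "description": "observationUnits",
    "referencedAttribute": "cross",
    "relationshipType": "one-to-many",
    "items": {"$ref": "ObservationUnit.json#/$defs/ObservationUnit",
              "description": "ObservationUnit"},
    "type": ["null", "array"],
}
print(flatten_items_ref(observation_units))
```

One caveat: in JSON Schema 2020-12, `$ref` is applied together with its sibling keywords, so a validator would require such a value to satisfy both `"type": ["null", "array"]` and the referenced `ObservationUnit` object schema at once. The flattened form therefore seems better suited as converter input than as a schema used for validation.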
Open Questions
When parsing and reading through the ISA JSON model, a few questions arose. They are listed here.
How to treat properties of type `object`
In some cases, BrAPI JSON data models have properties of type `object`. We can model them in Zendro in a number of ways, e.g. as a one-to-many association. Probably this should be decided on a case-by-case basis?
Example: `additionalInfo` and `additionalProperties`, e.g. in `Person.json`.

Structure of `additionalInfo`
The definition taken from `Person.json` says:

So, according to this specification, a person can have additional info. But what is the structure of this object?
- Can the object `additionalInfo` have a number of `additionalProperties` that are of type `string`?
- Is `additionalProperties` a collection of `key` and `value` pairs that can store any information? In that case, we cannot provide a schema, but must use serialized JSON.

Reply from meeting with the BrAPI group
`additionalInfo` should be the only case where we see unformatted data. In the BrAPI test server, we serialize this object and store it as JSON.

How to model `externalReferences`?
The array of external references is found in the Person model:
There are several questions about this specification:
- `referenceId` and `referenceSource` are marked as required, but their `type` specification allows `null`.

Response from the BrAPI development team
`ExternalReference`: `referenceId` can be, e.g., a DOI URL (example: Page & Holmes 2012, Inferring Phylogenies; actually this example does not exist). Another example would be the field-book app: field-book
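The "required but nullable" pattern raised above for `referenceId` and `referenceSource` also shows up in the models proposed earlier in this thread (`pollination_ID`, `reference_ID`). In JSON Schema, `required` only demands that the key be present, so a `null` value still validates when the `type` allows it. A minimal sketch of a check that lists such properties; `required_but_nullable` is a hypothetical helper.

```python
def required_but_nullable(model_schema):
    """Names listed in "required" whose "type" also admits null."""
    props = model_schema.get("properties", {})
    hits = []
    for name in model_schema.get("required", []):
        declared = props.get(name, {}).get("type")
        types = declared if isinstance(declared, list) else [declared]
        if "null" in types:
            hits.append(name)
    return hits


# CrossPollinationEvent as proposed earlier in this thread (condensed).
cross_pollination_event = {
    "properties": {
        "pollination_ID": {"type": ["null", "string"]},
        "pollinationSuccessful": {"type": ["null", "boolean"]},
    },
    "required": ["pollination_ID"],
}
print(required_but_nullable(cross_pollination_event))  # ['pollination_ID']
```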
Validation
Zendro can apply arbitrary validation functions to incoming data. The Zendro framework can validate both data formats (syntactically) and data values (semantically). However, if validation requires querying the database, we should consider whether this could become a performance bottleneck.
Example taken from `Sample.json`:

Questions:
Relationships / Associations
For some associations, we see the foreign keys implemented in the JSON specs, e.g. in `Sample.json`:

Here, we can conclude from the name of the foreign key and its existence:
- the relationship type: many-to-[one|many] (i.e. many samples belong to probably many (not one) germplasm)
- the target model (`Germplasm`).

However, a formal specification of all relationships would be extremely helpful and would resolve open questions.
To be excluded properties
In some data models, foreign keys are stated. Also, to spare the user from sending another request to the RESTful API, some properties of the associated (related) models are stored as well. See this example taken from `Sample.json`:

Using GraphQL, these properties are not required. GraphQL specifically allows fetching, within a single HTTP request, all the data the user wants, including properties of related (associated) data models. Furthermore, once we have a formal description of the relationships between data models, foreign keys would ideally no longer be listed in the data model definitions. Is there a way to recognize these "to be excluded" properties and leave them out of the final GraphQL data model definitions? An easy, quick-and-dirty solution would be a simple exclusion list?
Response from the GraphQL development group
`DbId` is either a primary or a foreign key.
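Given that answer, candidates for the exclusion list could be found automatically: any `...DbId` property other than the model's own primary key is a foreign key. A minimal sketch combining that rule with the simple exclusion list suggested above. It assumes the primary key is named `<model, first letter lower-cased>DbId` (as `crossDbId` is for Cross); the helpers, the `EXCLUDED` entry, and the Sample property set are illustrative placeholders, not the real `Sample.json`.

```python
# Hypothetical exclusion list, keyed by model. Which properties really belong
# here is still open; the entry below is a placeholder.
EXCLUDED = {"Sample": {"germplasmDbId"}}


def classify_db_ids(model, property_names):
    """Split *DbId names into the model's own primary key and foreign keys,
    assuming the primary key is '<model, first letter lower-cased>DbId'."""
    own = model[:1].lower() + model[1:] + "DbId"
    primary, foreign = None, []
    for name in property_names:
        if name == own:
            primary = name
        elif name.endswith("DbId"):
            foreign.append(name)
    return primary, foreign


def strip_excluded(model, properties, excluded=EXCLUDED):
    """Return the properties without those on the model's exclusion list."""
    drop = excluded.get(model, set())
    return {name: spec for name, spec in properties.items() if name not in drop}


# Illustrative property set, not the full Sample.json.
sample = {"sampleDbId": {"type": "string"},
          "sampleName": {"type": ["null", "string"]},
          "germplasmDbId": {"type": ["null", "string"]}}

print(classify_db_ids("Sample", sample))         # ('sampleDbId', ['germplasmDbId'])
print(sorted(strip_excluded("Sample", sample)))  # ['sampleDbId', 'sampleName']
```

The foreign keys returned by `classify_db_ids` could seed the exclusion list, while copied properties of associated models (which don't follow the `DbId` pattern) would still have to be listed by hand.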