Open marcosmro opened 8 months ago
@egyedia @muakdogan As discussed last week, I've entered a short description for this task and moved it to the top of your In-Progress column. Please, start working on it and report status on Wednesday. Thanks!
These are my thoughts as of now:
Triggering conditions: a draft artifact is changed a draft artifact is published a published artifact gets a new draft version an artifact which has a previous version embedded in another artifact gets updated
If a field is updated, => 1) Offer to update all elements that embed the field all templates that embed the field => 2) Offer to update all templates that embed the elements in step 1) => 3) Offer to update all instances that are derived from the templates updated in 1) or 2)
If an element is updated => 1) Offer to update all templates that embed the element => 2) Offer to update all instances that are derived from the templates updated in 1)
If a template is updated => 1) Offer to update all instances that are derived from the template
Only the artifacts that the user owns are shown for update
There should be an overview screen, either a multipage wizard like, or a long scrolling page. This summarizes the artifacts that were changed, the actions that are taken on them, and the other artifacts that are/could be affected by these changes
An artifact which is candidate for update, has three options: Update Keep as is Postpone decision
The overview page actualizes it's state based on decisions taken at every point
The instances can be rendered invalid after such a change. Or they can loose data.
The wizard should do it's best to align the instances with the new template. Should show the lost data Should show data that is mandatory in the new template, but it is empty in the instance
These instances should be collected into a work package, a collection that the user later can access. The user should be able to take the instances in such a collection, and fix them manually. This behaves like a conflicting merge in git. After solving one instance, the user marks it as solved. This removes it from the work package. The user can mark it as solved even if the instance does not validate.
In addition to Atti some thoughts of mine:
What if a template is shared with others and they have metadata instances created from that template? If shared people and original template owner diverge in the update decision?
Is deletion of the field or element should be considered as an update and the template update be triggered?
In cases of having too many instances and/or templates that will be affected by the update, we should provide a "bulk updating" option like "apply to all".
It maybe good to categorize the initial update like destructive (will end up data lose -> deleting a field, max characters 60 to 4, etc.0 ) / nondestructive OR relaxing/tightening and provide downstream options accordingly.
This is a plan to create the graph changes that are needed to implement the functionality:
A new relationship, INCLUDES
is introduced that links the including artifact to the included artifact.
An alternative name is possible for the relationship, we have these with similar meaning:
(t:Template)-[INCLUDES]->(ei:Element)
(t:Template)-[INCLUDES]->(fi:Field)
(e:Element)-[INCLUDES]->(ei:Element) (e:Element)-[INCLUDES]->(fi:Field)
## Keeping the graph up-to-date
Every time a template or element is saved, the artifacts included at the first level are summarized, and the outgoing arcs from the current artifact are updated. This means deleting or creating arcs.
## Incremental Neo4j update task
It is a possibility to run the `INCLUDES` arc consolidation script from time to time. This would be performing the same operations as the initial run, except it would also delete arcs if necessary. This is not a must, but it can fix missing or extra arcs if for some reason the graph's integrity deteriorates. The source of truth for the `INCLUDES` arc should be considered to lie in the Mongo content. This code being a variation of the initial run, it is not a big effort to create. However, the scheduling of this task should be discussed, we do not have a unified scheduler at the moment.
Consider using two different relationships INCLUDES_ELEMENT and INCLUDES_FIELD instead of a single INCLUDES relationship. A single relationship might be the right approach, since it's easier to implement and maintain, and handles easily the use case "Retrieve all nested artifacts." However, if there's potential for the meanings or attributes of INCLUDES_ELEMENT and INCLUDES_FIELD to diverge in the future (I don't think so), different relationships would be a better idea.
@matthewhorridge @martinjoconnor any thoughts?
I'd like to propose prioritizing the implementation of this comment that I made on the last meeting (first bullet point) before we proceed with deploying the updating components change feature. The rationale behind this suggestion is to ensure that we establish a proper metadata acquisition workflow from the beginning thus minimizing the potential disruptions on instances when template's components are updated.
What do you think?
Proposed solution for the UI and REST endpoint.
First request right after the update of an artifact (field or element) was successful:
POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview
{
"@id": "id of the field or element that was updated"
}
Response:
{
"@id": "value",
"elements": {
"id1": {
"@id": "id1",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 1",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"operation": "DO-NOT-UPDATE"
},
"id2": {
"@id": "id2",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 2",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"operation": "DO-NOT-UPDATE"
}
},
"transitive-elements": {
}
"templates": {
"id3": {
"@id": "id3",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 3",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"operation": "DO-NOT-UPDATE"
}
}
"transitive-templates": {
}
}
The user is presented with two grids of elements to be updated, and two grids of templates to be updated. If the artifact updated was a field: The element grid has rows in it, initially with the "DO-NOT-UPDATE" action selected. The transitive-element grid is empty, later it can have values if the action is changed in the first grid The template grid has rows in it, with the templates that directly include the field. The transitive-template grid first is empty, later it contains the templates that can be updated because they transitively include the field that was updated.
If the artifacts was an element: The element grid shows other elements that include this given element The transitive-element grid is empty, later it can have values if the action is changed in the first grid The template grid shows the templates that include this element The transitive-template grid first is empty, later it contains the templates that can be updated because they transitively include the element that was updated.
Once the user made changes in the tables, the view can be updated (either automatically, or with a "Refresh" button)
The request should be:
POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview
{
"@id": "value",
"elements": {
"id1": {
"@id": "id1",
"operation": "DO-NOT-UPDATE"
},
"id2": {
"@id": "id2",
"operation": "UPDATE"
}
},
"transitive-elements": {
}
"templates": {
"id3": {
"@id": "id3",
"operation": "UPDATE"
}
}
"transitive-templates": {
}
}
This is practically the same data that was returned, with the operations changed. The whole data structure can be sent, but only the "@id" and the "operation" fields will be taken into account.
The result will be the same structure with a different number of entries in each four blocks, depending on the choices made in the previous step.
Once the user is content with the setup, they can click on the "Apply Changes" / "Propagate Changes" button.
A call should be made to:
POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-update
with the same datasatructure as before, only the "@id" and "operation" fields matter.
At this stage we only handle the propagation of changes to artifacts that the current user (the one making the original modification) has write access. At this stage if the changed artifact is included in an element or template that the user has no write access to, the change will never be applied to the "foreign" artifact.
In a later stage we will update the model so that other users can update their artifacts as well as a result of a change.
Consider using two different relationships INCLUDES_ELEMENT and INCLUDES_FIELD instead of a single INCLUDES relationship. A single relationship might be the right approach, since it's easier to implement and maintain, and handles easily the use case "Retrieve all nested artifacts." However, if there's potential for the meanings or attributes of INCLUDES_ELEMENT and INCLUDES_FIELD to diverge in the future (I don't think so), different relationships would be a better idea.
@matthewhorridge @martinjoconnor any thoughts?
There are two ways:
[includer:Element]-[INCLUDES]->[included:Field]
or
[includer:Element]-[INCLUDES_FIELD]->[included:Field]
The second case is somewhat redundant, because from the two endpoints of the arc the arc type can be deduced.
If we use two types of arcs (INCLUDES_FIELD
, INCLUDES_ELEMENT
) we need to write queries which allow both types of relationships if we want to query for a full inclusion graph:
(s)-[INCLUDES_FIELD|INCLUDES_ELEMENT]->(t)
At this point I am not sure which way is better.
For simplicity I will go with the single relationship type for now, but this seems like a two-way door (in Amazon terms) so we can change it later if if turns out that the other solution is better.
bibo:status
into the summary as well.
In the template grid the number of instances for each template to be updated will be shown. This will be retrieved and cached by the front-end.depth
attribute in the array elements, showing the distance of the affected artifact from the originator.POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview
{
"@id": "value"
}
{
"@id": "value",
"elements": [
{
"@id": "id1",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 1",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 1,
"operation": "DO-NOT-UPDATE"
},
{
"@id": "id2",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 2",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 1,
"operation": "DO-NOT-UPDATE"
}
],
"templates": [
{
"@id": "id3",
"@type": "https://schema.metadatacenter.org/core/Template",
"schema:name": "Name 3",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 1,
"operation": "DO-NOT-UPDATE"
}
]
}
If the user chooses to update id1
POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview
{
"@id": "value",
"elements": [
{
"@id": "id1",
"operation": "UPDATE"
},
{
"@id": "id2",
"operation": "DO-NOT-UPDATE"
}
],
"templates": [
{
"@id": "id3",
"operation": "DO-NOT-UPDATE"
}
]
}
Please note the element id-1-1
brought in by changing the operation
on id1
{
"@id": "value",
"elements": [
{
"@id": "id1",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 1",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 1,
"operation": "UPDATE"
},
{
"@id": "id1-1",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 1-1",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 2,
"operation": "DO-NOT-UPDATE"
},
{
"@id": "id2",
"@type": "https://schema.metadatacenter.org/core/TemplateElement",
"schema:name": "Name 2",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 1,
"operation": "DO-NOT-UPDATE"
}
},
"templates": [
{
"@id": "id3",
"@type": "https://schema.metadatacenter.org/core/Template",
"schema:name": "Name 3",
"schema:description": "Description",
"pav:createdOn": "2023-11-14T14:01:13-08:00",
"pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
"bibo:status": "bibo:draft",
"depth": 1,
"operation": "DO-NOT-UPDATE"
}
]
}
The above flow only previews what would happen if the user was to commit the changes. No real changes are applied to the graph or artifacts in the inclusions-subgraph-preview
mode. In order to actually apply the changes, a last call is needed:
POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-update
The body should be the same structure as in the second request.
This flow is stateless, the back-end does not keep track of any state, it always computes the affected artifacts based on the request body.
At this implementation phase no change detection is in place. After applying changes the whole process could be restarted from the top, and it would result in a similar flow.
List of things we agreed on after out meeting with @egyedia:
UI will provide this data/functionality but not necessarily with exact look & feel.
Thanks @muakdogan, please consider the following ideas:
Consuming Custom Elements (Web components with AngularJS)
@egyedia Could you please update the end date for this task with a reasonable new estimate?
We are currently working on this task. To better understand our progress, I am outlining the backend subtasks here:
This issue also needs a frontend component, which presents the REST API responses in a human-interpretable way. This part is developed by @muakdogan , I will let him report the progress of that subtask.
The update REST endpoint is done, and ready for integration. An example to call it is:
POST https://resource.metadatacenter.orgx/command/inclusions-subgraph-update
{
"elements": {
"https://repo.metadatacenter.orgx/template-elements/166db1a0-2a5c-445a-8e42-804f32eac302": {
"operation": "update",
"elements": {
"https://repo.metadatacenter.orgx/template-elements/92c1b814-09e9-4360-8ba3-819ffc7e2eb5": {
"operation": "update",
"elements": {
"https://repo.metadatacenter.orgx/template-elements/a239f0df-1f3a-478f-b157-1b51312e6280": {
"operation": "update",
"templates": {
"https://repo.metadatacenter.orgx/templates/3ec70c91-96d7-4173-8e30-433a197b1e23": {
"operation": "update"
}
}
}
}
}
}
}
},
"@id": "https://repo.metadatacenter.orgx/template-fields/1cda2e5b-4edf-4e23-8931-6d4521e523ef"
}
@muakdogan let me know if you run into any issues during the frontend integration!
RE: Apply changes UI/UX
I suggest listing all the artifacts with the updated 'field1' and adding a checkmark next to each one. This will allow users to select which artifacts need the updated version (see image below).
If users select an element item where it is being used by other artifacts, the system automatically selects the associated elements and templates. Users cannot modify these auto-selected items such that the checkboxes are disabled (see image below).
However, in my opinion, the vice-versa should be different: if users select the template item first, it is not necessary for the associated element to receive the update.
With respect to selectively pushing a field update, I think this can cause some serious consistency issues down the line after just a few updates.
From the perspective where a field1
has a unique identifier (be it a URL or database entry or whatever), updating field1
in one location of a template and not in another location seems like abuse of field1
. That seems like the user should have been using a separate field in the location where the update is not desired. This is a perspective that would favor pushing the change everywhere or referencing a singleton field1
, but this may not be desirable either because the user might see changes to field1
in places that were not intended to be updated.
I thought that strict versioning might be able to mitigate this problem, so that you could have field1 v1
where it is not updated and field1 v2
where the update has been pushed, but this would break semantic versioning if the user then wants to make a change to field1 v1
after already having made field1 v2
previously. This led me to yet another idea of cloning field1
(so that it becomes a separate entity) on update if the user does not want the change propagated everywhere. This wouldn't solve the issue of a bifurcating field1
, but it would make the update process transparent and avoid causing a potential consistency problem with field1
. Propagating this process up through containing elements and the template could be computationally challenging if elements are also abused in the same way.
We installed the feature branch onto our staging server to test it out. During these tests a frontend bug in the old template editor was found, which in some cases (changing the name or description of a field) results in the update feature to seem that it is not working. I started working on fixing this, the issue is reported here: https://github.com/metadatacenter/cedar-template-editor/issues/994
Design and implement a CEDAR feature to update metadata templates in response to modifications in their embedded template elements or fields. This work involves both backend and frontend effort.
Suggested workflow:
This issue focuses on the most basic version of this task.
@egyedia: Primary (Backend) @muakdogan: Primary (Frontend) @martinjoconnor: Reviewer @matthewhorridge: Reviewer