Update metadata templates when their components change

bmir-radx / radx-project

This repo serves as a primary location for tracking issues that don't quite fit into our other dedicated repositories

0 stars 0 forks source link

Update metadata templates when their components change #26

Open marcosmro opened 8 months ago

marcosmro commented 8 months ago

Design and implement a CEDAR feature to update metadata templates in response to modifications in their embedded template elements or fields. This work involves both backend and frontend effort.

Suggested workflow:

User1 creates Template1, with a nested element Element1.
User2 makes a change in Element2.
User1 receives a notification (when?) that Element1 was updated, asking for confirmation to update Template1.
After User1's confirmation, Template1 is automatically updated with the new version of Element1.

This issue focuses on the most basic version of this task.

@egyedia: Primary (Backend) @muakdogan: Primary (Frontend) @martinjoconnor: Reviewer @matthewhorridge: Reviewer

marcosmro commented 8 months ago

@egyedia @muakdogan As discussed last week, I've entered a short description for this task and moved it to the top of your In-Progress column. Please, start working on it and report status on Wednesday. Thanks!

egyedia commented 8 months ago

These are my thoughts as of now:

Triggering conditions: a draft artifact is changed a draft artifact is published a published artifact gets a new draft version an artifact which has a previous version embedded in another artifact gets updated

If a field is updated, => 1) Offer to update all elements that embed the field all templates that embed the field => 2) Offer to update all templates that embed the elements in step 1) => 3) Offer to update all instances that are derived from the templates updated in 1) or 2)

If an element is updated => 1) Offer to update all templates that embed the element => 2) Offer to update all instances that are derived from the templates updated in 1)

If a template is updated => 1) Offer to update all instances that are derived from the template

Only the artifacts that the user owns are shown for update

There should be an overview screen, either a multipage wizard like, or a long scrolling page. This summarizes the artifacts that were changed, the actions that are taken on them, and the other artifacts that are/could be affected by these changes

An artifact which is candidate for update, has three options: Update Keep as is Postpone decision

The overview page actualizes it's state based on decisions taken at every point

The instances can be rendered invalid after such a change. Or they can loose data.

The wizard should do it's best to align the instances with the new template. Should show the lost data Should show data that is mandatory in the new template, but it is empty in the instance

These instances should be collected into a work package, a collection that the user later can access. The user should be able to take the instances in such a collection, and fix them manually. This behaves like a conflicting merge in git. After solving one instance, the user marks it as solved. This removes it from the work package. The user can mark it as solved even if the instance does not validate.

muakdogan commented 8 months ago

In addition to Atti some thoughts of mine:

What if a template is shared with others and they have metadata instances created from that template? If shared people and original template owner diverge in the update decision?
Is deletion of the field or element should be considered as an update and the template update be triggered?
In cases of having too many instances and/or templates that will be affected by the update, we should provide a "bulk updating" option like "apply to all".
It maybe good to categorize the initial update like destructive (will end up data lose -> deleting a field, max characters 60 to 4, etc.0 ) / nondestructive OR relaxing/tightening and provide downstream options accordingly.

egyedia commented 8 months ago

This is a plan to create the graph changes that are needed to implement the functionality:

Graph changes

A new relationship, INCLUDES is introduced that links the including artifact to the included artifact. An alternative name is possible for the relationship, we have these with similar meaning:

CONTAINS - folders, artifacts
MEMBEROF - user, group
CONTAINSCATEGORY - category, category
CONTAINSARTIFACT - category, artifact

Initial Neo4j update task.

Iterate over all elements and templates in Neo4j.
Retrieve the artifacts data from Mongo.
- In every element, iterate over the first level fields or elements. These are the ones that are included in the element. Make a list of the @ids.
- In every template, iterate over the first level fields or elements. These are the ones that are included in the template. Make a list of the @ids.

In every artifact create new relationships to the included resources


(t:Template)-[INCLUDES]->(ei:Element)
(t:Template)-[INCLUDES]->(fi:Field)

(e:Element)-[INCLUDES]->(ei:Element) (e:Element)-[INCLUDES]->(fi:Field)



## Keeping the graph up-to-date

Every time a template or element is saved, the artifacts included at the first level are summarized, and the outgoing arcs from the current artifact are updated. This means deleting or creating arcs.

## Incremental Neo4j update task

It is a possibility to run the `INCLUDES` arc consolidation script from time to time. This would be performing the same operations as the initial run, except it would also delete arcs if necessary. This is not a must, but it can fix missing or extra arcs if for some reason the graph's integrity deteriorates. The source of truth for the `INCLUDES` arc should be considered to lie in the Mongo content. This code being a variation of the initial run, it is not a big effort to create. However, the scheduling of this task should be discussed, we do not have a unified scheduler at the moment.

marcosmro commented 8 months ago

Consider using two different relationships INCLUDES_ELEMENT and INCLUDES_FIELD instead of a single INCLUDES relationship. A single relationship might be the right approach, since it's easier to implement and maintain, and handles easily the use case "Retrieve all nested artifacts." However, if there's potential for the meanings or attributes of INCLUDES_ELEMENT and INCLUDES_FIELD to diverge in the future (I don't think so), different relationships would be a better idea.

@matthewhorridge @martinjoconnor any thoughts?

johardi commented 8 months ago

I'd like to propose prioritizing the implementation of this comment that I made on the last meeting (first bullet point) before we proceed with deploying the updating components change feature. The rationale behind this suggestion is to ensure that we establish a proper metadata acquisition workflow from the beginning thus minimizing the potential disruptions on instances when template's components are updated.

What do you think?

egyedia commented 7 months ago

Proposed solution for the UI and REST endpoint.

First request right after the update of an artifact (field or element) was successful:

POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview

{
    "@id": "id of the field or element that was updated"
}

Response:

{
    "@id": "value",
    "elements": {
        "id1": {
            "@id": "id1",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 1",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "operation": "DO-NOT-UPDATE"
        },
        "id2": {
            "@id": "id2",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 2",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "operation": "DO-NOT-UPDATE"
        }
    },
    "transitive-elements": {
    }
    "templates": {
        "id3": {
            "@id": "id3",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 3",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "operation": "DO-NOT-UPDATE"
        }
    }
    "transitive-templates": {
    }
}

The user is presented with two grids of elements to be updated, and two grids of templates to be updated. If the artifact updated was a field: The element grid has rows in it, initially with the "DO-NOT-UPDATE" action selected. The transitive-element grid is empty, later it can have values if the action is changed in the first grid The template grid has rows in it, with the templates that directly include the field. The transitive-template grid first is empty, later it contains the templates that can be updated because they transitively include the field that was updated.

If the artifacts was an element: The element grid shows other elements that include this given element The transitive-element grid is empty, later it can have values if the action is changed in the first grid The template grid shows the templates that include this element The transitive-template grid first is empty, later it contains the templates that can be updated because they transitively include the element that was updated.

Once the user made changes in the tables, the view can be updated (either automatically, or with a "Refresh" button)

The request should be:

POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview

{
    "@id": "value",
    "elements": {
        "id1": {
            "@id": "id1",
            "operation": "DO-NOT-UPDATE"
        },
        "id2": {
            "@id": "id2",
            "operation": "UPDATE"
        }
    },
    "transitive-elements": {
    }
    "templates": {
        "id3": {
            "@id": "id3",
            "operation": "UPDATE"
        }
    }
    "transitive-templates": {
    }
}

This is practically the same data that was returned, with the operations changed. The whole data structure can be sent, but only the "@id" and the "operation" fields will be taken into account.

The result will be the same structure with a different number of entries in each four blocks, depending on the choices made in the previous step.

Once the user is content with the setup, they can click on the "Apply Changes" / "Propagate Changes" button.

A call should be made to:

POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-update

with the same datasatructure as before, only the "@id" and "operation" fields matter.

At this stage we only handle the propagation of changes to artifacts that the current user (the one making the original modification) has write access. At this stage if the changed artifact is included in an element or template that the user has no write access to, the change will never be applied to the "foreign" artifact.

In a later stage we will update the model so that other users can update their artifacts as well as a result of a change.

egyedia commented 7 months ago

Consider using two different relationships INCLUDES_ELEMENT and INCLUDES_FIELD instead of a single INCLUDES relationship. A single relationship might be the right approach, since it's easier to implement and maintain, and handles easily the use case "Retrieve all nested artifacts." However, if there's potential for the meanings or attributes of INCLUDES_ELEMENT and INCLUDES_FIELD to diverge in the future (I don't think so), different relationships would be a better idea.

@matthewhorridge @martinjoconnor any thoughts?

There are two ways:

[includer:Element]-[INCLUDES]->[included:Field]

[includer:Element]-[INCLUDES_FIELD]->[included:Field]

The second case is somewhat redundant, because from the two endpoints of the arc the arc type can be deduced. If we use two types of arcs (INCLUDES_FIELD, INCLUDES_ELEMENT) we need to write queries which allow both types of relationships if we want to query for a full inclusion graph: (s)-[INCLUDES_FIELD|INCLUDES_ELEMENT]->(t) At this point I am not sure which way is better. For simplicity I will go with the single relationship type for now, but this seems like a two-way door (in Amazon terms) so we can change it later if if turns out that the other solution is better.

egyedia commented 7 months ago

Update

After discussions with @muakdogan we agreed that we will show only two grids, one with the affected elements and one with the affected templates. As the elements can be recursively embedded, the elements grid actually will be a tree, which can grow continuously based on the user's choices.
The data flow will be similar, but we will include the bibo:status into the summary as well. In the template grid the number of instances for each template to be updated will be shown. This will be retrieved and cached by the front-end.
The grid representation in the returned structure will not be a map. Either a plain array, or a recursive structure, @muakdogan will decide which is more suitable based on the UI components available
Please note the new depth attribute in the array elements, showing the distance of the affected artifact from the originator.

First call

POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview

{
    "@id": "value"
}

First response

{
    "@id": "value",
    "elements": [
        {
            "@id": "id1",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 1",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 1,
            "operation": "DO-NOT-UPDATE"
        },
        {
            "@id": "id2",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 2",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 1,
            "operation": "DO-NOT-UPDATE"
        }
    ],
    "templates": [
        {
            "@id": "id3",
            "@type": "https://schema.metadatacenter.org/core/Template",
            "schema:name": "Name 3",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 1,
            "operation": "DO-NOT-UPDATE"
        }
    ]
}

Second request

If the user chooses to update id1

POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-preview

{
    "@id": "value",
    "elements": [
        {
            "@id": "id1",
            "operation": "UPDATE"
        },
        {
            "@id": "id2",
            "operation": "DO-NOT-UPDATE"
        }
    ],
    "templates": [
        {
            "@id": "id3",
            "operation": "DO-NOT-UPDATE"
        }
    ]
}

Second response

Please note the element id-1-1 brought in by changing the operation on id1

{
    "@id": "value",
    "elements": [
        {
            "@id": "id1",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 1",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 1,
            "operation": "UPDATE"
        },
        {
            "@id": "id1-1",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 1-1",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 2,
            "operation": "DO-NOT-UPDATE"
        },
        {
            "@id": "id2",
            "@type": "https://schema.metadatacenter.org/core/TemplateElement",
            "schema:name": "Name 2",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 1,
            "operation": "DO-NOT-UPDATE"
        }
    },
    "templates": [
        {
            "@id": "id3",
            "@type": "https://schema.metadatacenter.org/core/Template",
            "schema:name": "Name 3",
            "schema:description": "Description",
            "pav:createdOn": "2023-11-14T14:01:13-08:00",
            "pav:lastUpdatedOn": "2023-11-14T14:01:13-08:00",
            "bibo:status": "bibo:draft",
            "depth": 1,
            "operation": "DO-NOT-UPDATE"
        }
    ]
}

Committing the changes

The above flow only previews what would happen if the user was to commit the changes. No real changes are applied to the graph or artifacts in the inclusions-subgraph-preview mode. In order to actually apply the changes, a last call is needed:

POST
https://resource.metadatacenter.orgx/command/inclusions-subgraph-update

The body should be the same structure as in the second request.

This flow is stateless, the back-end does not keep track of any state, it always computes the affected artifacts based on the request body.

At this implementation phase no change detection is in place. After applying changes the whole process could be restarted from the top, and it would result in a similar flow.

muakdogan commented 7 months ago

List of things we agreed on after out meeting with @egyedia:

Following the update of an artifact, a modal will be provided to user as details described in @egyedia's proposal.
For now only the artifacts that the initiating user (user that makes the original update) has write wrights will be taken into consideration.
There'll be two options: "Do Not Update" and "Update". Previously discussed "Snooze" option will not be implemented.
When the modal is closed there won't be going back: "There won't be a functionality to pull up the affected artifacts list later and change "Do Not Update" or "Update" decisions made earlier.
Modal will be embedded in "CEDAR" as web component, developed in new Angular and published in NPM.

muakdogan commented 7 months ago

UI will provide this data/functionality but not necessarily with exact look & feel. Screenshot 2023-11-22 at 9 18 21 AM

marcosmro commented 7 months ago

Thanks @muakdogan, please consider the following ideas:

Add links to open and inspect the different artifacts in a different tab.
Show relevant information for each artifact, such as it's owner or the path to the folder where it is located.
Show all the templates, not only the draft versions, with an option to create a new draft version of the template that includes the updated nested artifact.

muakdogan commented 6 months ago

Consuming Custom Elements (Web components with AngularJS)

Our version of AngularJS 1.5.8
AngularJS supports consuming of custom elements with v1.7.3
- Migration from 1.5 to 1.6
- Then from 1.6 to 1.7 at least
- If this path is chosen proposing migration to 1.8 too for LTS (see below)

NPM vs Bower
- Bower is outdated and not supported anymore. It will be good to swap Bower with NPM for future development.

marcosmro commented 4 months ago

@egyedia Could you please update the end date for this task with a reasonable new estimate?

egyedia commented 4 months ago

We are currently working on this task. To better understand our progress, I am outlining the backend subtasks here:

Preview the changes [DONE]: Starting from the recently updated artifact, and based on the user's choices (and access rights), a tree of changes to be applied is computed
Apply the changes: Based on the preview results, the actual changes are applied (resources are recursively updated in the CEDAR system) . This subtask is currently under development

This issue also needs a frontend component, which presents the REST API responses in a human-interpretable way. This part is developed by @muakdogan , I will let him report the progress of that subtask.

egyedia commented 4 months ago

The update REST endpoint is done, and ready for integration. An example to call it is:

POST https://resource.metadatacenter.orgx/command/inclusions-subgraph-update
{
    "elements": {
        "https://repo.metadatacenter.orgx/template-elements/166db1a0-2a5c-445a-8e42-804f32eac302": {
            "operation": "update",
            "elements": {
                "https://repo.metadatacenter.orgx/template-elements/92c1b814-09e9-4360-8ba3-819ffc7e2eb5": {
                    "operation": "update",
                    "elements": {
                        "https://repo.metadatacenter.orgx/template-elements/a239f0df-1f3a-478f-b157-1b51312e6280": {
                            "operation": "update",
                            "templates": {
                                "https://repo.metadatacenter.orgx/templates/3ec70c91-96d7-4173-8e30-433a197b1e23": {
                                    "operation": "update"
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "@id": "https://repo.metadatacenter.orgx/template-fields/1cda2e5b-4edf-4e23-8931-6d4521e523ef"
}

@muakdogan let me know if you run into any issues during the frontend integration!

marcosmro commented 1 month ago

We'll consider that this work is part of this task.

johardi commented 1 month ago

RE: Apply changes UI/UX

I suggest listing all the artifacts with the updated 'field1' and adding a checkmark next to each one. This will allow users to select which artifacts need the updated version (see image below).

bubble

If users select an element item where it is being used by other artifacts, the system automatically selects the associated elements and templates. Users cannot modify these auto-selected items such that the checkboxes are disabled (see image below).

Group

However, in my opinion, the vice-versa should be different: if users select the template item first, it is not necessary for the associated element to receive the update.

jkyu commented 1 month ago

With respect to selectively pushing a field update, I think this can cause some serious consistency issues down the line after just a few updates.

From the perspective where a field1 has a unique identifier (be it a URL or database entry or whatever), updating field1 in one location of a template and not in another location seems like abuse of field1. That seems like the user should have been using a separate field in the location where the update is not desired. This is a perspective that would favor pushing the change everywhere or referencing a singleton field1, but this may not be desirable either because the user might see changes to field1 in places that were not intended to be updated.

I thought that strict versioning might be able to mitigate this problem, so that you could have field1 v1 where it is not updated and field1 v2 where the update has been pushed, but this would break semantic versioning if the user then wants to make a change to field1 v1 after already having made field1 v2 previously. This led me to yet another idea of cloning field1 (so that it becomes a separate entity) on update if the user does not want the change propagated everywhere. This wouldn't solve the issue of a bifurcating field1, but it would make the update process transparent and avoid causing a potential consistency problem with field1. Propagating this process up through containing elements and the template could be computationally challenging if elements are also abused in the same way.

egyedia commented 3 weeks ago

We installed the feature branch onto our staging server to test it out. During these tests a frontend bug in the old template editor was found, which in some cases (changing the name or description of a field) results in the update feature to seem that it is not working. I started working on fixing this, the issue is reported here: https://github.com/metadatacenter/cedar-template-editor/issues/994