SciCatProject / scicat-backend-next

SciCat Data Catalogue Backend
https://scicatproject.github.io/documentation/
BSD 3-Clause "New" or "Revised" License
18 stars 21 forks source link

add support for atomic patching of scientificMetadata sub-fields #954

Open jkotan opened 7 months ago

jkotan commented 7 months ago

Summary

I would be nice to support atomic partial update of scientificMetadata. It would allowes to update sientificMetadata simultaneously by a few ingestor instances i.e. a scan ingestor could update scientificMetadata.instrument while almost at the same time an user UI ingestor could update scientificMetadata.userData

Current Behaviour

Currenly, you can patch the scinetificMetadata by fetching the current scienficMetadata an pushing back the update field which is not atomic operation (one transaction on DB) e.g.

import requests
import json

baseurl = "http://localhost:3000/api/v3"
headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
pwd="aman"
response = requests.post(baseurl+ "/Users/login", headers=headers, json={"username":"ingestor", "password":pwd})
token = response.json()['id']

headers2 = dict(headers)
headers2["Authorization"] = "Bearer {}".format(token)
pid="testjk01/mysute_00019"
w = requests.get("{burl}/{url}/{pid}".format(burl=baseurl, url="Datasets",pid=pid.replace("/" ,"%2F")),params={"access_token":token},headers=headers2)
sc = w.json()["scientificMetadata"]

sc["instrument"]["current"] = {"value":99.2, "unit":"mA"}
jsc = json.dumps(sc)

requests.patch("{burl}/{url}/{pid}".format(burl=baseurl, url="Datasets",pid=pid.replace("/" ,"%2F")),params={"access_token":token},headers=headers2,data=jsc)

On the other hand if one try to update only the instrument current in one call it of course removes all other scientificMetadata content, e.g.


import requests
import json

baseurl = "http://localhost:3000/api/v3"
headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
pwd="aman"
response = requests.post(baseurl+ "/Users/login", headers=headers, json={"username":"ingestor", "password":pwd})
token = response.json()['id']

headers2 = dict(headers)
headers2["Authorization"] = "Bearer {}".format(token)
pid="testjk01/mysute_00019"

jsc = json.dumps({"scientificMetadata":{"instrument":{"current":{"value":99.1, "unit":"mA"}}}})

requests.patch("{burl}/{url}/{pid}".format(burl=baseurl, url="Datasets",pid=pid.replace("/" ,"%2F")),params={"access_token":token},headers=headers2,data=jsc)

Expected Behaviour

Therefore, I think it would be good to add an action to update partially scientificMetadata in one call e.g. by


import requests
import json

baseurl = "http://localhost:3000/api/v3"
headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
pwd="aman"
response = requests.post(baseurl+ "/Users/login", headers=headers, json={"username":"ingestor", "password":pwd})
token = response.json()['id']

headers2 = dict(headers)
headers2["Authorization"] = "Bearer {}".format(token)
pid="testjk01/mysute_00019"

jsc = json.dumps({"scientificMetadata.instrument.current":{"value":99.1, "unit":"mA"})

requests.patch("{burl}/{url}/{pid}".format(burl=baseurl, url="Datasets",pid=pid.replace("/" ,"%2F")),params={"access_token":token},headers=headers2,data=jsc)

Currently, the last call returns 400: property scientificMetadata.instrument.current should not exist

nitrosx commented 7 months ago

Dear @jkotan I agree with you, and I have been thinking about this for a while. I would like to be able to add, update or remove a specific scientific metadata key, without the need to rewrite the whole structure.

I will mention it to the next developers meeting and see.

Would you be so kind to come with some examples and maybe with a endpoints schema, so I have something concrete to work with?

sbliven commented 7 months ago

I found a good stack overflow answer regarding this. It basically suggests setting the Content-Type of the PATCH payload to one of the standards developed for this.

Merge-patch

Content-Type: application/merge-patch+json

RFC 7396

Example:

PATCH /target HTTP/1.1
Host: example.org
Content-Type: application/merge-patch+json

{
 "a":"z",
 "c": {
   "f": null
 }
}

Json-patch

Content-Type: application/json-patch+json

RFC 6902, https://jsonpatch.com/

Example:

PATCH /my/data HTTP/1.1
Host: example.org
Content-Length: 326
Content-Type: application/json-patch+json
If-Match: "abc123"

[
 { "op": "test", "path": "/a/b/c", "value": "foo" },
 { "op": "remove", "path": "/a/b/c" },
 { "op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ] },
 { "op": "replace", "path": "/a/b/c", "value": 42 },
 { "op": "move", "from": "/a/b/c", "path": "/a/b/d" },
 { "op": "copy", "from": "/a/b/d", "path": "/a/b/e" }
]

Comparison

I like the simple syntax of merge-patch. However, there are some operations that can't be represented (eg null values). Json-patch is more explicit but also harder to use.

I found the fast-json-patch and json-merge-patch libraries. Both require a few lines of code to be added to each PATCH endpoint to support the new Content-Types.

nitrosx commented 7 months ago

@sbliven are you able to provide some examples to better understand how to integrate it in SciCat? Would you develop an dedicated endpoint for metadata in order to implement functionality isolation and reduce confusion for users?

sbliven commented 7 months ago

@nitrosx From the client side, adapting @jkotan's initial example to use merge-patch is trivial, and just requires setting the Content-Type:

headers2['Content-Type'] = 'application/merge-patch+json'

requests.patch(f"{baseurl}/Datasets/{pid.replace("/" ,"%2F")}",
               params={"access_token":token},
               headers=headers2,
               data=jsc)

On the server side we would

Regarding the dedicated endpoint, the current behavior with application/json data is undefined and should arguably be deprecated. What if we just stop supporting PATCH requests with json type?

sbliven commented 2 months ago

Decisions from today's meeting:

nitrosx commented 2 months ago

merge patch typescript libraries:

rkweehinzmann commented 2 months ago

from meeting notes (https://docs.google.com/document/d/1f4gfmgPOycWG5b2zSSklc0uQ1pKL-pn70Hm3HIftx9k/edit) of 30.04.2024: