Open jkotan opened 7 months ago
Dear @jkotan I agree with you, and I have been thinking about this for a while. I would like to be able to add, update or remove a specific scientific metadata key, without the need to rewrite the whole structure.
I will mention it to the next developers meeting and see.
Would you be so kind to come with some examples and maybe with a endpoints schema, so I have something concrete to work with?
I found a good stack overflow answer regarding this. It basically suggests setting the Content-Type
of the PATCH payload to one of the standards developed for this.
Content-Type: application/merge-patch+json
Example:
PATCH /target HTTP/1.1
Host: example.org
Content-Type: application/merge-patch+json
{
"a":"z",
"c": {
"f": null
}
}
Content-Type: application/json-patch+json
RFC 6902, https://jsonpatch.com/
Example:
PATCH /my/data HTTP/1.1
Host: example.org
Content-Length: 326
Content-Type: application/json-patch+json
If-Match: "abc123"
[
{ "op": "test", "path": "/a/b/c", "value": "foo" },
{ "op": "remove", "path": "/a/b/c" },
{ "op": "add", "path": "/a/b/c", "value": [ "foo", "bar" ] },
{ "op": "replace", "path": "/a/b/c", "value": 42 },
{ "op": "move", "from": "/a/b/c", "path": "/a/b/d" },
{ "op": "copy", "from": "/a/b/d", "path": "/a/b/e" }
]
I like the simple syntax of merge-patch. However, there are some operations that can't be represented (eg null
values). Json-patch is more explicit but also harder to use.
I found the fast-json-patch and json-merge-patch libraries. Both require a few lines of code to be added to each PATCH endpoint to support the new Content-Types.
@sbliven are you able to provide some examples to better understand how to integrate it in SciCat? Would you develop an dedicated endpoint for metadata in order to implement functionality isolation and reduce confusion for users?
@nitrosx From the client side, adapting @jkotan's initial example to use merge-patch is trivial, and just requires setting the Content-Type:
headers2['Content-Type'] = 'application/merge-patch+json'
requests.patch(f"{baseurl}/Datasets/{pid.replace("/" ,"%2F")}",
params={"access_token":token},
headers=headers2,
data=jsc)
On the server side we would
jsonmergepatch.apply(dataset, request.json)
Regarding the dedicated endpoint, the current behavior with application/json
data is undefined and should arguably be deprecated. What if we just stop supporting PATCH requests with json type?
Decisions from today's meeting:
application/merge-patch+json
for all PATCH requestsapplication/json
merge patch typescript libraries:
from meeting notes (https://docs.google.com/document/d/1f4gfmgPOycWG5b2zSSklc0uQ1pKL-pn70Hm3HIftx9k/edit) of 30.04.2024:
The community has decided to implement merge-patch
To use it, the client needs to do a PATCH request on the endpoint with content-type set to application/merge-patch+json.
A request to Datasets endpoint will allow to modify all fields of datasets, except the protected ones.
It was proposed to create endpoint Datasets/scientificmetadata to allow modification only of scientific metadata.
Summary
I would be nice to support atomic partial update of scientificMetadata. It would allowes to update sientificMetadata simultaneously by a few ingestor instances i.e. a scan ingestor could update
scientificMetadata.instrument
while almost at the same time an user UI ingestor could updatescientificMetadata.userData
Current Behaviour
Currenly, you can patch the scinetificMetadata by fetching the current scienficMetadata an pushing back the update field which is not atomic operation (one transaction on DB) e.g.
On the other hand if one try to update only the
instrument current
in one call it of course removes all other scientificMetadata content, e.g.Expected Behaviour
Therefore, I think it would be good to add an action to update partially scientificMetadata in one call e.g. by
Currently, the last call returns 400:
property scientificMetadata.instrument.current should not exist