dlcs / iiif-presentation

Allows for the creation and management of IIIF manifests
MIT License
0 stars 0 forks source link

Fix incorrect `parent` value format handling #108

Closed griffri closed 1 week ago

griffri commented 1 week ago

Currently you can set the parent value of a Presentation resource to an ID and it will work, but a full path won't be handled correctly:

The Netlify documentation shows that a parent can be a full URL to a collection. Example from Create a IIIF Collection within a Storage Collection:

{
// body of IIIF Collection, without the id property, and with the mandatory parent and slug:
//..
"parent": "https://iiif.dlc.services/99/collections/my-flat-identifier",
"slug": "b29000798", 
//..
}

However, this currently doesn't work - the API stores the entire URL as parent. It won't be visible as a child within the parent, and the parent value will appear like this: "parent": "https://iiif.dlc.services/99/collections/https://iiif.dlc.services/99/collections/my-flat-identifier"

This issue also occurs when the presentation root path (e.g https://presentation-api.dlcs.digirati.io/1/collections/root) is passed, but only on PUT - this seems to be working correctly for POST.


Hierarchical paths can also be passed through, which shouldn't be allowed. (reported by @JackLewis-digirati)

if a parent id is set with a hierarchical path i.e.: "parent": "<base url>/1/someSlug"
then if there is another collection with the id of someSlug, then this scenario would cause this collection to be set as the parent, rather than the collection with a slug of someSlug
NOTE: this is bad data being handed to the API, but also feels like something somebody new to understanding how the presentation API works would do accidentally


Acceptance Criteria

p-kaczynski commented 1 week ago

In UpsertCollection and in CreateCollection we have:

 var parentCollection = await dbContext.RetrieveCollectionAsync(request.CustomerId,
                request.Collection.Parent.GetLastPathElement(), true, cancellationToken);

which seems like it would turn https://iiif.dlc.services/99/collections/my-flat-identifier to my-flat-identifier, and indeed it does.

However, in both cases when it comes to setting the Parent property we just rewrite the request.Collection.Parent, ignoring the fact we have already resolved the parent collection (to verify it exists etc.). It seems we can just replace those usages with the id retrieved from the actual entity.

Tried that, wrote some tests where we set the parent (in either create or update requests) to some full URI, and it seems to correctly cut it down, saving only that last bit.

Which of course explains the last part of this ticket where we have to deal with someone setting a parent to a full URL, but because they are providing the hierarchical link, not flat one, the last URL segment is not unique id but it is a slug instead.

We can't instantly figure out which scenario are we dealing with, at least without baking-in some specific paths/routes. My current idea, which I'll check in a second, is to split the flow:

if is_full_url(parent)
  then
    generate a flat id full URL for the resolved parent (as we do elsewhere)
    return error if it does not match parent from request
endif

which reuses whatever code generates the full URLs instead of duplicating it elsewhere, creating maintenance nightmare.

p-kaczynski commented 1 week ago
 // If full URI was used, verify it indeed is pointing to the resolved parent collection
            if (Uri.IsWellFormedUriString(request.Collection.Parent, UriKind.Absolute)
                && !parentCollection.GenerateFlatCollectionId(request.UrlRoots).Equals(request.Collection.Parent))
                return ErrorHelper.NullParentResponse<PresentationCollection>();

this seems to work nicely :D