TAMULib / IRIIIFService

IIIF manifest generator for DSpace RDF and/or Fedora PCDM
MIT License

[SPIKE] irIIIFService: Fix bug with encoding encoded strings #150

Open devangm opened 1 day ago

devangm commented 1 day ago

irIIIFService percent-encodes a URI string even if it is already encoded. When the value already contains a percent sign from an encoded character, that percent sign is encoded again as %25 (for example, %20 becomes %2520), and manifest generation fails with a 404.
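To make the failure concrete, here is a minimal Python sketch (not the service's actual code) of what double encoding does to the file name from the linked reports. The helper names `encode_naively` and `encode_once` are hypothetical, and `encode_once` assumes that a literal percent sign never appears in a real file name, which may not hold for every repository.

```python
from urllib.parse import quote, unquote


def encode_naively(value: str) -> str:
    # Encodes unconditionally, so an existing "%" is itself turned into "%25".
    return quote(value, safe="/")


def encode_once(value: str) -> str:
    # Decode first, then encode, so already-encoded input is left unchanged.
    # Assumption: the input never contains a literal "%" that is meant as data.
    return quote(unquote(value), safe="/")


name = "blumberg-holiday card_1.jpg"
once = encode_naively(name)    # blumberg-holiday%20card_1.jpg
twice = encode_naively(once)   # blumberg-holiday%2520card_1.jpg

print(once)
print(twice)
print(encode_once(once))       # blumberg-holiday%20card_1.jpg
```

The second call is the bug in miniature: the request goes out for a resource ending in `%2520card_1.jpg`, which does not exist, hence the 404.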

https://github.com/TAMULib/IRIIIFService/issues/149 https://github.com/TAMULib/IRIIIFService/issues/139

Acceptance Criteria

Manifests for files with spaces are generated as expected.

kaladay commented 1 day ago

There is explicit encoding happening here:

Introduced here:

And here:

It appears to address some DSpace situation (judging by the branch name, which contains "dspace").

See also:

markpbaggett commented 23 hours ago

@kaladay and @qtamu Adding this here just for science. One thing I thought about last week and wanted to try, but couldn't because I don't have access to FCREPO on dev and pre, was to see whether Fedora can even hold a URI that is not URL-encoded. My interpretation of RDF Concepts is that this shouldn't be allowed, but who knows whether Fedora and DSpace follow the specification closely enough to say for sure.

To test the idea, I was thinking we should directly target the problematic URI in question, delete it, and then reinsert an unescaped URI. The request will likely fail altogether as the SPARQL update wouldn't even be valid with the URI unescaped, but it would be interesting to see if it does in fact pass.

How to Test on Dev with an Existing Resource

Create a file called update.ru with the following SPARQL Update as the body of the file:

PREFIX iana: <http://www.iana.org/assignments/relation/>

DELETE {
  <> iana:describedby <https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata> .
}
INSERT {
  <> iana:describedby <https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday card_1.jpg/fcr:metadata> .
}
WHERE { }
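As a rough local sanity check before sending anything, the two IRIs can be compared against the `IRIREF` production from the SPARQL 1.1 grammar, which excludes characters in the range #x00 to #x20 (so a literal space is not allowed). The Python below is only a sketch; the function name `is_valid_iriref_body` is hypothetical, and it says nothing about what Fedora's own parser will actually accept.

```python
import re

# SPARQL 1.1 grammar: IRIREF ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
# This pattern matches only the part between the angle brackets.
IRIREF_BODY = re.compile(r'[^<>"{}|^`\\\x00-\x20]*')


def is_valid_iriref_body(iri: str) -> bool:
    # True when every character is permitted inside <...> by the grammar.
    return IRIREF_BODY.fullmatch(iri) is not None


BASE = (
    "https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/"
    "3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/"
    "basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/"
)
escaped = BASE + "blumberg-holiday%20card_1.jpg/fcr:metadata"
unescaped = BASE + "blumberg-holiday card_1.jpg/fcr:metadata"

print(is_valid_iriref_body(escaped))    # True
print(is_valid_iriref_body(unescaped))  # False
```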

Run a curl request from a shell on localhost like so:

curl -X PATCH \
     -H "Content-Type: application/sparql-update" \
     --data-binary "@update.ru" \
     "https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata"

You'll also need to pass a username and credentials; both can be found in environment variables in Rancher dev.
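For anyone who would rather script the request than use curl, a minimal Python sketch of the same PATCH follows. It assumes HTTP Basic authentication, which the thread does not confirm, and the function names `build_patch_request` and `send` are hypothetical. Nothing is sent until `send` is called.

```python
import base64
import urllib.error
import urllib.request


def build_patch_request(url: str, sparql_update: str,
                        username: str, password: str) -> urllib.request.Request:
    # Assumption: the repository accepts HTTP Basic auth for this endpoint.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=sparql_update.encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/sparql-update",
            "Authorization": f"Basic {token}",
        },
    )


def send(request: urllib.request.Request) -> int:
    # Returns the HTTP status code; 4xx/5xx responses are raised by urlopen
    # as HTTPError, so they are caught and reported as a code as well.
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

In use, the body would be the contents of update.ru and the username and password would be read from whatever environment variables hold them in Rancher dev.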

What do I think will happen when we run this?

  1. My prediction: a 412 Precondition Failed response. I think this is most likely, as the SPARQL isn't valid (URIs can't be unescaped).
  2. Possible: a 204 response where only the delete is successful. Again, I just don't see the insert working, but maybe something bad here allows the initial part of the request to go through. No problem though, as we can fix it by running an INSERT request to add the URI back as it was originally in the DELETE (and this is dev anyway).
  3. Unlikely: a 204 where the URL gets deleted and reinserted, but the object of the triple in question ends up being https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata. In other words, the request goes through successfully, but Fedora escapes the URI even though we told it not to.
  4. I'd be shocked: a 204 where the URL gets deleted and the object of the triple becomes https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday card_1.jpg/fcr:metadata. In this case, we've absolutely got to account for unescaped URIs and handle escaping in our various requests.
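The four outcomes above can be told apart from just two observations: the HTTP status of the PATCH, and the object of the iana:describedby triple when the resource is fetched afterwards. Here is a small Python sketch of that decision; the function name `classify_outcome` and the convention of passing `None` when the triple is missing are assumptions made for illustration.

```python
from typing import Optional

BASE = (
    "https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/"
    "3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/"
    "basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/"
)
ESCAPED = BASE + "blumberg-holiday%20card_1.jpg/fcr:metadata"
UNESCAPED = BASE + "blumberg-holiday card_1.jpg/fcr:metadata"


def classify_outcome(status: int, object_after: Optional[str]) -> str:
    # object_after is the triple's object after the PATCH, or None if absent.
    if status == 412:
        return "1: request rejected"
    if status == 204 and object_after is None:
        return "2: delete only"
    if status == 204 and object_after == ESCAPED:
        return "3: reinserted, escaped by Fedora"
    if status == 204 and object_after == UNESCAPED:
        return "4: stored unescaped"
    return "unexpected: not one of the four predictions"


print(classify_outcome(412, ESCAPED))
print(classify_outcome(204, UNESCAPED))
```

One caveat: judged by the final triple alone, outcome 3 looks the same as the state before the request, so it only counts if the status really was 204.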