TAMULib / IRIIIFService

IIIF manifest generator for DSpace RDF and/or Fedora PCDM
MIT License

irIIIFService will attempt to reencode a Fedora URI that is already encoded #149

Open markpbaggett opened 1 month ago

markpbaggett commented 1 month ago

Describe the bug

In some cases, irIIIFService requests a Fedora resource after encoding parts of a URI that Fedora has already encoded. The request for the RDF resource then fails with a 404, and the Manifest is not generated.

To Reproduce

Steps to reproduce the behavior:

  1. To see this in action, check out the third Manifest in this Collection: https://api-pre.library.tamu.edu/iiif-service/fedora/collection/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924
  2. Click on the Manifest https://api-pre.library.tamu.edu/iiif-service/fedora/presentation/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17
  3. We get an innocent enough error message: RDF not found! https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata, but that URL resolves to a 200. Hmmmmm. 🤔
  4. If we dig deeper and look at the logs, we can see what's actually happening:

     o.s.web.client.RestTemplate : HTTP GET https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/files/blumberg-holiday%2520card_1.jpg/fcr:metadata
     2024-09-27 14:07:46.864 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate : Accept=[text/plain, application/json, application/*+json, */*]
     2024-09-27 14:07:46.898 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate : Response 404 NOT_FOUND
     2024-09-27 14:07:46.899 ERROR 1 --- [nio-8080-exec-4]
  5. Notice that the Fedora resource being requested is blumberg-holiday%2520card_1.jpg. In Fedora, it is blumberg-holiday%20card_1.jpg, and on disk it was blumberg-holiday card_1.jpg. In other words, we have a file with spaces, Fedora (or MAGPIE) encodes the space as %20, irIIIFService then decides the % needs to be encoded as %25, and we end up with %2520.
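
The %20 to %2520 step can be reproduced with the JDK alone. This is a minimal sketch, not the service's actual code: the class name, the shortened path, and the example.org host are placeholders. It relies on the documented behavior of the multi-argument java.net.URI constructors, which always quote the percent character.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DoubleEncodingDemo {
    // Builds a URI from a path that is ALREADY percent-encoded.
    // The multi-argument constructor quotes '%' again, so %20 becomes %2520.
    static URI reencode(String alreadyEncodedPath) {
        try {
            return new URI("https", "example.org", alreadyEncodedPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = reencode("/files/blumberg-holiday%20card_1.jpg");
        // Raw path: what is sent on the wire.
        System.out.println(uri.getRawPath());
        // Decoded once: the name the server ends up looking for.
        System.out.println(uri.getPath());
    }
}
```

Decoding the doubly encoded path once yields a name that literally contains the characters %20, which matches no resource, so a 404 is the expected result.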

Expected behavior

irIIIFService checks whether a resource URI is already encoded instead of re-encoding it.
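
One possible shape for such a check, as a rough sketch only. PathEncodingGuard, looksEncoded, and encodeIfNeeded are hypothetical names, and paths are assumed to start with a slash. The heuristic treats a path as already encoded if the strict single-argument java.net.URI parser accepts it. It cannot tell an escaped space apart from a filename that literally contains the characters %20, so it is a guess rather than a guarantee.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PathEncodingGuard {
    // Heuristic: the strict parser rejects raw spaces and malformed escapes,
    // so a path it accepts is treated as already encoded.
    static boolean looksEncoded(String path) {
        try {
            new URI(path);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Encodes only when the path does not already look encoded.
    static String encodeIfNeeded(String path) {
        if (looksEncoded(path)) {
            return path;
        }
        try {
            // The multi-argument constructor quotes illegal characters (and '%').
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeIfNeeded("/files/blumberg-holiday%20card_1.jpg"));
        System.out.println(encodeIfNeeded("/files/blumberg-holiday card_1.jpg"));
    }
}
```

Both calls in main produce the same singly encoded path, which is the behavior the expectation above asks for.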


Full traceback:

2024-09-27 14:07:45.374 DEBUG 1 --- [nio-8080-exec-4] o.s.security.web.FilterChainProxy        : Securing GET /fedora/presentation/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17
2024-09-27T14:07:45.374801783Z 2024-09-27 14:07:45.374 DEBUG 1 --- [nio-8080-exec-4] s.s.w.c.SecurityContextPersistenceFilter : Set SecurityContextHolder to empty SecurityContext
2024-09-27 14:07:45.374 DEBUG 1 --- [nio-8080-exec-4] o.s.s.w.a.AnonymousAuthenticationFilter  : Set SecurityContextHolder to anonymous SecurityContext
2024-09-27 14:07:45.375 DEBUG 1 --- [nio-8080-exec-4] o.s.s.w.a.i.FilterSecurityInterceptor    : Authorized filter invocation [GET /fedora/presentation/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17] with attributes [permitAll]
2024-09-27T14:07:45.375170295Z 2024-09-27 14:07:45.375 DEBUG 1 --- [nio-8080-exec-4] o.s.security.web.FilterChainProxy        : Secured GET /fedora/presentation/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17
2024-09-27 14:07:45.375 DEBUG 1 --- [nio-8080-exec-4] o.s.web.servlet.DispatcherServlet        : GET "/iiif-service/fedora/presentation/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17", parameters={}
2024-09-27 14:07:45.375 DEBUG 1 --- [nio-8080-exec-4] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped to edu.tamu.iiif.controller.fedora.pcdm.FedoraPcdmPresentationManifestController#manifest(HttpServletResponse, ManifestRequest)
2024-09-27 14:07:45.375 DEBUG 1 --- [nio-8080-exec-4] o.s.d.redis.core.RedisConnectionUtils    : Fetching Redis Connection from RedisConnectionFactory
2024-09-27T14:07:45.376761941Z 2024-09-27 14:07:45.376 DEBUG 1 --- [nio-8080-exec-4] o.s.d.redis.core.RedisConnectionUtils    : Closing Redis Connection.
2024-09-27 14:07:45.376  INFO 1 --- [nio-8080-exec-4] e.t.i.service.AbstractManifestService    : Generating new manifest.
2024-09-27 14:07:45.376 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : HTTP GET https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17
2024-09-27T14:07:45.377025724Z 2024-09-27 14:07:45.376 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Accept=[text/plain, application/json, application/*+json, */*]
2024-09-27 14:07:45.767 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Response 200 OK
2024-09-27 14:07:45.768 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Reading to [java.lang.String] as "text/plain;charset=utf-8"
2024-09-27 14:07:45.769 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : HTTP GET https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/orderProxies/page_0_proxy/fcr:metadata
2024-09-27T14:07:45.769921405Z 2024-09-27 14:07:45.769 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Accept=[text/plain, application/json, application/*+json, */*]
2024-09-27 14:07:46.278 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Response 200 OK
2024-09-27T14:07:46.279633626Z 2024-09-27 14:07:46.278 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Reading to [java.lang.String] as "text/plain;charset=utf-8"
2024-09-27 14:07:46.280 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : HTTP GET https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/fcr:metadata
2024-09-27 14:07:46.280 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Accept=[text/plain, application/json, application/*+json, */*]
2024-09-27 14:07:46.549 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Response 200 OK
2024-09-27T14:07:46.550208880Z 2024-09-27 14:07:46.550 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Reading to [java.lang.String] as "text/plain;charset=utf-8"
2024-09-27 14:07:46.550 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : HTTP GET https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/files/fcr:metadata
2024-09-27 14:07:46.550 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Accept=[text/plain, application/json, application/*+json, */*]
2024-09-27 14:07:46.863 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Response 200 OK
2024-09-27T14:07:46.863518810Z 2024-09-27 14:07:46.863 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Reading to [java.lang.String] as "text/plain;charset=utf-8"
2024-09-27 14:07:46.863 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : HTTP GET https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/files/blumberg-holiday%2520card_1.jpg/fcr:metadata
2024-09-27 14:07:46.864 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Accept=[text/plain, application/json, application/*+json, */*]
2024-09-27 14:07:46.898 DEBUG 1 --- [nio-8080-exec-4] o.s.web.client.RestTemplate              : Response 404 NOT_FOUND
2024-09-27 14:07:46.899 ERROR 1 --- [nio-8080-exec-4] e.t.i.service.AbstractManifestService    : Failed to get RDF for https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata: 404 : "<!doctype html><html lang="en"><head><title>HTTP Status 404 – Not Found</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 404 – Not Found</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> Not Found</p><p><b>Description</b> The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.</p><hr class="line" /><h3>Apache Tomcat/8.5.100</h3></body></html>"
2024-09-27 14:07:46.899 DEBUG 1 --- [nio-8080-exec-4] .m.m.a.ExceptionHandlerExceptionResolver : Using @ExceptionHandler edu.tamu.iiif.controller.advice.GlobalExceptionHandler#handleNotFoundException(NotFoundException)
2024-09-27 14:07:46.899 DEBUG 1 --- [nio-8080-exec-4] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Using 'text/html', given [text/html, application/xhtml+xml, image/avif, image/webp, image/apng, application/xml;q=0.9, application/signed-exchange;v=b3;q=0.7, */*;q=0.8] and supported [text/plain, */*, text/plain, */*, application/json, application/*+json, application/json, application/*+json]
2024-09-27 14:07:46.899 DEBUG 1 --- [nio-8080-exec-4] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Writing ["RDF not found! https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-0680 (truncated)..."]
2024-09-27 14:07:46.900 DEBUG 1 --- [nio-8080-exec-4] .m.m.a.ExceptionHandlerExceptionResolver : Resolved [edu.tamu.iiif.exception.NotFoundException: RDF not found! https://api-pre.library.tamu.edu/fcrepo/rest/bb/97/f2/3e/bb97f23e-803a-4bd6-8406-06802623554c/basbanes-exhibit-texts-20240924_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata]
2024-09-27 14:07:46.900 DEBUG 1 --- [nio-8080-exec-4] o.s.web.servlet.DispatcherServlet        : Completed 404 NOT_FOUND
2024-09-27T14:07:46.900403106Z 2024-09-27 14:07:46.900 DEBUG 1 --- [nio-8080-exec-4] s.s.w.c.SecurityContextPersistenceFilter : Cleared SecurityContextHolder to complete request

Additional context

A couple of things worth thinking about:

  1. Is MAGPIE (how this was originally ingested) doing the initial encoding, or is Fedora? If it's the former, could we have unencoded resources in Fedora that would be affected by this change (dropping encoding altogether)?
  2. Wouldn't all of this be avoided if we expected filenames to follow a certain pattern prior to ingest, with spaces and certain other characters not allowed? I think I can advocate for this going forward.
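
A pre-ingest filename check along the lines of point 2 could look like the sketch below. The allowed character set (ASCII letters, digits, dot, underscore, hyphen) and the names IngestFilenamePolicy, isAllowed, and suggest are assumptions for illustration, not an agreed policy.

```java
import java.util.regex.Pattern;

class IngestFilenamePolicy {
    // Hypothetical policy: only characters that never need percent-encoding.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9._-]+");

    // True when the whole filename consists of allowed characters.
    static boolean isAllowed(String filename) {
        return SAFE.matcher(filename).matches();
    }

    // Replaces each run of disallowed characters with a single underscore.
    static String suggest(String filename) {
        return filename.replaceAll("[^A-Za-z0-9._-]+", "_");
    }

    public static void main(String[] args) {
        String name = "blumberg-holiday card_1.jpg";
        System.out.println(isAllowed(name));
        System.out.println(suggest(name));
    }
}
```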

markpbaggett commented 1 month ago

Just a note: it looks like the RDF spec requires URIs to be encoded, so this is almost certainly something Fedora does when a new resource is minted. For that reason, I don't see why irIIIFService would need to encode these bits to begin with.
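
If the URIs coming out of the RDF are taken as already encoded, a client can pass them through untouched. The sketch below uses the JDK's java.net.http API (Java 11+) purely for illustration. PassThroughRequest, forResource, and the example.org URL are hypothetical, and nothing here is taken from the service's code. As far as I know, Spring's RestTemplate behaves similarly when given a java.net.URI rather than a String URL, since String URLs are treated as templates and encoded.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class PassThroughRequest {
    // Parses the string as-is: existing %20 escapes are kept, not re-encoded.
    static HttpRequest forResource(String encodedUri) {
        return HttpRequest.newBuilder(URI.create(encodedUri)).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = forResource(
                "https://example.org/files/blumberg-holiday%20card_1.jpg/fcr:metadata");
        // The raw path still contains the single %20 escape.
        System.out.println(request.uri().getRawPath());
    }
}
```

Only the request is built here; nothing is sent over the network.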

markpbaggett commented 2 weeks ago

Just adding to this ticket. It looks like this is also a problem on DSpace now. It wasn't originally (the affected collection worked), but it is now. Oddly, the error is reported as a 500:

2024-10-28 20:43:12.829 ERROR 1 --- [-8080-exec-1138] e.t.i.service.AbstractManifestService    : Failed to get RDF for https://oaktrust.library.tamu.edu/server/rdf/handle/1969.1/169475/10/0003%20Front%20Map%20with%20Booklet.jpg: 500 500: "{"timestamp":"2024-10-28T20:43:12.826+00:00","status":500,"error":"Internal Server Error","message":"The request was rejected because the URL contained a potentially malicious String \"%25\"","path":"/server/rdf/handle/1969.1/169475/10/0003%2520Front%2520Map%2520with%2520Booklet.jpg"}"
2024-10-28T20:43:12.832248496Z 2024-10-28 20:43:12.831 DEBUG 1 --- [-8080-exec-1138] e.t.i.service.AbstractManifestService    : Error while requesting RDF for https://oaktrust.library.tamu.edu/server/rdf/handle/1969.1/169475/10/0003%20Front%20Map%20with%20Booklet.jpg: 500 500: "{"timestamp":"2024-10-28T20:43:12.826+00:00","status":500,"error":"Internal Server Error","message":"The request was rejected because the URL contained a potentially malicious String \"%25\"","path":"/server/rdf/handle/1969.1/169475/10/0003%2520Front%2520Map%2520with%2520Booklet.jpg"}"
2024-10-28T20:43:12.832267329Z 
2024-10-28T20:43:12.832271910Z org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 500: "{"timestamp":"2024-10-28T20:43:12.826+00:00","status":500,"error":"Internal Server Error","message":"The request was rejected because the URL contained a potentially malicious String \"%25\"","path":"/server/rdf/handle/1969.1/169475/10/0003%2520Front%2520Map%2520with%2520Booklet.jpg"}"
2024-10-28T20:43:12.832346592Z  at org.springframework.web.client.HttpServerErrorException.create(HttpServerErrorException.java:100) ~[spring-web-5.3.24.jar!/:5.3.24]
2024-10-28T20:43:12.832380893Z  at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:170) ~[spring-web-5.3.24.jar!/:5.3.24]