Closed: rosiel closed this issue 5 years ago.
Been digging through this, and there's a lot going on here.
Binaries in fedora:// vs public:// have their RDF indexed differently, and as a consequence are indexed differently in Gemini. What you're seeing is expected behaviour w/r/t your Gemini index, but that doesn't make it good behaviour. What really needs to happen is that we start using the Fedora spec's external content features. Then things in and out of Fedora will be treated more consistently.
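For reference, the Fedora API spec's external content mechanism works by sending a Link header whose rel is http://fedora.info/definitions/fcrepo#ExternalContent, plus a handling mode: copy, redirect, or proxy (the same three values Fedora advertises in its Accept-External-Content-Handling header). A minimal sketch that only builds such a request with Python's standard library; the Fedora and Drupal URLs are placeholders, and defaulting to redirect is an assumption, not a decision made in this thread.

```python
from urllib.request import Request

# rel value the Fedora API spec uses to mark a binary as external content.
EXTERNAL_CONTENT_REL = "http://fedora.info/definitions/fcrepo#ExternalContent"
HANDLING_MODES = ("copy", "redirect", "proxy")


def external_content_link(target_url: str, mime_type: str, handling: str = "redirect") -> str:
    """Build the Link header value pointing Fedora at content stored elsewhere."""
    if handling not in HANDLING_MODES:
        raise ValueError(f"unsupported handling mode: {handling!r}")
    return (
        f'<{target_url}>; rel="{EXTERNAL_CONTENT_REL}"; '
        f'handling="{handling}"; type="{mime_type}"'
    )


def external_content_request(fedora_uri: str, target_url: str, mime_type: str,
                             handling: str = "redirect") -> Request:
    """Prepare (but do not send) a PUT that registers target_url as an external binary."""
    link = external_content_link(target_url, mime_type, handling)
    # Empty body: Fedora records where the bytes live instead of storing them.
    return Request(fedora_uri, data=b"", method="PUT", headers={"Link": link})


if __name__ == "__main__":
    req = external_content_request(
        "http://localhost:8080/fcrepo/rest/example-binary",      # placeholder Fedora URI
        "http://localhost:8000/sites/default/files/example.jpg",  # placeholder Drupal URL
        "image/jpeg",
    )
    print(req.get_method(), req.full_url)
    print(req.get_header("Link"))
```

Sending it would be a matter of passing the request to urllib.request.urlopen with whatever credentials the Fedora instance requires.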
That'd at least help you confirm things without having to manually traverse Fedora, parsing RDF as you go. I'm updating a bunch of things to their newest versions and am testing the install now with https://github.com/Islandora-CLAW/Alpaca/tree/fix-triplestore-indexing.
We've got some tickets out there to bubble the Gemini mapping up into the HTML as well as into Link headers. Making all of that transparent (for folks with the right permissions) will save you the rigmarole of looking things up in Gemini.
So... yeah... apologies for the minefield :worried: I'm patching up the triplestore indexing and working my way back to the actual issue. We'll get it sorted out.
Yeah, I was going to say that we got fcrepo-camel and fcrepo-camel-toolbox working against Fedora 5, so we should be able to resolve the triplestore stuff. We could possibly also work towards getting the API-X stuff working, but I'm not sure what's required there.
So, @dannylamb, what would it take to get the Media of a fedora:// file indexed in Fedora? Our metadata folks really want to start attaching some PREMIS 3 metadata (e.g. preservationLevel) to files, and since file entities aren't fieldable, the corresponding Media entity seems the logical place to do it.
I'm willing to try my hand at it if I can get pointed in the right direction.
@whikloj I can get fcrepo-indexing-triplestore up and running, but then that messes with islandora-indexing-triplestore :(
fcrepo-api-x has a dependency that could be updated, too, but I'm just commenting it out for now until I can set things straight in Alpaca and islandora-indexing-triplestore.
@seth-shaw-unlv I would have to pull the thread further, but https://github.com/Islandora-CLAW/Crayfish/blob/master/Milliner/src/Controller/MillinerController.php#L99 is the place to start. That's the route for media whose files live in Fedora. Media whose files don't live in Fedora actually use the saveNode function above it (as do taxonomy terms, FYI).
What are "media whose files don't live in Fedora"?
I have a derivative in Drupal ("2 - Service File") that was automatically generated when I uploaded an original file. It's clearly saved "in Drupal": the URL of the file is under /sites/default/files/2019-04, and if I go to that directory on the filesystem, the file is there.
But the field, according to Drupal, is a "file_media_image", which is configured to use Fedora (flysystem) storage. Indeed, if I don't like the thumbnail and decide to replace it (remove the existing file, add a new file), the file I add gets stored in Fedora instead of Drupal.
So because the Service File was added through the backend, you can trick the field storage into actually saving it in Drupal? Weird flex.
Could we instead have "Image (Fedora)" and "Image (Drupal)" as two separate Media types, so that a user can predict, expect, and understand the file storage behaviour?
@rosiel I'm threading the needle on Drupal behaviour there. Configuring the field controls which filesystem is used when uploading through the UI. But derivatives are generated using the REST API, which accepts a Drupal URI in the Content-Location header. You can, however, control where derivatives are deposited by configuring the action for that derivative.
So by splitting out Media types you'll get predictable behaviour on the front end, and then you can configure the backend for both Media types with Context/Actions to lock down where the derivatives are placed. If that seems like better default behaviour to folks, we can do that. It would just mean a PR to islandora_demo with the exported config.
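To make the Content-Location point concrete: the scheme of the Drupal URI in that header is what decides which filesystem a derivative lands on. A rough sketch under stated assumptions: the endpoint URL is a hypothetical placeholder (this thread doesn't name the route), and the drupal/fedora choice stands in for whatever the derivative's action configuration would supply.

```python
from urllib.request import Request

# Drupal stream-wrapper schemes mentioned in this thread.
SCHEMES = {"drupal": "public", "fedora": "fedora"}


def derivative_request(endpoint: str, destination: str, relative_path: str,
                       mime_type: str, body: bytes) -> Request:
    """Prepare a derivative upload whose Content-Location picks the filesystem."""
    scheme = SCHEMES[destination]  # KeyError for anything other than drupal/fedora
    headers = {
        "Content-Type": mime_type,
        # Drupal URI telling the receiving side where to save the file.
        "Content-Location": f"{scheme}://{relative_path}",
    }
    return Request(endpoint, data=body, method="PUT", headers=headers)


if __name__ == "__main__":
    req = derivative_request(
        "http://localhost:8000/hypothetical/derivative/endpoint",  # placeholder route
        "drupal", "2019-04/derivative.jpg", "image/jpeg", b"...",
    )
    # urllib normalizes header names with str.capitalize(), hence "Content-location".
    print(req.get_header("Content-location"))
```

Flipping destination to "fedora" is the only change needed to send the same derivative to Fedora instead, which is the kind of lock-down the Context/Actions configuration would provide.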
As an alternative, could we alter the file upload form to include a file-descriptor dropdown? Then a user could select if a file is destined for Drupal or Fedora at that point. Making a duplicate set of Media types makes me cringe and I would like to avoid it if possible.
@seth-shaw-unlv A dropdown in the file widget would be a welcome improvement regardless.
OK, so I've managed to fix the triplestore indexing for Fedora, and looking at this further, I'm receiving 403s from Milliner when trying to hit localhost:8000/milliner/media/{field_name}.
[2019-04-16 07:58:47] app.ERROR: {"Exception":"[object] (RuntimeException(code: 403): Client error: `PUT http://localhost:8080/fcrepo/rest/2019-04/Learning%20SPARQL.pdf/fcr:metadata` resulted in a `403 Forbidden` response: http://localhost:8080/fcrepo/rest/2019-04/Learning SPARQL.pdf is not in the topic of this RDF, which is http://localhost:8080/fcrepo/rest/2019-04/Learning%20SPARQL.pdf/fcr:metadata at /var/www/html/Crayfish/Milliner/src/Service/MillinerService.php:485)"} []
Looks like the /fcr:metadata elbow may now be the subject for the RDF resource associated with a file. Maybe that changed in 5.0, or maybe it was wrong all along? Either way, this shouldn't be too terrible a fix.
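One more detail in that log line may be worth checking alongside the /fcr:metadata suffix: the subject Milliner sent contains a literal space ("Learning SPARQL.pdf") while the request URI is percent-encoded ("Learning%20SPARQL.pdf"), and in the fcr:metadata response later in this thread the subject of the triples is the binary's own URI. A minimal sketch for normalizing both sides before comparing them, assuming plain percent-encoding of the path is all that differs; the helper names are made up for illustration and are not Milliner's.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

METADATA_SUFFIX = "/fcr:metadata"


def canonical_uri(uri: str) -> str:
    """Percent-encode the path exactly once so ' ' and '%20' compare equal."""
    parts = urlsplit(uri)
    # Decode first so an already-encoded path is not double-encoded (%20 -> %2520).
    path = quote(unquote(parts.path), safe="/:")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def described_binary(metadata_uri: str) -> str:
    """Strip the /fcr:metadata suffix to get the binary the description is about."""
    if metadata_uri.endswith(METADATA_SUFFIX):
        return metadata_uri[: -len(METADATA_SUFFIX)]
    return metadata_uri


def subject_matches(subject: str, request_uri: str) -> bool:
    """True if subject names the binary that request_uri describes."""
    return canonical_uri(subject) == canonical_uri(described_binary(request_uri))


if __name__ == "__main__":
    request_uri = "http://localhost:8080/fcrepo/rest/2019-04/Learning%20SPARQL.pdf/fcr:metadata"
    subject = "http://localhost:8080/fcrepo/rest/2019-04/Learning SPARQL.pdf"
    print(subject_matches(subject, request_uri))  # prints True
```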
So I've gone ahead and taken the liberty of updating things to use external content properly through the Fedora API. I'm pushing up some branches now, but I'll have to update the fcrepo role to deploy some config to get external content going. I'll let you know when it's ready, but nonetheless...
From Gemini we've got all the files indexed, regardless of what filesystem they use:
uuid | fedora_uri | drupal_uri |
---|---|---|
efd0e0f1-1ea2-4e07-b4ed-50a2b620ad50 | http://localhost:8080/fcrepo/rest/2019-04/Casionova1.jpg | http://localhost:8000/_flysystem/fedora/2019-04/Casionova1.jpg |
8bcd29f5-8b88-40f5-9b77-d742c53492bb | http://localhost:8080/fcrepo/rest/8b/cd/29/f5/8bcd29f5-8b88-40f5-9b77-d742c53492bb | http://localhost:8000/sites/default/files/2019-04/5-Thumbnail%20Image.jpg |
81ab2725-55d9-4f8a-82a8-8e2af655cdd5 | http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5 | http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg |
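In the table, the two files stored on Drupal's filesystem get a Fedora path built from their UUID: the first eight hex characters split into four two-character segments, followed by the full UUID. The helper below only reproduces that pattern as observed in these rows. It is an inference from the table, not Gemini's or Milliner's actual code, and it does not apply to fedora:// files like Casionova1.jpg, which keep their Drupal path.

```python
BASE = "http://localhost:8080/fcrepo/rest"


def uuid_fedora_uri(uuid: str, base: str = BASE) -> str:
    """Rebuild the UUID-derived Fedora URI pattern seen for public:// files."""
    head = uuid.replace("-", "")[:8]
    segments = [head[i:i + 2] for i in range(0, 8, 2)]  # e.g. 81/ab/27/25
    return "/".join([base.rstrip("/"), *segments, uuid])


if __name__ == "__main__":
    print(uuid_fedora_uri("81ab2725-55d9-4f8a-82a8-8e2af655cdd5"))
    # prints http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5
```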
And if you curl the fedora_uri for the service file, you get redirected:
vagrant@claw:/var/www/html/Crayfish/Milliner$ curl -i http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5
HTTP/1.1 307 Temporary Redirect
Server: Apache-Coyote/1.1
Cache-Control: private
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Set-Cookie: JSESSIONID=AAC5D77ADA1C7B6DFC249722794544F6; Path=/fcrepo/; HttpOnly
Set-Cookie: rememberMe=deleteMe; Path=/fcrepo; Max-Age=0; Expires=Mon, 22-Apr-2019 17:22:38 GMT
ETag: "3bfc948cb7134fb0150061f06960469eb253916e"
Last-Modified: Tue, 23 Apr 2019 14:52:08 GMT
Accept-Ranges: bytes
Content-Disposition: attachment; filename=""; creation-date="Tue, 23 Apr 2019 14:52:08 GMT"; modification-date="Tue, 23 Apr 2019 14:52:08 GMT"; size=927638
Link: <http://www.w3.org/ns/ldp#Resource>;rel="type"
Link: <http://www.w3.org/ns/ldp#NonRDFSource>;rel="type"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="describedby"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>; rel="timegate"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>; rel="original"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:versions>; rel="timemap"
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"
Link: <http://mementoweb.org/ns#TimeGate>; rel="type"
Accept-External-Content-Handling: copy,redirect,proxy
Allow: DELETE,HEAD,GET,PUT,OPTIONS
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:acl>; rel="acl"
Content-Location: http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg
Location: http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg
Content-Type: image/jpeg
Content-Length: 0
Date: Tue, 23 Apr 2019 17:22:38 GMT
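Following the describedby link by hand works, but it is easy to script. A rough sketch that groups Link header values by rel, fed with header lines from the response above. The regex assumes each link has the simple `<uri>; rel="..."` shape Fedora emits here; it is not a full RFC 8288 parser (no parameters before rel, no unquoted rel values).

```python
import re

# <target>; rel="relation"  (whitespace around ';' optional, as in Fedora's output)
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def links_by_rel(link_values):
    """Map each rel to the list of target URIs found in the given Link header values."""
    rels = {}
    for value in link_values:
        for target, rel in LINK_PATTERN.findall(value):
            rels.setdefault(rel, []).append(target)
    return rels


if __name__ == "__main__":
    binary = "http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5"
    headers = [
        '<http://www.w3.org/ns/ldp#NonRDFSource>;rel="type"',
        f'<{binary}/fcr:metadata>; rel="describedby"',
        f'<{binary}/fcr:versions>; rel="timemap"',
    ]
    print(links_by_rel(headers)["describedby"][0])
```

With urllib, response.headers.get_all("Link") returns every Link value, but note that urllib.request follows the 307 automatically, so the headers you get back would be Drupal's; curl -i without -L, as above, keeps Fedora's.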
And if you curl the URL in the "describedby" Link header, you get the metadata:
vagrant@claw:/var/www/html/Crayfish/Milliner$ curl -i http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: private
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Set-Cookie: JSESSIONID=66EA03561EF11D85B08586CCA968EFAE; Path=/fcrepo/; HttpOnly
Set-Cookie: rememberMe=deleteMe; Path=/fcrepo; Max-Age=0; Expires=Mon, 22-Apr-2019 17:22:54 GMT
ETag: W/"a92037e13a684209628154e87405fd283d27b67d"
Last-Modified: Tue, 23 Apr 2019 14:52:10 GMT
Link: <http://www.w3.org/ns/ldp#Resource>;rel="type"
Link: <http://www.w3.org/ns/ldp#RDFSource>; rel="type"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>; rel="describes"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="timegate"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="original"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata/fcr:versions>; rel="timemap"
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"
Link: <http://mementoweb.org/ns#TimeGate>; rel="type"
Accept-External-Content-Handling: copy,redirect,proxy
Accept-Patch: application/sparql-update
Allow: HEAD,GET,DELETE,PUT,PATCH,OPTIONS
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:acl>; rel="acl"
Preference-Applied: return=representation
Vary: Prefer
Vary: Accept
Vary: Range
Vary: Accept-Encoding
Vary: Accept-Language
Vary: Accept-Datetime
Content-Type: text/turtle;charset=utf-8
Content-Length: 2453
Date: Tue, 23 Apr 2019 17:22:54 GMT
@prefix premis: <http://www.loc.gov/premis/rdf/v1#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fedora: <http://fedora.info/definitions/v4/repository#> .
@prefix ebucore: <http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
<http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>
rdf:type fedora:NonRdfSourceDescription ;
rdf:type <http://pcdm.org/models#File> ;
rdf:type <http://pcdm.org/use#ServiceFile> ;
rdf:type fedora:Binary ;
rdf:type fedora:Resource ;
fedora:lastModifiedBy "bypassAdmin" ;
<http://schema.org/dateModified> "2019-04-23T14:52:08+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
<http://schema.org/author> <http://localhost:8000/user/1?_format=jsonld> ;
<http://schema.org/sameAs> "http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg" ;
<http://schema.org/dateCreated> "2019-04-23T14:52:08+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
ebucore:width "1072"^^<http://www.w3.org/2001/XMLSchema#int> ;
premis:hasSize "927638"^^<http://www.w3.org/2001/XMLSchema#long> ;
ebucore:hasMimeType "image/jpeg" ;
fedora:createdBy "bypassAdmin" ;
fedora:created "2019-04-23T14:52:08.93Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
fedora:lastModified "2019-04-23T14:52:10.962Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
ebucore:height "698"^^<http://www.w3.org/2001/XMLSchema#int> ;
<http://pcdm.org/models#fileOf> <http://localhost:8000/node/5?_format=jsonld> ;
rdf:label "5-Service File.jpg"@en ;
ebucore:filename "" ;
dcterms:title "5-Service File.jpg"@en ;
rdf:type ldp:NonRDFSource ;
iana:describedby <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata> ;
fedora:hasFixityService <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:fixity> .
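The metadata resource advertises Accept-Patch: application/sparql-update, so extra triples (for instance the PREMIS metadata mentioned earlier) could in principle be attached with a SPARQL Update PATCH. A sketch that only builds such a request; the predicate URI is a hypothetical placeholder, since choosing the right PREMIS 3 term is still open, and the subject is the binary's URI because that is the subject used in the Turtle above.

```python
from urllib.request import Request


def sparql_literal(value: str) -> str:
    """Quote a plain string literal for SPARQL (escape backslash, quote, newlines)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def insert_data_patch(metadata_uri: str, subject: str, predicate: str, value: str) -> Request:
    """Prepare (but do not send) a PATCH adding one literal-valued triple."""
    update = f"INSERT DATA {{ <{subject}> <{predicate}> {sparql_literal(value)} . }}"
    return Request(metadata_uri, data=update.encode("utf-8"), method="PATCH",
                   headers={"Content-Type": "application/sparql-update"})


if __name__ == "__main__":
    binary = "http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5"
    req = insert_data_patch(
        binary + "/fcr:metadata",
        binary,
        "http://example.org/hypothetical#preservationLevel",  # placeholder predicate
        "full",
    )
    print(req.data.decode())
```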
:champagne:
I created a Drupal Repository Object, then a Media, and uploaded a File.
The Object and the File are in Gemini and so I know they're in Fedora. The Media object (with whatever metadata attached) does not appear to be in Fedora.
(I queried the Gemini table, for lack of a better way to tell).
Interestingly, if I do something that generates derivatives, then the resulting Media objects are in Fedora (and Gemini) though the files are not. (media/4 was generated from media/3)
Here's another weird thing. The "original file" media object is in the triplestore.