Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License

Media (created through Drupal) don't appear in Fedora #1079

Closed rosiel closed 5 years ago

rosiel commented 5 years ago

I created a Drupal Repository Object, then a Media, and uploaded a File.

The Object and the File are in Gemini and so I know they're in Fedora. The Media object (with whatever metadata attached) does not appear to be in Fedora.

(I queried the Gemini table, for lack of a better way to tell).

mysql> select  fedora_uri from Gemini where drupal_uri = 'http://localhost:8000/media/3?_format=jsonld';
Empty set (0.00 sec)

Interestingly, if I do something that generates derivatives, then the resulting Media objects are in Fedora (and Gemini) though the files are not. (media/4 was generated from media/3)

mysql> select  fedora_uri from Gemini where drupal_uri = 'http://localhost:8000/media/4?_format=jsonld';
+------------------------------------------------------------------------------------+
| fedora_uri                                                                         |
+------------------------------------------------------------------------------------+
| http://localhost:8080/fcrepo/rest/63/b4/a0/f2/63b4a0f2-8a65-4106-b0fb-b4b328ac1cf5 |
+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
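For anyone repeating this check, here is a minimal sketch of the same lookup as a parameterised query. It is only an illustration: it uses an in-memory SQLite table as a stand-in for the MySQL Gemini table, the `lookup_fedora_uri` helper is hypothetical, and the `uuid` value is inferred from the Fedora path rather than read from the real table.

```python
import sqlite3
from typing import Optional


def lookup_fedora_uri(conn: sqlite3.Connection, drupal_uri: str) -> Optional[str]:
    """Return the Fedora URI mapped to a Drupal URI, or None if unmapped."""
    row = conn.execute(
        "SELECT fedora_uri FROM Gemini WHERE drupal_uri = ?",
        (drupal_uri,),
    ).fetchone()
    return row[0] if row else None


# Stand-in for the real Gemini table (assumption: SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Gemini (uuid TEXT, fedora_uri TEXT, drupal_uri TEXT)")
conn.execute(
    "INSERT INTO Gemini VALUES (?, ?, ?)",
    (
        "63b4a0f2-8a65-4106-b0fb-b4b328ac1cf5",  # inferred from the path
        "http://localhost:8080/fcrepo/rest/63/b4/a0/f2/"
        "63b4a0f2-8a65-4106-b0fb-b4b328ac1cf5",
        "http://localhost:8000/media/4?_format=jsonld",
    ),
)

for media_id in (3, 4):
    uri = f"http://localhost:8000/media/{media_id}?_format=jsonld"
    print(media_id, lookup_fedora_uri(conn, uri))
```

With the data from this report, media/3 comes back as `None` and media/4 resolves to its Fedora URI.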

Here's another weird thing. The "original file" media object is in the triplestore.

select ?p ?o
WHERE {
<http://localhost:8000/media/2?_format=jsonld> ?p ?o .
}
p o
http://pcdm.org/models#fileOf http://localhost:8000/node/2?_format=jsonld
schema:dateModified 2019-04-10T18:31:57.000Z
schema:author http://localhost:8000/user/1?_format=jsonld
schema:dateCreated 2019-04-10T18:31:19.000Z
schema:sameAs http://localhost:8000/_flysystem/fedora/2019-04/800px-Tyrannosaurus_Rex_colored.png
http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#hasMimeType image/png
rdf:label image 1 origi
http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#height 197
http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#width 149
rdf:type http://localhost:8000/taxonomy/term/16?_format=jsonld
rdf:type http://pcdm.org/use#OriginalFile
rdf:type http://pcdm.org/models#File
dcterm:title image 1 origi

dannylamb commented 5 years ago

Been digging through this, and there's a lot going on here.

Inconsistencies in Gemini indexing:

Binary content in fedora:// vs public:// have their RDF indexed differently, and as a consequence, are indexed in Gemini differently. What you're seeing is expected behaviour w/r/t your Gemini index, but that doesn't make it good behaviour. What really needs to happen is we start using the Fedora spec's external content features. Then things in and out of Fedora will get treated more consistently.

Triplestore indexing from Fedora is busted

That'd at least help you confirm things without having to manually traverse Fedora, parsing RDF as you go. I'm updating to newest on a bunch of things and am testing the install now with https://github.com/Islandora-CLAW/Alpaca/tree/fix-triplestore-indexing.

Finding the Fedora representation is a PITA

We've got some tickets out there to bubble the mapping in Gemini up into the html as well as link headers. Making all of that transparent (for folks with the right permissions) will dodge the rigmarole of looking things up in Gemini.

So... yeah... apologies for the mine field :worried: I'm patching up the triplestore indexing and working my way back to the actual issue. We'll get it sorted out.

whikloj commented 5 years ago

Yeah I was going to say that we got the fcrepo-camel and fcrepo-camel-toolbox working against Fedora 5, so we should be able to resolve the triplestore stuff. Could possibly also work towards getting the API-X stuff working too, but I'm not sure what is required there.

seth-shaw-unlv commented 5 years ago

So, @dannylamb, what would it take to get the Media of a fedora:// file indexed in Fedora? Our metadata folks really want to start attaching some PREMIS 3 metadata (e.g. preservationLevel) to files, and the corresponding Media entity seems the logical place to do so, since file entities aren't fieldable.

I'm willing to try my hand at it if I can get pointed in the right direction.

dannylamb commented 5 years ago

@whikloj I can get fcrepo-indexing-triplestore up and running, but then that messes with islandora-indexing-triplestore :( fcrepo-api-x has a dependency that could be updated, too, but I'm just commenting it out for now until I can set things straight in Alpaca and islandora-indexing-triplestore

dannylamb commented 5 years ago

@seth-shaw-unlv I would have to pull the thread further, but https://github.com/Islandora-CLAW/Crayfish/blob/master/Milliner/src/Controller/MillinerController.php#L99 is the place to start. That's the route that's for media for files that live in Fedora. Media whose files don't live in Fedora actually use the saveNode function above it (as do taxonomy terms, FYI).

rosiel commented 5 years ago

What are "media whose files don't live in Fedora"?

I have a derivative in Drupal ("2 - Service File") that was automatically generated when I uploaded an original file. It's clearly saved "in Drupal" as in, the URL of the file is at /sites/default/files/2019-04 and if I go to that directory on the filesystem, the file is there.

But the field, according to Drupal, is a "file_media_image", which is configured to use Fedora (flysystem) storage. Indeed, if I don't like the thumbnail and decide to replace it (remove the existing file, add a new file), then the file I add gets stored in Fedora instead of Drupal.

So because the Service File was added through the backend, you can trick the field storage to make it actually save in Drupal? Weird flex.

Could we instead have "Image (Fedora)" and "Image (Drupal)" as two separate Media types, so that a user can predict/expect/understand the file storage behaviour?

dannylamb commented 5 years ago

@rosiel I'm threading the needle on Drupal behaviour there. Configuring the field controls which filesystem is used when uploading through the UI. But derivatives are generated using the REST API, which accepts a Drupal URI in the Content-Location header. You can, however, control where derivatives are deposited by configuring the action for that derivative.

So by splitting out Media types you'll get the front end. And then you can configure the backend for both media types with Context/Actions to lock down where the derivatives are placed. If that seems like better default behaviour to folks, we can do that. Would just mean a PR to islandora_demo with the exported config.

seth-shaw-unlv commented 5 years ago

As an alternative, could we alter the file upload form to include a file-descriptor dropdown? Then a user could select if a file is destined for Drupal or Fedora at that point. Making a duplicate set of Media types makes me cringe and I would like to avoid it if possible.

dannylamb commented 5 years ago

@seth-shaw-unlv A dropdown in the file widget would be a welcome improvement regardless.

dannylamb commented 5 years ago

Ok, so I've managed to fix the triplestore for Fedora indexing, and looking at this further, I'm receiving 403s from Milliner when trying to hit localhost:8000/milliner/media/{field_name}.

[2019-04-16 07:58:47] app.ERROR:  {"Exception":"[object] (RuntimeException(code: 403): Client error: `PUT http://localhost:8080/fcrepo/rest/2019-04/Learning%20SPARQL.pdf/fcr:metadata` resulted in a `403 Forbidden` response: http://localhost:8080/fcrepo/rest/2019-04/Learning SPARQL.pdf is not in the topic of this RDF, which is http://localhost:8080/fcrepo/rest/2019-04/Learning%20SPARQL.pdf/fcr:metadata at /var/www/html/Crayfish/Milliner/src/Service/MillinerService.php:485)"} []

It looks like the /fcr:metadata elbow may now be the subject for the RDF resource associated with a file. Maybe that changed in 5.0 or was wrong all along? Either way, this shouldn't be too terrible of a fix.
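The error message names the same resource in two spellings: the RDF uses `Learning SPARQL.pdf` with a literal space, while the request went to `Learning%20SPARQL.pdf/fcr:metadata`. As a rough sketch, and without claiming this is what Milliner does, the two forms can be lined up like this. The helper names `encode_uri_path` and `metadata_uri` are made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_uri_path(uri: str) -> str:
    """Percent-encode the path of a URI without double-encoding escapes."""
    parts = urlsplit(uri)
    # '%' is kept safe so an existing %20 is left as it is.
    path = quote(parts.path, safe="/:%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


def metadata_uri(binary_uri: str) -> str:
    """Return the /fcr:metadata URI that describes a binary."""
    return encode_uri_path(binary_uri).rstrip("/") + "/fcr:metadata"


# Both URIs are copied from the error message in this comment.
rdf_subject = "http://localhost:8080/fcrepo/rest/2019-04/Learning SPARQL.pdf"
request_uri = (
    "http://localhost:8080/fcrepo/rest/2019-04/Learning%20SPARQL.pdf/fcr:metadata"
)

print(encode_uri_path(rdf_subject))
print(metadata_uri(rdf_subject) == request_uri)
```

Once the path is encoded and `/fcr:metadata` is appended, the RDF subject and the request URI agree. Which of the two differences Fedora was actually objecting to is not settled by this sketch.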

dannylamb commented 5 years ago

So I've gone ahead and taken the liberty of updating to use external content properly through the Fedora API. I'm pushing up some branches now, but will have to update the fcrepo role to deploy some config to get external content going. I'll let you know when it's ready, but nonetheless....

From Gemini we've got all the files indexed, regardless of what filesystem they use:

uuid fedora_uri drupal_uri
efd0e0f1-1ea2-4e07-b4ed-50a2b620ad50 http://localhost:8080/fcrepo/rest/2019-04/Casionova1.jpg http://localhost:8000/_flysystem/fedora/2019-04/Casionova1.jpg
8bcd29f5-8b88-40f5-9b77-d742c53492bb http://localhost:8080/fcrepo/rest/8b/cd/29/f5/8bcd29f5-8b88-40f5-9b77-d742c53492bb http://localhost:8000/sites/default/files/2019-04/5-Thumbnail%20Image.jpg
81ab2725-55d9-4f8a-82a8-8e2af655cdd5 http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5 http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg
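Judging only from the rows above, a file that lives in Drupal gets a Fedora URI built from its uuid, with the first four byte pairs used as path segments. The file stored through Flysystem keeps its own `2019-04/...` path instead. A small sketch of that observed pattern follows; the `fedora_uri_for_uuid` helper is hypothetical and the rule is inferred from these examples, not taken from the Gemini or Milliner code.

```python
import uuid

# Base URL as it appears in this thread.
FEDORA_BASE = "http://localhost:8080/fcrepo/rest"


def fedora_uri_for_uuid(value: str, base: str = FEDORA_BASE) -> str:
    """Build the pairtree-style URI seen above from a UUID.

    Assumption: the first eight hex digits become four two-character
    path segments, followed by the full UUID.
    """
    canonical = str(uuid.UUID(value))  # validates and normalises the UUID
    prefix = canonical.replace("-", "")[:8]
    segments = [prefix[i:i + 2] for i in range(0, 8, 2)]
    return "/".join([base, *segments, canonical])


print(fedora_uri_for_uuid("81ab2725-55d9-4f8a-82a8-8e2af655cdd5"))
```

Run against the two Drupal-hosted rows, this reproduces the fedora_uri values listed for the thumbnail and the service file.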

And if you curl the fedora_uri for the service file, you get redirected:

vagrant@claw:/var/www/html/Crayfish/Milliner$ curl -i http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5
HTTP/1.1 307 Temporary Redirect
Server: Apache-Coyote/1.1
Cache-Control: private
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Set-Cookie: JSESSIONID=AAC5D77ADA1C7B6DFC249722794544F6; Path=/fcrepo/; HttpOnly
Set-Cookie: rememberMe=deleteMe; Path=/fcrepo; Max-Age=0; Expires=Mon, 22-Apr-2019 17:22:38 GMT
ETag: "3bfc948cb7134fb0150061f06960469eb253916e"
Last-Modified: Tue, 23 Apr 2019 14:52:08 GMT
Accept-Ranges: bytes
Content-Disposition: attachment; filename=""; creation-date="Tue, 23 Apr 2019 14:52:08 GMT"; modification-date="Tue, 23 Apr 2019 14:52:08 GMT"; size=927638
Link: <http://www.w3.org/ns/ldp#Resource>;rel="type"
Link: <http://www.w3.org/ns/ldp#NonRDFSource>;rel="type"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="describedby"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>; rel="timegate"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>; rel="original"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:versions>; rel="timemap"
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"
Link: <http://mementoweb.org/ns#TimeGate>; rel="type"
Accept-External-Content-Handling: copy,redirect,proxy
Allow: DELETE,HEAD,GET,PUT,OPTIONS
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:acl>; rel="acl"
Content-Location: http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg
Location: http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg
Content-Type: image/jpeg
Content-Length: 0
Date: Tue, 23 Apr 2019 17:22:38 GMT
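The useful parts of that response are the `Link` headers, since `rel="describedby"` points at the RDF for the binary. Here is a minimal sketch of picking them out of raw header lines. It assumes each `Link` header carries a single quoted `rel` value, which holds for the response above but is not a full RFC 8288 parser, and the `links_by_rel` helper is hypothetical.

```python
import re
from typing import Dict, List

# Matches <target>; rel="value" with optional spaces around the semicolon.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def links_by_rel(header_lines: List[str]) -> Dict[str, List[str]]:
    """Group Link header targets by their rel value."""
    links: Dict[str, List[str]] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() != "link":
            continue  # skip every header that is not a Link header
        for target, rel in LINK_RE.findall(value):
            links.setdefault(rel, []).append(target)
    return links


# A few header lines copied from the curl output above.
sample = [
    "HTTP/1.1 307 Temporary Redirect",
    'Link: <http://www.w3.org/ns/ldp#Resource>;rel="type"',
    'Link: <http://www.w3.org/ns/ldp#NonRDFSource>;rel="type"',
    "Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/"
    '81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="describedby"',
    "Content-Type: image/jpeg",
]

print(links_by_rel(sample)["describedby"][0])
```

The same function works on the headers of any of the Fedora responses in this thread, for example to find the `acl` or `timemap` targets.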

And if you curl the "describedby" link header, you get the metadata:

vagrant@claw:/var/www/html/Crayfish/Milliner$ curl -i http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: private
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Set-Cookie: JSESSIONID=66EA03561EF11D85B08586CCA968EFAE; Path=/fcrepo/; HttpOnly
Set-Cookie: rememberMe=deleteMe; Path=/fcrepo; Max-Age=0; Expires=Mon, 22-Apr-2019 17:22:54 GMT
ETag: W/"a92037e13a684209628154e87405fd283d27b67d"
Last-Modified: Tue, 23 Apr 2019 14:52:10 GMT
Link: <http://www.w3.org/ns/ldp#Resource>;rel="type"
Link: <http://www.w3.org/ns/ldp#RDFSource>; rel="type"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>; rel="describes"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="timegate"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata>; rel="original"
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata/fcr:versions>; rel="timemap"
Link: <http://mementoweb.org/ns#OriginalResource>; rel="type"
Link: <http://mementoweb.org/ns#TimeGate>; rel="type"
Accept-External-Content-Handling: copy,redirect,proxy
Accept-Patch: application/sparql-update
Allow: HEAD,GET,DELETE,PUT,PATCH,OPTIONS
Link: <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:acl>; rel="acl"
Preference-Applied: return=representation
Vary: Prefer
Vary: Accept
Vary: Range
Vary: Accept-Encoding
Vary: Accept-Language
Vary: Accept-Datetime
Content-Type: text/turtle;charset=utf-8
Content-Length: 2453
Date: Tue, 23 Apr 2019 17:22:54 GMT

@prefix premis:  <http://www.loc.gov/premis/rdf/v1#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fedora:  <http://fedora.info/definitions/v4/repository#> .
@prefix ebucore:  <http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#> .
@prefix ldp:  <http://www.w3.org/ns/ldp#> .
@prefix dcterms:  <http://purl.org/dc/terms/> .
@prefix iana:  <http://www.iana.org/assignments/relation/> .

<http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5>
        rdf:type                    fedora:NonRdfSourceDescription ;
        rdf:type                    <http://pcdm.org/models#File> ;
        rdf:type                    <http://pcdm.org/use#ServiceFile> ;
        rdf:type                    fedora:Binary ;
        rdf:type                    fedora:Resource ;
        fedora:lastModifiedBy       "bypassAdmin" ;
        <http://schema.org/dateModified>  "2019-04-23T14:52:08+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
        <http://schema.org/author>  <http://localhost:8000/user/1?_format=jsonld> ;
        <http://schema.org/sameAs>  "http://localhost:8000/sites/default/files/2019-04/5-Service%20File.jpg" ;
        <http://schema.org/dateCreated>  "2019-04-23T14:52:08+00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
        ebucore:width               "1072"^^<http://www.w3.org/2001/XMLSchema#int> ;
        premis:hasSize              "927638"^^<http://www.w3.org/2001/XMLSchema#long> ;
        ebucore:hasMimeType         "image/jpeg" ;
        fedora:createdBy            "bypassAdmin" ;
        fedora:created              "2019-04-23T14:52:08.93Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
        fedora:lastModified         "2019-04-23T14:52:10.962Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
        ebucore:height              "698"^^<http://www.w3.org/2001/XMLSchema#int> ;
        <http://pcdm.org/models#fileOf>  <http://localhost:8000/node/5?_format=jsonld> ;
        rdf:label                   "5-Service File.jpg"@en ;
        ebucore:filename            "" ;
        dcterms:title               "5-Service File.jpg"@en ;
        rdf:type                    ldp:NonRDFSource ;
        iana:describedby            <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:metadata> ;
        fedora:hasFixityService     <http://localhost:8080/fcrepo/rest/81/ab/27/25/81ab2725-55d9-4f8a-82a8-8e2af655cdd5/fcr:fixity> .

:champagne:

dannylamb commented 5 years ago

Resolved via https://github.com/Islandora-Devops/claw-playbook/pull/100