Islandora / documentation

Contains islandora's documentation and main issue queue.

Serialize Media entities as LDP-RS describing the File, not itself #662

Closed (dannylamb closed 5 years ago)

dannylamb commented 7 years ago

Right now a Media entity, when serialized, has itself as the subject and contains a triple of the form <uri_of_media> iana:describes <uri_of_file>, but really it needs to be <uri_of_file> iana:describedby <uri_of_media> to be in line with how Fedora generates an LDP-RS for every LDP-NR that gets created. This amounts to adding a special case for Media entities in the jsonld module.

Here's what it looks like now (non-relevant triples removed for brevity):

{
    "@graph":[
        {
            "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld",
           ...
            "http:\/\/www.iana.org\/assignments\/relation\/describes":[
                {
                    "@id":"http:\/\/localhost:8000\/sites\/default\/files\/2017-06\/sample.jp2"
                }
            ]
        }
        ...
    ]
}

And here's what it should look like:

{
    "@graph":[
        {
            "@id":"http:\/\/localhost:8000\/sites\/default\/files\/2017-06\/sample.jp2",
           ...
            "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
                {
                    "@id":"http:\/\/localhost:8000\/media\/1?_format=jsonld"
                }
            ]
        }
        ...
    ]
}

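As a rough illustration of the change described above, the following Python sketch flips the triple in an expanded JSON-LD graph. The helper name `flip_describes` and the trimmed sample graph are hypothetical, and this is not how the jsonld module is implemented; it only shows the before and after shapes.

```python
import json

DESCRIBES = "http://www.iana.org/assignments/relation/describes"
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"


def flip_describes(doc):
    """Return a copy of doc in which every node that iana:describes a file
    is re-subjected to that file and points back via iana:describedby.

    Hypothetical helper: assumes expanded JSON-LD with a top-level "@graph"
    and a single iana:describes target per describing node.
    """
    graph = []
    for node in doc.get("@graph", []):
        targets = node.get(DESCRIBES)
        if not targets:
            # Nodes without iana:describes pass through unchanged.
            graph.append(dict(node))
            continue
        media_uri = node["@id"]
        file_uri = targets[0]["@id"]
        # Keep every other property, but move it onto the file subject.
        rest = {k: v for k, v in node.items() if k not in ("@id", DESCRIBES)}
        graph.append({"@id": file_uri, **rest, DESCRIBEDBY: [{"@id": media_uri}]})
    return {**doc, "@graph": graph}


current = {
    "@graph": [
        {
            "@id": "http://localhost:8000/media/1?_format=jsonld",
            DESCRIBES: [
                {"@id": "http://localhost:8000/sites/default/files/2017-06/sample.jp2"}
            ],
        }
    ]
}

desired = flip_describes(current)
print(json.dumps(desired, indent=4))
```

Note that every other property on the Media node moves to the file subject as well, which is the crux of the question about properties that belong to the Media rather than the file.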
DiegoPino commented 7 years ago

@dannylamb is that something you want to have fixed as such? Since Media entities are not file entities, not sure how to handle that. I would have guessed that media entities were a way of managing nicely images, etc, but the real Non RDF Source payload would come from one of the file entities connected to them. Would that not leave all the properties that are part of the media entity but not of (one of) the files that are part of the media entity out?

The jsonld module handles this, or at least would like to handle it, as generically and LDP-less as possible, says the jsonld module 😺

dannylamb commented 7 years ago

@DiegoPino That's precisely the conundrum. The Drupal and LDP models are a bit at odds. So long as we're ok with the fact that the JSONLD we generate for Media has the wrong subject w/r/t LDP, then it's reasonable to do this conversion elsewhere.

dannylamb commented 7 years ago

And FWIW I'm totally ok with that.

rosiel commented 5 years ago

In my understanding, the Media entity in Drupal is "a wrapper for the file" and any fields/values on a Media entity - for example: ebucore:height is 2394px, or mimetype is image/tiff, are semantically the properties of the file. It's just that file entities, in Drupal, can't have fields attached. So the fields go on the Media. Any other fields or properties you attach to a Media should, I think, describe the file proper (otherwise put it on the node).

The Media contains the same information as, and is analogous to, the /fcr:metadata document describing the binary. However, it's different structurally: in Drupal it's "the middleman" tying a node to a file. In Fedora, the file itself points to the node, through its properties (which are accessed through the document at /fcr:metadata).

Taking the Media's JSONLD serialization, it would say: (using REALLY LAZY shorthand)

<DRUPAL/media/1> pcdm:fileOf <DRUPAL/node/1>,
      schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .

This does not make a lot of sense because the media is not, semantically, "the same as" the file nor "a file of" the node.

It's only when in Fedora, and the subject is swapped out for the Fedora Binary, that it makes sense:

<FEDORA/fcrepo/rest/stuff/filename>  pcdm:fileOf <DRUPAL/node/1> ,
     schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> . 

I don't want to put too much weight on the Media's jsonld here because it's misleading as LD, but it works as the in-transit-to-fedora construct.

Here's a diagram of the JSONLD of a node, media, and file, along with the fedora objects and their types (both according to HTTP headers, and to the documents they delivered). [Image: Islandora-and-fedora-jsonld-2019-05-09]

Point is, I agree that the serialization would make semantic sense if you make the main Subject (id) the URI of the file in Drupal rather than the Media in Drupal (though it already contains a schema:sameAs to that effect). As far as I can tell (using a CLAW instance that is some days out of date) the original problematic triple, <uri_of_media> iana:describes <uri_of_file>, is not present, so I'm not sure what needs to be done for this issue.

whikloj commented 5 years ago

@dannylamb so I did this and it has no effect. I am guessing that's because you are only grabbing the media element's graph, and by moving this triple from media -> file to file -> media it ends up outside that graph. So it's the same as removing it.

whikloj commented 5 years ago

Whoa! missed the @rosiel comment. reading now.

whikloj commented 5 years ago

Okay, I agree with @rosiel above. This is not working due to our serializing method, but even if it did it wouldn't necessarily make sense.

A simple way to add this (not that it makes sense) would be to replace iana:describes with iana:describedby and make both the subject and object the media element.

So <drupal/media/2> iana:describes <drupal/file/3> becomes <drupal/media/2> iana:describedby <drupal/media/2>, again this doesn't make sense.

But in Fedora it would become <fedora/NonRdfSource/1234-5678> iana:describedby <drupal/media/2>.

I'm not sure it's worth the hassle though.

rosiel commented 5 years ago

This issue is from 2017, and I don't see any iana:describes in the graph returned from a media in 2019 - I think it was removed a while ago.

Using curl, I see it in the header for /media/x?_format=jsonld. Link: <http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; rel="describes"; type="image/jpeg". This statement is ... accurate, no?
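To check that header programmatically rather than by eye, here is a minimal Python sketch that pulls the target and parameters out of a Link header value. `parse_link_header` is a hypothetical helper, not part of Islandora or Drupal, and it only handles a single link with double-quoted parameters, which is the shape shown above.

```python
import re


def parse_link_header(value):
    """Split one Link header value into (target_uri, params).

    Hypothetical helper: handles a single <target> followed by
    ;name="value" pairs, not the full Web Linking grammar.
    """
    match = re.match(r'\s*<([^>]*)>(.*)', value)
    if match is None:
        raise ValueError("not a Link header value: %r" % value)
    target, rest = match.groups()
    # Collect each ;name="value" parameter into a dict.
    params = dict(re.findall(r';\s*([\w-]+)\s*=\s*"([^"]*)"', rest))
    return target, params


header = (
    '<http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; '
    'rel="describes"; type="image/jpeg"'
)
target, params = parse_link_header(header)
print(target, params["rel"], params["type"])
```

Read this way, the header says that the media document describes the file at `target`, which matches the iana:describes relation discussed in this issue.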

To rewrite the original issue to reflect current behaviour:

Right now a Media entity, when serialized, has itself as the subject and contains triples of the form <uri_of_media> ebucore:height '3024', but really it needs to be <uri_of_file> ebucore:height '3024' to be semantically accurate. Also, the existence of a 'media document' describing the file is in line with how Fedora generates a LDP-RS for every LDP-NR that gets created, since even in its HTTP headers it claims it iana:describes <uri_of_file>.

dannylamb commented 5 years ago

@rosiel That link header is indeed accurate. As is your summary about the subject uri. The missing piece we should add on top is an iana:describedby with the media's url in the RDF. That would tie it all up nicely.

To stick with your example, something like this in a jsonld GET response for a media:

<uri_of_file> ebucore:height '3024'
<uri_of_file> iana:describedby <uri_of_media>

with a rel="describes" link header pointing to <uri_of_file>.

dannylamb commented 5 years ago

Ok, here's what we have now

{
   "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/media\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/purl.org\/dc\/terms\/title":[
            {
               "@value":"Original Image",
               "@language":"en"
            }
         ],
         "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#label":[
            {
               "@value":"Original Image",
               "@language":"en"
            }
         ],
         "http:\/\/schema.org\/author":[
            {
               "@id":"http:\/\/localhost:8000\/user\/1"
            }
         ],
         "http:\/\/schema.org\/dateCreated":[
            {
               "@value":"2019-05-15T19:21:42+00:00",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
         ],
         "http:\/\/schema.org\/dateModified":[
            {
               "@value":"2019-05-15T19:22:12+00:00",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#height":[
            {
               "@value":"1018",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
            }
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#hasMimeType":[
            {
               "@value":"image\/jpeg",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
            }
         ],
         "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#width":[
            {
               "@value":"904",
               "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
            }
         ],
         "http:\/\/schema.org\/sameAs":[
            {
               "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
            }
         ]
      },
      {
         "@id":"http:\/\/localhost:8000\/user\/1",
         "@type":[
            "http:\/\/schema.org\/Person"
         ]
      },
      {
         "@id":"http:\/\/localhost:8000\/node\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#Object"
         ]
      }
   ]
}

Feels like we've batted around two ways of doing this

  1. Just change schema:sameAs to iana:describes, and then process the rest to be more fedora/ldp-ish in Milliner. This is done with a simple config change using Context, and would result in the following from Drupal (edited for brevity):
    {
    "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/media\/1",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.iana.org\/assignments\/relation\/describes":[
            {
               "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
            }
         ]
         ...
      },
      ...
    ]
    }

which isn't 100% over-the-top semantically correct, but is actually the more intuitive solution to folks coming from outside the ldp sphere. We'd then further process it in Crayfish/Alpaca to have it make sense in fedora and the triplestore.

  2. We replace the @id to be that of the file, and use iana:describedby to reference the media. This would look like (again, edited for brevity):
    {
    "@graph":[
      {
         "@id":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg",
         "@type":[
            "http:\/\/pcdm.org\/models#File",
            "http:\/\/pcdm.org\/use#OriginalFile"
         ],
         "http:\/\/pcdm.org\/models#fileOf":[
            {
               "@id":"http:\/\/localhost:8000\/node\/1"
            }
         ],
         "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
            {
               "@value":"http:\/\/localhost:8000\/media\/1"
            }
         ]
         ...
      },
      ...
    ]
    }

This is the most semantically correct, but may come off as strange to the uninitiated. It would require less processing to get into the right shape for Fedora and the Triplestore, though.
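For comparison, the second option can be sketched in Python as a rewrite of the current serialization. This is only an illustration of the shape change: `media_to_file_subject` is a hypothetical name, the sample graph is trimmed from the output above, and the real change would live in the Drupal serialization rather than in a script like this. It keeps the "@value" form used in the examples above.

```python
SAME_AS = "http://schema.org/sameAs"
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"


def media_to_file_subject(doc, media_uri):
    """Re-subject the media node to its file and link back via describedby.

    Hypothetical helper: assumes the media node carries exactly one
    schema:sameAs value holding the file URL, as in the output above.
    """
    graph = []
    for node in doc["@graph"]:
        if node.get("@id") != media_uri or SAME_AS not in node:
            graph.append(node)  # user and parent-node entries are untouched
            continue
        file_uri = node[SAME_AS][0]["@value"]
        rewritten = {"@id": file_uri}
        for key, value in node.items():
            if key not in ("@id", SAME_AS):
                rewritten[key] = value
        rewritten[DESCRIBEDBY] = [{"@value": media_uri}]
        graph.append(rewritten)
    return {**doc, "@graph": graph}


current = {
    "@graph": [
        {
            "@id": "http://localhost:8000/media/1",
            "@type": ["http://pcdm.org/models#File"],
            "http://pcdm.org/models#fileOf": [{"@id": "http://localhost:8000/node/1"}],
            SAME_AS: [
                {"@value": "http://localhost:8000/_flysystem/fedora/2019-05/Flemming-Magic.jpg"}
            ],
        },
        {"@id": "http://localhost:8000/node/1", "@type": ["http://pcdm.org/models#Object"]},
    ]
}

option_two = media_to_file_subject(current, "http://localhost:8000/media/1")
print(option_two["@graph"][0]["@id"])
```

The pcdm:fileOf triple and the types carry over unchanged, so the file rather than the media is stated to be a file of the node.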

rosiel commented 5 years ago

No. 2 makes sense. No. 1 would be a regression back into the semantic flaw from 2017 that caused this issue to be created.

dannylamb commented 5 years ago

@rosiel @whikloj PRs are up^^

Testing instructions are in https://github.com/Islandora-CLAW/islandora/pull/136

mjordan commented 5 years ago

@rosiel your diagram in https://github.com/Islandora-CLAW/CLAW/issues/662#issuecomment-491408492 is epic. Mind if I use it in my Open Repositories and iCamp slide decks, with full and genuflecting attribution?

rosiel commented 5 years ago

@mjordan Yes, but no genuflecting please, and it was a product of collaborating with @elizoller.

[edit: also, unless things change by then, please include the fileOf arrow that gets crossed out and redirected to Drupal. ;) ]

mjordan commented 5 years ago

OK, will nix the genuflecting, cocredit @elizoller, and note updates.

😃

elizoller commented 5 years ago

These might be right? [Image: Islandora 8 - Drupal Node and Fedora Resource - Service File] [Image: Islandora 8 - Drupal Node and Fedora Resource - Original File]

mjordan commented 5 years ago

@elizoller++