Closed by dannylamb 5 years ago
@dannylamb is that something you want to have fixed as such? Since Media entities are not file entities, I'm not sure how to handle that. I would have guessed that media entities were a nice way of managing images, etc., but that the real Non-RDF Source payload would come from one of the file entities connected to them. Would that not leave out all the properties that are part of the media entity but not of (one of) the files that are part of it?
The jsonld module handles, or at least would like to handle, this as generically and as LDP-less as possible. Says the jsonld module 😺
@DiegoPino That's precisely the conundrum. The Drupal and LDP models are a bit at odds. So long as we're ok with the fact that the JSONLD we generate for Media has the wrong subject w/r/t LDP, then it's reasonable to do this conversion elsewhere.
And FWIW I'm totally ok with that.
In my understanding, the Media entity in Drupal is "a wrapper for the file" and any fields/values on a Media entity - for example: ebucore:height is 2394px, or mimetype is image/tiff, are semantically the properties of the file. It's just that file entities, in Drupal, can't have fields attached. So the fields go on the Media. Any other fields or properties you attach to a Media should, I think, describe the file proper (otherwise put it on the node).
The Media contains the same information, and is analogous to, the /fcr:metadata document describing the binary. However, it's different structurally - in Drupal it's "the middleman" tying a node to a file. In Fedora, the file itself points to the node, through its properties (which are accessed through the document at /fcr:metadata).
Taking the Media's JSONLD serialization, it would say: (using REALLY LAZY shorthand)
<DRUPAL/media/1> pcdm:fileOf <DRUPAL/node/1> ;
    schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .
This does not make a lot of sense because the media is not, semantically, "the same as" the file nor "a file of" the node.
It's only when in Fedora, and the subject is swapped out for the Fedora Binary, that it makes sense:
<FEDORA/fcrepo/rest/stuff/filename> pcdm:fileOf <DRUPAL/node/1> ;
    schema:sameAs <DRUPAL/_flysystem/fedora/stuff/filename> .
I don't want to put too much weight on the Media's jsonld here because it's misleading as LD, but it works as the in-transit-to-fedora construct.
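The subject swap described above can be sketched in a few lines. This is only an illustration, not how Milliner actually does it: triples are modelled as plain tuples, `swap_subject` is a hypothetical helper, and the URIs are the same lazy shorthand placeholders used in this comment.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def swap_subject(triples: List[Triple], old: str, new: str) -> List[Triple]:
    """Return a copy of `triples` with every subject equal to `old` replaced by `new`.

    Objects are deliberately left alone, so a pcdm:fileOf pointing at the
    Drupal node keeps pointing at the Drupal node.
    """
    return [(new if s == old else s, p, o) for s, p, o in triples]


# Placeholder URIs in the same lazy shorthand as the comment above.
media = "DRUPAL/media/1"
binary = "FEDORA/fcrepo/rest/stuff/filename"

drupal_triples = [
    (media, "pcdm:fileOf", "DRUPAL/node/1"),
    (media, "schema:sameAs", "DRUPAL/_flysystem/fedora/stuff/filename"),
]

fedora_triples = swap_subject(drupal_triples, media, binary)
for s, p, o in fedora_triples:
    print(f"<{s}> {p} <{o}> .")
```

Only the subject changes; the statements themselves are carried over untouched, which is why the Media's jsonld works as an in-transit construct even though it is misleading on its own.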
Here's a diagram of the JSONLD of a node, media, and file, along with the fedora objects and their types (both according to HTTP headers, and to the documents they delivered).
Point is, I agree that the serialization would make semantic sense if you make the main Subject (id) the URI of the file in Drupal rather than the Media in Drupal (though it already contains a schema:sameAs to that effect, so maybe ...). As far as I can tell (using a CLAW instance that is some days out of date) the original problematic triple, <uri_of_media> iana:describes <uri_of_file>, is not present, so I'm not sure what needs to be done for this issue.
@dannylamb so I did this and it has no effect. I am guessing that's because you are only grabbing the media element's graph, and by moving this triple from media -> file to file -> media it ends up outside that graph. So it's the same as removing it.
Whoa! missed the @rosiel comment. reading now.
Okay, I agree with @rosiel above. This is not working due to our serializing method, but even if it did it wouldn't necessarily make sense.
A simple way to add this (not that it makes sense) would be to replace iana:describes with iana:describedby and make both the subject and object the media element.
So <drupal/media/2> iana:describes <drupal/file/3> becomes <drupal/media/2> iana:describedby <drupal/media/2>; again, this doesn't make sense.
But in Fedora it would become <fedora/NonRdfSource/1234-5678> iana:describedby <drupal/media/2>.
I'm not sure it's worth the hassle though.
This issue is from 2017, and I don't see any iana:describes in the graph returned from a media in 2019 - I think it was removed a while ago.
Using curl, I see it in the header for /media/x?_format=jsonld:
Link: <http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; rel="describes"; type="image/jpeg"
This statement is ... accurate, no?
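For anyone who wants to check that header from a script rather than by eye, here is a rough sketch. `parse_link_header` is a hypothetical helper written for this comment, it only handles simple headers like the one above, and the header value is pasted in as a string rather than fetched from a live site.

```python
import re
from typing import Dict, List


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Parse a simple Link header value into dicts holding 'url' plus its params.

    Good enough for headers like the one above; it does not cover every
    corner of RFC 8288 (e.g. commas inside quoted parameter values).
    """
    links = []
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[\w-]+="?[^";,]*"?)*)', value):
        link = {"url": match.group(1)}
        for key, val in re.findall(r';\s*([\w-]+)="?([^";,]*)"?', match.group(2)):
            link[key] = val
        links.append(link)
    return links


# Header value copied from the curl output above (DOMAIN is a placeholder).
header = '<http://DOMAIN/_flysystem/fedora/2019-05/IMG_0606.JPG>; rel="describes"; type="image/jpeg"'

links = parse_link_header(header)
# rel may hold several space-separated relation types, so split before testing.
described = [l["url"] for l in links if "describes" in l.get("rel", "").split()]
print(described)
```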
To rewrite the original issue to reflect current behaviour:
Right now a Media entity, when serialized, has itself as the subject and contains triples of the form <uri_of_media> ebucore:height '3024', but really it needs to be <uri_of_file> ebucore:height '3024' to be semantically accurate. Also, the existence of a 'media document' describing the file is in line with how Fedora generates a LDP-RS for every LDP-NR that gets created, since even in its HTTP headers it claims it iana:describes <uri_of_file>.
@rosiel That link header is indeed accurate. As is your summary about the subject uri. The missing piece we should add on top is an iana:describedby with the media's url in the RDF. That would tie it all up nicely.
To stick with your example, something like this in a jsonld GET response for a media:
<uri_of_file> ebucore:height '3024'
<uri_of_file> iana:describedby <uri_of_media>
with a rel="describes" link header pointing to <uri_of_file>.
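As a rough sketch of that proposal: the function below builds both halves of the response, the triples with the file as subject and the Link header value. `media_jsonld_response` is a made-up name for illustration, not something in the jsonld module, and the triples are plain strings in the same shorthand as above.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def media_jsonld_response(
    file_uri: str, media_uri: str, mime_type: str, properties: Dict[str, str]
) -> Tuple[List[Triple], str]:
    """Build the triples and Link header value for a media GET (hypothetical helper)."""
    # Every descriptive property hangs off the file, not the media.
    triples: List[Triple] = [
        (f"<{file_uri}>", predicate, f"'{value}'")
        for predicate, value in properties.items()
    ]
    # The media is only mentioned as the document that describes the file.
    triples.append((f"<{file_uri}>", "iana:describedby", f"<{media_uri}>"))
    link_header = f'<{file_uri}>; rel="describes"; type="{mime_type}"'
    return triples, link_header


triples, link_header = media_jsonld_response(
    "uri_of_file", "uri_of_media", "image/jpeg", {"ebucore:height": "3024"}
)
for triple in triples:
    print(" ".join(triple))
print("Link:", link_header)
```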
Ok, here's what we have now
{
  "@graph":[
    {
      "@id":"http:\/\/localhost:8000\/media\/1",
      "@type":[
        "http:\/\/pcdm.org\/models#File",
        "http:\/\/pcdm.org\/use#OriginalFile"
      ],
      "http:\/\/purl.org\/dc\/terms\/title":[
        {
          "@value":"Original Image",
          "@language":"en"
        }
      ],
      "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#label":[
        {
          "@value":"Original Image",
          "@language":"en"
        }
      ],
      "http:\/\/schema.org\/author":[
        {
          "@id":"http:\/\/localhost:8000\/user\/1"
        }
      ],
      "http:\/\/schema.org\/dateCreated":[
        {
          "@value":"2019-05-15T19:21:42+00:00",
          "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
        }
      ],
      "http:\/\/schema.org\/dateModified":[
        {
          "@value":"2019-05-15T19:22:12+00:00",
          "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#dateTime"
        }
      ],
      "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#height":[
        {
          "@value":"1018",
          "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
        }
      ],
      "http:\/\/pcdm.org\/models#fileOf":[
        {
          "@id":"http:\/\/localhost:8000\/node\/1"
        }
      ],
      "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#hasMimeType":[
        {
          "@value":"image\/jpeg",
          "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#string"
        }
      ],
      "http:\/\/www.ebu.ch\/metadata\/ontologies\/ebucore\/ebucore#width":[
        {
          "@value":"904",
          "@type":"http:\/\/www.w3.org\/2001\/XMLSchema#int"
        }
      ],
      "http:\/\/schema.org\/sameAs":[
        {
          "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
        }
      ]
    },
    {
      "@id":"http:\/\/localhost:8000\/user\/1",
      "@type":[
        "http:\/\/schema.org\/Person"
      ]
    },
    {
      "@id":"http:\/\/localhost:8000\/node\/1",
      "@type":[
        "http:\/\/pcdm.org\/models#Object"
      ]
    }
  ]
}
Feels like we've batted around two ways of doing this:
1. Change schema:sameAs to iana:describes, and then process the rest to be more fedora/ldp-ish in Milliner. This is done with a simple config change using Context, and would result in the following from Drupal (edited for brevity):
{
  "@graph":[
    {
      "@id":"http:\/\/localhost:8000\/media\/1",
      "@type":[
        "http:\/\/pcdm.org\/models#File",
        "http:\/\/pcdm.org\/use#OriginalFile"
      ],
      "http:\/\/pcdm.org\/models#fileOf":[
        {
          "@id":"http:\/\/localhost:8000\/node\/1"
        }
      ],
      "http:\/\/www.iana.org\/assignments\/relation\/describes":[
        {
          "@value":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg"
        }
      ]
      ...
    },
    ...
  ]
}
which isn't 100% over-the-top semantically correct, but is actually the more intuitive solution to folks coming from outside the ldp sphere. We'd then further process it in Crayfish/Alpaca to have it make sense in fedora and the triplestore.
2. Change the @id to be that of the file, and use iana:describedby to reference the media. This would look like (again, edited for brevity):
{
  "@graph":[
    {
      "@id":"http:\/\/localhost:8000\/_flysystem\/fedora\/2019-05\/Flemming-Magic.jpg",
      "@type":[
        "http:\/\/pcdm.org\/models#File",
        "http:\/\/pcdm.org\/use#OriginalFile"
      ],
      "http:\/\/pcdm.org\/models#fileOf":[
        {
          "@id":"http:\/\/localhost:8000\/node\/1"
        }
      ],
      "http:\/\/www.iana.org\/assignments\/relation\/describedby":[
        {
          "@value":"http:\/\/localhost:8000\/media\/1"
        }
      ]
      ...
    },
    ...
  ]
}
This is the most semantically correct, but may come off as strange to the uninitiated. It would require less processing to get into the right shape for Fedora and the Triplestore, though.
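To make the difference between the two shapes concrete, here is a small sketch that turns the first (media as subject, iana:describes) into the second (file as subject, iana:describedby). It is only an illustration under stated assumptions: `file_as_subject` is a hypothetical helper, not code from Milliner, Crayfish or the jsonld module, and the sample graph is trimmed to the few keys shown in this comment.

```python
import copy
import json

DESCRIBES = "http://www.iana.org/assignments/relation/describes"
DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"


def file_as_subject(doc: dict, media_uri: str) -> dict:
    """Re-key the media resource in an expanded JSON-LD @graph to the file URI."""
    out = copy.deepcopy(doc)  # leave the caller's document untouched
    for resource in out.get("@graph", []):
        if resource.get("@id") != media_uri or DESCRIBES not in resource:
            continue
        target = resource.pop(DESCRIBES)[0]
        # The serializations in this thread carry the file URL as an "@value";
        # accept "@id" too in case that changes.
        file_uri = target.get("@id") or target["@value"]
        resource["@id"] = file_uri
        # Mirrors the "@value" form used in the example above.
        resource[DESCRIBEDBY] = [{"@value": media_uri}]
    return out


media_uri = "http://localhost:8000/media/1"
file_url = "http://localhost:8000/_flysystem/fedora/2019-05/Flemming-Magic.jpg"

option_1 = {
    "@graph": [
        {
            "@id": media_uri,
            "@type": ["http://pcdm.org/models#File"],
            "http://pcdm.org/models#fileOf": [{"@id": "http://localhost:8000/node/1"}],
            DESCRIBES: [{"@value": file_url}],
        },
        {"@id": "http://localhost:8000/node/1", "@type": ["http://pcdm.org/models#Object"]},
    ]
}

option_2 = file_as_subject(option_1, media_uri)
print(json.dumps(option_2, indent=2))
```

All other properties ride along unchanged, which is why the second shape needs less processing later: the subject is already the thing the triples are about.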
No. 2 makes sense. No. 1 would be a regression back into the semantic flaw from 2017 that caused this issue to be created.
@rosiel @whikloj PRs are up^^
Testing instructions are in https://github.com/Islandora-CLAW/islandora/pull/136
@rosiel your diagram in https://github.com/Islandora-CLAW/CLAW/issues/662#issuecomment-491408492 is epic. Mind if I use it in my Open Repositories and iCamp slide decks, with full and genuflecting attribution?
@mjordan Yes, but no genuflecting please, and it was a product of collaborating with @elizoller.
[edit: also, unless things change by then, please include the fileOf arrow that gets crossed out and redirected to Drupal. ;) ]
OK, will nix the genuflecting, cocredit @elizoller, and note updates.
😃
These might be right?
@elizoller++
Right now a Media entity, when serialized, has itself as the subject and contains a triple of the form <uri_of_media> iana:describes <uri_of_file>, but really it needs to be <uri_of_file> iana:describedby <uri_of_media> to be in line with how Fedora generates a LDP-RS for every LDP-NR that gets created. This amounts to adding a special case for Media entities in the jsonld module.
Here's what it looks like now (non-relevant triples removed for brevity):
And here's what it should look like: