Islandora / documentation

Contains islandora's documentation and main issue queue.

Entities without canonical linkTemplates break jsonld production #1068

Open ajstanley opened 5 years ago

ajstanley commented 5 years ago

If we have an entity without a canonical link template, we call out for it anyway. https://github.com/Islandora-CLAW/jsonld/blob/8.x-1.x/src/Normalizer/ContentEntityNormalizer.php#L254-L255

On my dev box I've done a blank return then filtered the empty @id tag out, but it feels kind of clumsy. Maybe there's a cleaner way?

whikloj commented 5 years ago

Now that I understand this better (I hope), the normalizer fails when accessing something like a "paragraph" (the Paragraphs module) because paragraphs are anonymous and reusable, so they don't have a canonical URI.

It would probably (though this is debatable) be good to have the Paragraph ignored and instead have the fields inside it brought up and displayed as a regular part of the JSON-LD for the content node.

seth-shaw-unlv commented 5 years ago

I would probably first check for a converter (e.g. someone could create a converter for paragraphs that collects all the referenced entities’ values into a string literal) but, in the absence of a converter, fall back and remove the property if the entity can’t provide a valid URI.

ajstanley commented 5 years ago

The trouble with accessing the fields inside a paragraph is that they must be taken as a whole. If your paragraph has a repeatable group of, say, addresses for someone who has more than one, you have to know which street goes with which town, etc.

It's a simple matter to write a callback that aggregates the contained fields, but it's not all that rdf-ey to tie them together like that. Maybe document that best practises dictate that related fields ought to go in an inline entity if they are to be triplestored? It might be useful for a sparql query to bring back all we know about an object/node/media but the concatenation is likely too weak to search on in a meaningful way.

By allowing it to rely on a bespoke callback the responsibility shifts to the implementer to keep it as sane as they'd like for their purposes.

The immediate problem is the one in the blob. We're specifically saying if the link doesn't have a canonical template, go get it anyway. It's guaranteed to fail. We could either return a null and decide how we want to deal with it (it's what I did on my dev box before we scrapped paragraphs for other reasons) or wrap it all in a try/catch and log a meaningful error.
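
The two options in the comment above can be sketched roughly as follows. This is a minimal illustration in Python rather than the normalizer's PHP; the `Entity` class, `link_templates`, `entity_uri`, and `reference_values` are hypothetical stand-ins and not the Drupal or jsonld module API.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("jsonld_sketch")


@dataclass
class Entity:
    """Hypothetical stand-in for a content entity; not a Drupal class."""
    entity_type: str
    entity_id: int
    link_templates: dict = field(default_factory=dict)


def entity_uri(entity: Entity, base: str = "http://localhost:8000") -> Optional[str]:
    """Return the canonical URI, or None when no canonical template exists."""
    template = entity.link_templates.get("canonical")
    if template is None:
        # Log something meaningful instead of requesting a URL that cannot exist.
        log.warning("%s %s has no canonical link template; skipping @id",
                    entity.entity_type, entity.entity_id)
        return None
    return base + template.format(id=entity.entity_id)


def reference_values(entities: list) -> list:
    """Build {"@id": ...} values, dropping entities that cannot provide a URI."""
    uris = (entity_uri(e) for e in entities)
    return [{"@id": uri} for uri in uris if uri is not None]
```

Returning `None` and filtering in one place keeps the empty `@id` from ever reaching the output, which avoids the separate clean-up pass described in the opening comment.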

whikloj commented 5 years ago

I believe it is valid RDF to concatenate the values in a comma separated list and push them in behind the predicate.

ie.

<http://localhost:8000/node/3?_format=jsonld> <http://purl.org/dc/elements/1.1/creator> "Bob Smith", "Jane Alexander", "Phil Ewing" .

Would that make it easier to accomplish?
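
If the line above is read as Turtle, the comma-separated object list is shorthand for three separate triples that share a subject and predicate, not one concatenated literal. A minimal sketch of that expansion into N-Triples follows; `expand_object_list` is a hypothetical helper, and its literal escaping is deliberately simplified.

```python
def expand_object_list(subject: str, predicate: str, literals: list) -> list:
    """Expand a Turtle-style object list into one N-Triples line per literal."""
    lines = []
    for value in literals:
        # Simplified escaping: only backslashes and double quotes are handled.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return lines


triples = expand_object_list(
    "http://localhost:8000/node/3?_format=jsonld",
    "http://purl.org/dc/elements/1.1/creator",
    ["Bob Smith", "Jane Alexander", "Phil Ewing"],
)
print("\n".join(triples))
```

Each value stays individually queryable this way, but nothing records which values belong together, which is the grouping concern raised earlier for repeatable address fields.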

ajstanley commented 5 years ago

One output is as easy as another. Comma separated strings seem like a good idea though and leave it up to the implementation whether or not to include blanks.

This is outside the scope of this discussion, but we could think about supplying some prebuilt callbacks to be invoked by the mapping ymls.

dannylamb commented 5 years ago

Nonetheless, the code straight up checks to make sure something doesn't exist and then immediately proceeds to call the thing we verified isn't there. Originally it was calling just url() (can't remember the precise reasons) and then we switched to this for jsonld altering.

dannylamb commented 5 years ago

Ok, so this is a pretty big picture type deal... but why don't we consider this an opportunity to treat paragraphs as blank nodes? At this point, instead of trying to get the canonical url from the link template, we generate fragments for them instead?

rosiel commented 5 years ago

We've been considering blank nodes, especially for harder and more complex RDF mappings. I know, Fedora doesn't like blank nodes. At this point we are legit considering - for use cases where a mapping involving blank nodes is meaningful - putting a "simpler version" of RDF into Fedora, and a more rich/meaningful one in the Triplestore.

whikloj commented 5 years ago

Fedora doesn't do great with blank nodes (though I haven't tried lately), but it does do hash URIs. Which might be a good solution.

dannylamb commented 5 years ago

@whikloj That's an important distinction I failed to make. I guess hash URIs are what I'm talking about. If the paragraphs have UUIDs, then http://example.org/node/1#{paragraphUUID} seems like it'd work just fine.
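
One possible shape for this, sketched in Python under stated assumptions: each paragraph becomes its own resource in the node's `@graph`, identified by the node URI plus `#` and the paragraph UUID. The helper names, the UUID, and the choice of schema.org address predicates are illustrative only and not taken from the jsonld module.

```python
import json


def paragraph_hash_uri(node_uri: str, paragraph_uuid: str) -> str:
    """Build a hash URI for a paragraph from its parent node's URI."""
    return f"{node_uri}#{paragraph_uuid}"


def node_with_paragraphs(node_uri: str, predicate: str, paragraphs: dict) -> dict:
    """Return a JSON-LD document whose @graph holds the node and its paragraphs.

    `paragraphs` maps a paragraph UUID to a dict of predicate -> literal value.
    """
    graph = [{"@id": node_uri, predicate: []}]
    for uuid, fields in paragraphs.items():
        uri = paragraph_hash_uri(node_uri, uuid)
        # The node points at the paragraph by its hash URI.
        graph[0][predicate].append({"@id": uri})
        resource = {"@id": uri}
        for field_predicate, value in fields.items():
            resource[field_predicate] = [{"@value": value}]
        graph.append(resource)
    return {"@graph": graph}


doc = node_with_paragraphs(
    "http://example.org/node/1",
    "http://schema.org/address",
    {
        # Hypothetical UUID and field values, for illustration only.
        "0b1e4a52-7c1d-4a0e-9d55-2f6f3c1a9e10": {
            "http://schema.org/streetAddress": "1 Main St",
            "http://schema.org/addressLocality": "Springfield",
        },
    },
)
print(json.dumps(doc, indent=2))
```

Because the street and town hang off the same hash URI, a repeatable group of addresses keeps its pairing, which flat literals on the node cannot do.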

whikloj commented 5 years ago

I am unclear whether this is a 1.0.0 or a 1.x type of bug? Thoughts?

seth-shaw-unlv commented 5 years ago

@dannylamb, so how would those hash URIs be represented in the JSON-LD? Is it a JSON-LD blank node with a uuid field, or is it listed in the JSON-LD as a separate entity using the full URI as the @id?

I doubt Drupal will be happy responding to http://example.org/node/1#{paragraphUUID}?_format=jsonld, or perhaps it will just return the JSON-LD for node/1?
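
Under standard URI parsing, everything after the first `#` is the fragment, and HTTP clients do not send the fragment to the server. A quick check with Python's standard library, using a placeholder `paragraph-uuid` in place of a real UUID:

```python
from urllib.parse import urldefrag, urlsplit

url = "http://example.org/node/1#paragraph-uuid?_format=jsonld"

parts = urlsplit(url)
print(parts.path)      # the path the server would see
print(parts.query)     # empty: "?_format=jsonld" was absorbed into the fragment
print(parts.fragment)

# What an HTTP client would actually request: the URL without its fragment.
requested, fragment = urldefrag(url)
print(requested)
```

So a request built from that URL would reach Drupal as plain `node/1`, without the `_format` parameter. Putting the query first, as in `node/1?_format=jsonld#paragraph-uuid`, would request the node's JSON-LD and leave the fragment for the client to resolve within it.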

seth-shaw-unlv commented 5 years ago

@whikloj, this seems to me like a power-user case appropriate for 1.x.

whikloj commented 5 years ago

@seth-shaw-unlv because paragraphs don't have a canonical URI you can't point at them with a client, so the hash uri would really only be a one way transfer of information (ie Drupal -> Fedora).

rosiel commented 5 years ago

@whikloj I thought a one-way transfer of information was what we should expect. I've been operating under that assumption, as the metadata in Fedora seems like it's going to have to be less complex than we want to model in Drupal (simple S-P-O triples where "the object" is the S). (If hashed URIs are an option, then that may change, but I also would like to know what that would look like or what the uses/limitations are)

whikloj commented 5 years ago

@rosiel I am assuming a one-way transfer for the time being, but I would like (as a 2.x or 3.x goal, maybe) to be able to rebuild my Drupal from my Fedora.

As for metadata, it is my interpretation that all metadata modelling from Drupal will be mirrored in Fedora. Otherwise the question is what we are losing, and why we are bothering to store the metadata in Fedora if it isn't all there.

rosiel commented 3 years ago

Noting that this ticket is still very much open. Alan described a specific thing that happens when we try to export data stored in Paragraphs fields into RDF. We cannot handle it, yet. The above conversation suggests having Paragraphs export "RDF fragments" that can be included in the (node or other entity's) RDF as a hash URI/"blank node" (not an actual blank node).

kstapelfeldt commented 3 years ago

Hey Rosie - that's great to know. I just scrolled to the bottom and found a pull request, and made an assumption. I'll remove it from my google issues org so that it doesn't come up as part of what I thought might be the 'easy' cleanup.