Islandora / documentation

Contains islandora's documentation and main issue queue.

Pivot to using Twig to get JSONLD instead of relying on the RDF module #1633

Open dannylamb opened 4 years ago

dannylamb commented 4 years ago

We do a lot of work to get around the limitations of the RDF module. On top of that, no one seems to like editing the YAML by hand.

Would using Twig templates to render out JSON-LD be preferable here? The templates would have to live in a module, but they would allow us some flexibility, like being able to bury metadata that's about the thing (and not the web content) under a # fragment. We could also preprocess the template to jam in the members and media views and would have that info available as well.

I'm curious how folks feel about that approach more than anything else.

mjordan commented 4 years ago

@dannylamb this is a great idea.

seth-shaw-unlv commented 4 years ago

I'm not sure...

Yes, the RDF module has serious limitations that need to be addressed, but I think the general approach of 1) defining a mapping in config, 2) building the JSON-LD array structure based on the mapping, and 3) stringifying it is a sound strategy.
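To make the three steps concrete, here is a minimal sketch in Python rather than Drupal PHP. It is only an illustration of the pattern, not Islandora's implementation: the `MAPPING` dict, the field names, and the `build_jsonld` helper are hypothetical stand-ins for the RDF mapping config and the serializer.

```python
import json

# Step 1: a mapping kept as configuration (hypothetical field names).
MAPPING = {
    "title": ["dcterms:title"],
    "field_description": ["dcterms:description"],
}

CONTEXT = {"dcterms": "http://purl.org/dc/terms/"}


def build_jsonld(uri, fields, mapping=MAPPING):
    """Step 2: build the JSON-LD structure from field values and the mapping."""
    doc = {"@context": CONTEXT, "@id": uri}
    for field_name, predicates in mapping.items():
        values = fields.get(field_name)
        if not values:
            continue  # fields with no values produce no triples
        for predicate in predicates:
            doc[predicate] = [{"@value": v} for v in values]
    return doc


node_fields = {"title": ["A photograph"], "field_description": []}
doc = build_jsonld("http://example.org/node/1", node_fields)

# Step 3: stringify.
print(json.dumps(doc, indent=2))
```

The point of the pattern is that only step 1 changes from site to site; steps 2 and 3 stay generic.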

The main limitations we have are 1) a limited mapping capability (forcing us to rely on alter hooks) and 2) no UI for updating the config (forcing us to use Features or the core config editing capabilities).

I don't know that using Twig templates gains us anything on those two fronts. First, while we have a lot of Twig templates to build on for creating HTML pages, I'm having a hard time imagining the Twig templates necessary to generate JSON-LD. I'm sure you could do it... but I'd much rather take an array and call json_encode. Second, mapping logic will still have to happen in code, just in template preprocess functions. This means we lose the mapping configuration we already have, and we will probably still end up using a bunch of template preprocess calls for customizations that aren't much different from the alter hooks we already have. This strikes me as more difficult to customize for most users. (I suppose you could put the mapping logic into the template, but that is going to get ugly, and will that be any easier for site admins than updating a config file?)

seth-shaw-unlv commented 4 years ago

I should also note that our recent work on converters shows that we may have an effective way of overcoming the RDF mapping shortcomings without alters, one that puts more of the power in the hands of config editors.

DiegoPino commented 4 years ago

This is VERY important (and surprising) for us (Metro, Archipelago community @giancarlobi @alliomeria) and intersects with our hard work on talking to and teaching the community about seeing metadata schemas as a flexible thing. We wonder if this means you are planning to go the Archipelago way (without stating it as such)? Twig templates as the only source of metadata casting and exposure have been core to our architecture/system since 2018, and we have devoted a lot of engineering to making that happen, including endpoints, API exposure, extensions, caching, custom entities, etc. @seth-shaw-unlv has probably not seen this, but yes, JSON-LD, IIIF, Schema.org, GeoJSON, MODS, DC, etc. are all generated that way in our code. And this goes much further back, from IMI... and even from the time when Islandora 8 was Silex microservices.

@mjordan @dannylamb if you plan to make such a big architectural change in Islandora 9, and it overlaps that much with our approach, we would love for that to at least be publicly acknowledged as such. Is that the case?

alxp commented 4 years ago

An alternative method I think we can consider is to make use of the Metatag config interface, with its plugin system for defining Groups and Tags.

An example, from my Schema.org Dataset module on Drupal:

You can define a field with just a PHPDoc annotation, and override the inherited class methods only if you need to do something special with the output:


/**
 * Provides a plugin for the 'schema_dataset_contributor' meta tag.
 *
 * - 'id' should be a globally unique id.
 * - 'name' should match the Schema.org element name.
 * - 'group' should match the id of the group that defines the Schema.org type.
 *
 * @MetatagTag(
 *   id = "schema_dataset_contributor",
 *   label = @Translation("contributor"),
 *   description = @Translation("Contributor to the dataset"),
 *   name = "contributor",
 *   group = "schema_dataset",
 *   weight = 1,
 *   type = "string",
 *   secure = FALSE,
 *   multiple = TRUE
 * )
 */
class SchemaDatasetContributor extends SchemaPersonOrgBase {

}

While Schema.org Metatag intervenes in the header generation to output the JSON-LD instead of head tags, we could intervene once more to take some subset of the JSON-LD and put it elsewhere.

In our RDM site we then export a config to specify which field to get the contributor name from. In our case it reaches into a paragraph:

[node:field_rdm_contributors:entity:field_rdm_person:entity:field_rdm_personal_name]
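A rough way to read that token: each `field_*` segment selects a field, and each `entity` segment follows the reference stored in it. The Python sketch below illustrates that walk under simplifying assumptions; it is not how Drupal's Token module is implemented. The `ENTITIES` store, the `ref` key, the sample name, and the `resolve_token` helper are all hypothetical, and only single-valued fields are handled.

```python
# Hypothetical in-memory stand-ins for a paragraph and the entity it references.
ENTITIES = {
    "paragraph:7": {"field_rdm_person": {"ref": "person:3"}},
    "person:3": {"field_rdm_personal_name": "Example Person"},
}

node = {"field_rdm_contributors": {"ref": "paragraph:7"}}


def resolve_token(token, roots, entities):
    """Walk a chained token such as [node:field_a:entity:field_b]."""
    parts = token.strip("[]").split(":")
    current = roots[parts[0]]  # the first segment names the starting object
    for part in parts[1:]:
        if part == "entity":
            # Follow the reference held by the field we just selected.
            current = entities[current["ref"]]
        else:
            current = current[part]
    return current


name = resolve_token(
    "[node:field_rdm_contributors:entity:field_rdm_person:entity:field_rdm_personal_name]",
    {"node": node},
    ENTITIES,
)
print(name)
```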

The Metatag config UI can also have custom form elements

[Screenshot: Metatag config UI, captured 2020-09-30]

mjordan commented 4 years ago

@DiegoPino AFAIK this question is about serializing RDF only; we're not planning to move away from Drupal fields as the canonical home for data. We had a good discussion at today's tech call, but we opened up more questions than we answered. Thanks for reaching out. We of course look forward to opportunities to collaborate with the Archipelago community. We're not sure yet how/if this is one of them, since at this point we're just questioning our current way of serializing field data into RDF.

dannylamb commented 4 years ago

@alxp Is it possible to build a form separate from the one that the schema.org module provides using your technique?

alxp commented 4 years ago

@dannylamb The Metatag config form is generated by gathering up all plugins that declare they are a @MetatagGroup(...) in their plugin class annotation, and those classes extend the MetatagGroup class. So if we wanted to have Islandora-specific Group(s), those could certainly live on their own page and just get groups of a certain type, or maybe we declare a separate group plugin type.

dannylamb commented 4 years ago

I mean, if we can provide a way for someone to enable a module and now they have a form they can fill out to map fields to RDF with tokens, that certainly sounds like a nicer experience, albeit a bit more restrictive than what we have now.

Being able to let people have whatever fields they want but map them to a consistent dcterms representation would be pretty awesome.

elizoller commented 4 years ago

Personally I don't think tokens would be flexible enough at all. There are often cases where we need to enforce additional logic to achieve the desired RDF mapping from the Drupal fields.

dannylamb commented 4 years ago

I don't think @alxp's approach rules out other logic. You can alter the form generation / submit process in all the plugins. And you could brick 'em together into more complicated structures. At least I think.

I know you're doing paragraphs and custom callbacks @elizoller, do you have an example of that? I'm curious.

I'm open to pretty much anything that would let us do something like nest the descriptive metadata about the thing under a # fragment and keep modified/created date predicates (and other stuff about the web content itself) on the main URI. Or that would let you build up JSON-LD that keeps a whole named graph about the node and the media and the files. That type of thing.
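As a rough picture of that first idea, the Python sketch below builds a JSON-LD `@graph` with two nodes: one for the web page at the main URI and one for the described thing at a `#` fragment. It is only a sketch under assumptions: the fragment name `this`, the `schema:about` link between the two nodes, and the `split_graph` helper are illustrative choices, not anything Islandora has settled on.

```python
import json

CONTEXT = {
    "dcterms": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}


def split_graph(page_uri, page_metadata, thing_metadata, fragment="this"):
    """Keep page-level predicates on the URI and descriptive ones on URI#fragment."""
    thing_uri = f"{page_uri}#{fragment}"
    # Assumption: schema:about is used to link the page to the thing it describes.
    page_node = {"@id": page_uri, "schema:about": {"@id": thing_uri}}
    page_node.update(page_metadata)
    thing_node = {"@id": thing_uri}
    thing_node.update(thing_metadata)
    return {"@context": CONTEXT, "@graph": [page_node, thing_node]}


doc = split_graph(
    "http://example.org/node/1",
    {"dcterms:created": "2020-09-30T15:59:40Z", "dcterms:modified": "2020-10-05T11:28:00Z"},
    {"dcterms:title": "A photograph"},
)
print(json.dumps(doc, indent=2))
```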

alxp commented 4 years ago

RDF YAML files put a lot of the logic of data transformation, like dates, into YAML entries. This approach is similar to the Migrate approach, where you are basically doing programming, but YAML is your DSL.

On the other hand, the Metatag / Schema.org Metatag approach puts more of this logic into plugins, which can be selected e.g. from drop-downs in the contexts where they are needed.

Having gone through the process of both RDF YAML file creation and creating a Migrate script, I don't think we will have much luck helping non-programmers get into customizing things this way. But offering an in-context plugin that does a date conversion from a drop-down and then having the user enter a token path is a little closer to things you see elsewhere when working as a Drupal site builder.

elizoller commented 4 years ago

Here's what I've come up with. I'm sure it's not perfect, but it's what we've got for now. It has three parts: the RDF mapping, callbacks (referenced in the mapping), and the json_ld_alter hook.

  1. RDF mapping: https://github.com/asulibraries/islandora-repo/blob/develop/config/sync/rdf.mapping.node.asu_repository_item.yml

    • There are quite a few special custom callbacks referenced there
  2. Special callbacks: https://github.com/asulibraries/islandora-repo/tree/develop/web/modules/custom/asu_custom_rdf/src

    • This is where the custom callback classes live.
    • Example 1:
      uid:
        properties:
          - 'relators:dtc'
        mapping_type: property
        datatype_callback:
          callable: 'Drupal\asu_custom_rdf\UidLookup::username'

      That one takes the user ID and turns it into the username instead, because our business case was to use the asurite (an ASU username for single sign-on) instead of the Drupal ID number of the user.

    • Example 2:
      field_note_para:
        properties:
          - 'mods:note'
          - 'dcterms:description'
        datatype_callback:
          callable: 'Drupal\asu_custom_rdf\ParagraphMapping::singlefield'
          arguments:
            - field_note_text

      This might seem a little weird, but basically what we have here is a paragraph field (called field_note_para) on the ASU repository item. It has two fields in it: one for the type and one for the text of the note (field_note_text). In this case, for the RDF mapping, we decided not to represent the notes separately by type. So this one takes the name of the field you want to represent as its argument, and the singlefield method just gets the value from that field (and essentially ignores the rest of the paragraph).

    • Example 3:
      field_open_access:
        properties:
          - 'dcterms:accessRights'
        datatype_callback:
          callable: 'Drupal\asu_custom_rdf\ParseBoolean::tostring'
          arguments:
            1: 'open access'

      Here we have a boolean field field_open_access and if the value is 1, we want to set the value of the dcterms:accessRights to the string 'open access'.

    • Example 4:
      status:
        properties:
          - 'asu:visibility'
        datatype_callback:
          callable: 'Drupal\asu_custom_rdf\ParseBoolean::tostring'
          arguments:
            - unpublished
            - published

      Similar to the boolean field_open_access above, but here we're mapping both the 0 and 1 values to strings.

    • Example 5:
      field_title:
        properties:
          - 'dcterms:title'
          - 'mods:title'
        datatype_callback:
          callable: 'Drupal\asu_custom_rdf\ParagraphMapping::titlepartmerge'
          arguments:
            nonsort: field_nonsort
            main: field_main_title
            subtitle: field_subtitle

      This one is kind of fun. We have a paragraph title field (field_title) that contains several subfields including field_nonsort, field_main_title, and field_subtitle. The custom callback gets each of the field values and merges them into a single string.

    • Example 6:
      field_typed_identifier:
        properties:
          - 'dcterms:identifier'
          - 'mods:identifier'
        datatype_callback:
          callable: 'Drupal\asu_custom_rdf\ParagraphMapping::typedmap'
          arguments:
            type_field: field_identifier_type
            type_taxonomy_field: field_identifier_predicate
            value_field: field_identifier_value
            predicate: identifiers

      Another fun one. We have a paragraph field (field_typed_identifier), similar to the typed note field I mentioned above. The paragraph contains a type field (field_identifier_type) and a value field (field_identifier_value). The type field is a taxonomy reference. This one has a simple value mapping from the value_field provided, but is significantly changed in the json_ld_alter hook here: https://github.com/asulibraries/islandora-repo/blob/develop/web/modules/custom/asu_custom_rdf/asu_custom_rdf.module#L92 The hook gets the taxonomy term from the type_field and then the predicate itself from the taxonomy term (field_identifier_predicate) to get the identifier type mapping. We're using id.loc.gov identifier types for this, with the goal of producing RDF like: uri identifiers:doi doigoeshere. We went this way in an attempt to follow what modsrdf recommends here: https://www.loc.gov/standards/mods/modsrdf/v1/#identifier

  3. json ld alter hook: https://github.com/asulibraries/islandora-repo/blob/develop/web/modules/custom/asu_custom_rdf/asu_custom_rdf.module

    • The primary purpose of that hook is to clean up the RDF. For example, if you change an ID field to a string, you have to remove the '@id'. Or if you change a boolean to a string, you have to add the @language information.
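
For readers who don't want to dig through the linked PHP, the callbacks in Examples 3 to 5 and the clean-up in the alter hook can be approximated in a few lines of Python. This is a sketch of the ideas only, not a port of the asu_custom_rdf code: the function names, the way title parts are joined (space after the nonsort part, colon before the subtitle), and the default language `en` are all assumptions.

```python
def bool_to_string(value, arguments):
    """Map a 0/1 field value to a label, as in Examples 3 and 4.

    `arguments` is either a dict keyed by 0/1 ({1: 'open access'}) or a
    list indexed by 0/1 (['unpublished', 'published']).
    """
    key = int(value)  # accepts True/False, 0/1, and "0"/"1"
    if isinstance(arguments, dict):
        return arguments.get(key)  # None when no label is configured
    return arguments[key] if key < len(arguments) else None


def merge_title_parts(paragraph, arguments):
    """Merge nonsort, main and subtitle subfields into one string (Example 5)."""
    nonsort = paragraph.get(arguments["nonsort"], "")
    main = paragraph.get(arguments["main"], "")
    subtitle = paragraph.get(arguments["subtitle"], "")
    title = " ".join(part for part in (nonsort, main) if part)
    # Assumption: the subtitle is appended after a colon.
    return f"{title}: {subtitle}" if subtitle else title


def as_plain_literal(value_object, text, language="en"):
    """Alter-style clean-up: turn a reference or typed value into a language-tagged string."""
    # A JSON-LD value object cannot carry both @type and @language, so drop @type too.
    cleaned = {k: v for k, v in value_object.items() if k not in ("@id", "@type")}
    cleaned["@value"] = text
    cleaned["@language"] = language
    return cleaned


title_args = {"nonsort": "field_nonsort", "main": "field_main_title", "subtitle": "field_subtitle"}
title = merge_title_parts(
    {"field_nonsort": "The", "field_main_title": "Desert", "field_subtitle": "a history"},
    title_args,
)
print(bool_to_string(1, {1: "open access"}))
print(bool_to_string(0, ["unpublished", "published"]))
print(title)
print(as_plain_literal({"@id": "http://example.org/user/5"}, "asurite123"))
```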

seth-shaw-unlv commented 4 years ago

> RDF YAML files put a lot of the logic of data transformation, like dates, into YAML entries. This approach is similar to the Migrate approach, where you are basically doing programming, but YAML is your DSL.

Not as much as you might think. The Migrate API does allow you to put a lot of logic in the YAML via temp fields and chained processing plugins, but the RDF mapping + conversion classes don't. You simply get the option to pass a field value to a single converter (plus give it some static options if supported by the converter). That is it. All the logic needs to live in those converters, just like the Schema/Metatag logic lives in the plugins. (Addendum: plus the logic we've been tossing into the JSON-LD alters.) As for the user experience, I agree that forcing users to edit YAML isn't ideal, but we could build a form interface for editing these YAML files with a drop-down of available converters.

That isn't to say I'm planting a flag in this strategy; I'm interested to see where this goes and open to options.