Islandora / documentation

Contains islandora's documentation and main issue queue.

Manage local authorities within Drupal #815

Open bryjbrown opened 6 years ago

bryjbrown commented 6 years ago
Title (Goal): Manage local authorities within Drupal
Primary Actor: Repository admin
Scope: Metadata
Level: High
Story: As a repository admin, I want to be able to manage local authority records (asset-less purely informational digital objects) in Drupal. For example, I want to be able to create and update information about faculty members & students who may be authors of works in the repository.

bryjbrown commented 6 years ago

Further info: as discussed on the 3/7 CLAW call, this would basically entail creating Drupal content types for the different types of authorities we would like to manage, and deciding on the fields and mappings for each one. Once they are in Drupal, they can be indexed by Solr and autocompleted from forms.

This is essentially a CLAW version of the 7.x Entities SP, but we need to think of a better name for this one, since entities are already a thing in Drupal. "Authorities" might work, but that name might sound too specialized, even though it's the most accurate, since they are records that don't represent an "item".

Examples of authorities that could be included for management in CLAW include (but are not restricted to):

People

Organizations

Events

whikloj commented 6 years ago

Since you are generating these entities to have a URI for your linked data, do these items need to be synced to Fedora too?

bryjbrown commented 6 years ago

@whikloj Yes, they should have a URI and a mapping so that they can be turned into RDF and pushed into Fedora & the triplestore.
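
To make the "URI plus mapping" idea concrete, here is a minimal sketch, not code from CLAW. It assumes a hypothetical person authority with invented field names (`display_name`, `family_name`, `given_name`) mapped to FOAF predicates, and serializes the record as N-Triples lines that could be pushed on to Fedora or a triplestore. The example URI is a placeholder.

```python
# Hypothetical field-to-predicate mapping for a "person" authority.
# The field names are invented for illustration; the predicates are RDF/FOAF terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

PERSON_MAPPING = {
    "display_name": FOAF + "name",
    "family_name": FOAF + "familyName",
    "given_name": FOAF + "givenName",
}


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def person_to_ntriples(uri, fields):
    """Serialize one authority record as a list of N-Triples lines."""
    lines = [f"<{uri}> <{RDF_TYPE}> <{FOAF}Person> ."]
    for field, predicate in PERSON_MAPPING.items():
        value = fields.get(field)
        if value:  # skip fields that are missing or empty
            lines.append(f'<{uri}> <{predicate}> "{escape_literal(value)}" .')
    return lines


triples = person_to_ntriples(
    "http://example.org/person/1",  # placeholder URI
    {"display_name": "Ada Lovelace", "family_name": "Lovelace"},
)
print("\n".join(triples))
```

Keeping the mapping in a plain data structure is one possible design; it means the fields and predicates for each authority type can be decided separately from the serialization code.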

mjordan commented 6 years ago

@bryjbrown when you say asset-less, do you mean that they could only have RDF properties, or would they need any "notes" or anything beyond what you could express as RDF statements? What I'm thinking of here is the kinds of information that the Entities SP provides in its default metadata form.

mjordan commented 6 years ago

Not to increase the scope of this use case, but given Drupal 8's RDF- and API-first design and capabilities, would it make sense to decouple the management of authorities in D8 completely from Fedora? In other words, maintaining authorities would not have Islandora as a dependency. Any custom bundles/other Drupal plumbing could coexist alongside Islandora (i.e., in the same Drupal instance), but could also exist in an entirely separate Drupal and still be used by Islandora.

jasloe commented 6 years ago

Maybe consider the ArchivesSpace data model, since it takes into account people and corporate entities. I imagine this would have to include multi-value fields, since name forms can be ambiguous and a record can carry authorized name forms as well as display names.

Does this discussion take into account subjects, too? Or is that in another thread?

A couple of us pulled together a generic authorities client (suitable for name forms and subjects) a couple of years ago that queried LC, VIAF, other sources, then parsed and piped data into custom Drupal entities. Don't mind the cowboy code:

https://github.com/dramonline/authorities

Obviously needs a lot of work but could be made to co-exist and produce various metadata for Islandora. Not much skin on the bone there but it's a start.

dannylamb commented 6 years ago

@mjordan The only thing that would tie any type of content modeling to the core islandora module is if you want the RDF sent to Fedora and the triplestore. And it's not really that hard of a dependency either since it's all config which could be exported separately.

dannylamb commented 6 years ago

@jasloe Cool. That's a nice start. Thanks for bringing it to our attention.

I see you've already stubbed out autocomplete, which is going to be fun :man_shrugging:. BTW the Steely Dan reference just made my morning.

jasloe commented 6 years ago

⬛️🐮

bryjbrown commented 6 years ago

@mjordan @dannylamb I suppose that there's nothing saying that the data HAS to live in one place or the other, but I don't see why any of the data describing the authorities WOULDN'T be expressible in RDF. The data should definitely be available via triplestore so we can query it via SPARQL, and even though that doesn't necessarily mean that it would have to have a Drupal-side interface, I think having the ability to edit the RDF data through a GUI that's on the same system as our other assets would be a boon to repository administrators.

And to answer @jasloe's question about subjects: if my understanding is correct, subjects are usually controlled vocabularies, and there isn't an existing use case for managing these AFAIK, but perhaps there should be. Managing controlled vocabs in Drupal via taxonomies is something we talked about at the last CLAW call and something I've been looking into myself to see if it's viable. I'll share what I've learned at the next CLAW call.

ajs6f commented 6 years ago

Please keep in mind that there are lots of ways to store RDF short of a full-on triplestore that can make it useful for services like autocomplete. I used Apache Stanbol's Solr-backed EntityHub component years ago for just that purpose. It offered blazing fast term search and LDPath queries and that's all I needed to provide LCSH autocomplete in some forms.

I don't know that I would push Stanbol at anyone right now (it's a moribund project, development-wise) but I'm just trying to make the point that RDF can go into lots of different stores, and if autocomplete/suggestion/etc. is the functional need, a triplestore might be a little too much pork for the fork. A simpler index might be more performant for less work and management, especially given the kinds of "search for terms beginning with this prefix" kinds of searches that autocomplete implies.
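
As a rough illustration of the "simpler index" point, here is a minimal sketch of prefix search over a sorted list of labels, using only the Python standard library. It is not Stanbol, Solr, or anything from CLAW; the class name `PrefixIndex` and the sample labels are made up for the example.

```python
from bisect import bisect_left


class PrefixIndex:
    """Case-insensitive 'terms beginning with this prefix' lookup."""

    def __init__(self, labels):
        # Sort by the case-folded form so a binary search can find the prefix range.
        self._entries = sorted((label.casefold(), label) for label in labels)
        self._keys = [folded for folded, _ in self._entries]

    def suggest(self, prefix, limit=10):
        key = prefix.casefold()
        start = bisect_left(self._keys, key)  # first entry >= prefix
        results = []
        for folded, original in self._entries[start:]:
            if not folded.startswith(key) or len(results) >= limit:
                break
            results.append(original)
        return results


index = PrefixIndex(["Cats", "Cartography", "Dogs", "Carpentry"])
print(index.suggest("car"))
```

Each lookup is a binary search followed by a short scan, which is the access pattern autocomplete needs; nothing here requires SPARQL or a graph store.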

bryjbrown commented 6 years ago

@acoburn I think the current plan is to have Drupal nodes (entities?) in Drupal, but push info to the triplestore for consumption by other applications. Having a copy of that info in Drupal should expose it to Solr allowing for autocompletes.

ajs6f commented 6 years ago

@bryjbrown I get that-- I'm warning that without actual use cases for those "other applications" you're going to put a lot of data into the triplestore speculatively. LCSH is probably 9M triples alone.

I'm not saying you can't do that (and certainly SPARQL is ultimately more flexible than almost any other exposure), I'm saying that making CLAW publish SPARQL endpoints for large vocabularies is a different functionality than making CLAW consume large vocabularies for autocomplete and should be considered as such (given different config, for example).

DiegoPino commented 6 years ago

@bryjbrown my 2 cents. I have an idea in my head, and maybe we could explore that idea with an example we can share back to the community. I like the idea of Drupal Taxonomies. (my idea is to use that)

I think we can alter taxonomies a bit to make use of their more "lightweight" nature, compared to other content entities, and allow them to play well with RDF. I already mentioned SKOS as a way of wrapping multiple external authorities under a single local URI, and I feel that could also help in daily maintenance. That means that if you are pointing to a certain authority record (and you have something like 200000 nodes already using that URI) and you need to change a deprecated authority URI or add new non-local ones, you have a single taxonomy term to update: https://www.drupal.org/docs/7/organizing-content-with-taxonomies/using-taxonomy-urls-to-display-sets-of-content

Also, if using Solr integration, you can ask Drupal to index Taxonomies into its own Core (you can do that for every entity type). As for SPARQL versus LD fragments versus REST API: once you have the terms defined and indexed on your Drupal side and can reuse them in your other nodes, the other places where you allow interaction (autocomplete, REST, Solr query) are no longer a data definition concern but more of a UI concern, IMHO. I'm also thinking about small repositories, of course.
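
A small sketch of that maintenance argument, assuming a hypothetical `LocalTerm` structure rather than Drupal's actual taxonomy API. All URIs are placeholders; the only real identifier is the `skos:exactMatch` predicate.

```python
from dataclasses import dataclass, field

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


@dataclass
class LocalTerm:
    """A local term that wraps any number of external authority URIs."""

    uri: str
    label: str
    external_uris: set = field(default_factory=set)

    def replace_external(self, old_uri, new_uri):
        """Swap a deprecated external URI for its replacement."""
        self.external_uris.discard(old_uri)
        self.external_uris.add(new_uri)

    def triples(self):
        """Yield (subject, predicate, object) links from local to external URIs."""
        for external in sorted(self.external_uris):
            yield (self.uri, SKOS_EXACT_MATCH, external)


term = LocalTerm(
    uri="http://example.org/taxonomy/term/42",
    label="Example Person",
    external_uris={"http://authority-a.example/old-id"},
)

# Nodes store only the local URI, so they never need to change.
nodes = [{"title": f"Work {n}", "creator": term.uri} for n in range(3)]

# One update on the term covers every node that points at it.
term.replace_external(
    "http://authority-a.example/old-id",
    "http://authority-a.example/new-id",
)
term.external_uris.add("http://authority-b.example/id")
```

After the update, every node still points at the same local URI, while the term now links to the new and additional external records.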

I found this and really liked it to show others how Getty, LoC, etc are empowering and reusing SKOS https://www.getty.edu/research/tools/vocabularies/7_itwg_kos_lod_udate_zeng.pdf

dannylamb commented 6 years ago

I'm not sure I understand :100: of what @DiegoPino's talking about, but we're definitely under-utilizing taxonomies. They're definitely applicable here and in a lot of other scenarios.

mjordan commented 6 years ago

At LDCX there were a couple of sessions on Questioning Authority, which will aggregate authorities from a variety of sources and cache them locally. Even though it's used by Samvera, Samvera is not a dependency, so it could be used by Islandora to fetch authority data. I think managing local authorities is in the works, but for now, QA only deals with authorities harvested from LoC, etc. Might be worth learning more about.

whikloj commented 6 years ago

That Questioning Authority looks like a perfect fit to allow us to not pre-load a bunch of authorities but get them on an as-needed basis.

dannylamb commented 6 years ago

intrigued

ajs6f commented 6 years ago

I may be missing something here, but isn't that a Ruby/Rails app?

mjordan commented 6 years ago

@ajs6f yes but with an HTTP interface. See https://github.com/samvera/questioning_authority#examples.

acoburn commented 6 years ago

QA doesn't mean that you'd bring in all of Samvera, but you will need some sort of Ruby-based runtime. That means either a full RoR application or at the very least Sinatra just to run this.

Edit: you need a full Rails application to run this.

ajs6f commented 6 years ago

@mjordan Yes, thanks, but that's not my concern at all. My concern is deployment footprint. I speak (of course) only for SI, but for us, Ruby is simply not on the table in any way shape or form. It's a hard blocker and that's unlikely to change within five years.

@acoburn Exactly, and that's a blocker for us.

I don't mean to discourage all interest in this tool-- I'm just pointing out that selecting a deployment tech that is outside the current CLAW footprint is necessarily going to drop some current potential adopters.

I'd much rather see CLAW written to some autocomplete/vocab management API with more than one implementation. Is there any alternative impl to QA with the same API? Is there a defined API for QA other than "what the current version of the impl does"?

For example, in the world of SKOS I think of SKOS Shuttle.

acoburn commented 6 years ago

If QA were an entirely self-contained application (e.g. something that can be dockerized and run as an independent microservice) that would be more interesting, but as it is, you'd need to build and maintain a full Rails application for this.

mjordan commented 6 years ago

OK, good points.

ajs6f commented 6 years ago

@mjordan Maybe there's a fulcrum over which CLAW and Samvera can collaborate if an appropriate API can be adopted or defined.

acoburn commented 6 years ago

> if an appropriate API can be adopted or defined

LDP?

jasloe commented 6 years ago

Bumping https://github.com/Islandora-CLAW/CLAW/issues/815#issuecomment-371988442. Currently works with LC, VIAF to mappable entity fields. Could add other endpoints.

DiegoPino commented 6 years ago

@ajs6f @MarcusBarnes @mjordan @acoburn I spoke with some Samvera implementers and, still, they ended up using self-managed SKOS concepts (we spoke about this some time ago). I would encourage here a mix of Apache Stanbol and Drupal Taxonomies + Solr 7 autocomplete handler. Ruby (and probably other dependencies) is not terrible to maintain, but it is outside many users' realities for sure.

acoburn commented 6 years ago

Speaking from experience (as someone who built a Hydra-based RoR app several years ago -- one that uses QA, incidentally), I needed to update one line of that code last month. Testing and CI all passed, but it all crashed horribly in deployment because the QA dependency now required a newer Ruby runtime than I had available (oops!), and my limited experience with Gemfiles led to some pretty horrible on-the-fly debugging. While this ended up being resolved, the very real issue I have encountered with QA in particular is the rapid speed with which it develops and, as a non-Ruby-focused shop, it is very difficult to keep up with these things while also trying to understand the magic that goes along with RoR apps. And so long as QA is a 'gem', you will need to manage the whole Rails environment around it.

That said, the QA code itself is very simple. It's really not much more than a caching service. Reimplementing it in PHP would not be hard. Nor would just using an existing, stand-alone solution like Apache Stanbol.
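
To give a feel for how small a "caching service" can be, here is a minimal sketch in Python. It is not QA's code or API: `CachingLookup` and its parameters are hypothetical, and the remote call is represented by whatever `fetch` callable is passed in.

```python
import time


class CachingLookup:
    """Cache results from a remote authority lookup for a fixed time."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable: query string -> list of results
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._cache = {}         # normalized query -> (expires_at, results)

    def search(self, query):
        cleaned = query.strip()
        key = cleaned.casefold()
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh, skip the remote call
        results = self._fetch(cleaned)
        self._cache[key] = (now + self._ttl, results)
        return results


# Stand-in for a remote source, used only to show the caching behavior.
calls = []


def fake_fetch(query):
    calls.append(query)
    return [query.upper()]


fake_now = [0.0]
lookup = CachingLookup(fake_fetch, ttl_seconds=10, clock=lambda: fake_now[0])
first = lookup.search("Twain")
second = lookup.search("twain ")  # served from cache
fake_now[0] = 11.0                # move past the TTL
third = lookup.search("twain")    # fetched again
```

A real service would add an HTTP layer, per-source adapters, and error handling around this core, which is where most of the actual work would be.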

ajs6f commented 6 years ago

@jasloe How about all of the Proteomics Standard Initiative ontologies? I'm not trying to be silly, those are real needs for us and I want to point out that any solution that isn't generic is going to have a lot of work loaded in the future.

ajs6f commented 6 years ago

Stanbol (really I think Stanbol EntityHub is what we mean?) is pretty dang strong stuff, but it's not very well maintained at this point. I'm not condemning it-- I'm more suggesting that if we think it might slot in nicely as a component in CLAW, we will want to think about CLAW's long-term relationship with that project.

EntityHub essentially provides LDPath query over your choice of Solr or triplestore backend. I wonder if we can take a look at how it would actually get used in CLAW and perhaps abstract over it a bit...?

ajs6f commented 6 years ago

Another meta-point-- we've got at least two issues in hand in this part of the discussion:

  1. the desirability of sharing implementation work with other communities (Samvera, Apache Stanbol) if and when possible, and
  2. the desirability of interoperating without sharing impls.

So for example, if we used Stanbol EntityHub directly, that would be the former, and if we reimpl'd QA in PHP and gave it the same HTTP API so that loader tools would work with either the Ruby or PHP version, that would be the latter.

I think both are great values to uphold, but they are different and we should be sure about which one we are trying to do and how.
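
One way to picture "interoperating without sharing impls" is a shared contract with more than one backend behind it. The sketch below is an illustration only: `AuthoritySource`, `InMemorySource`, and `MergedSource` are hypothetical names, and nothing here reflects an API that QA, Stanbol, or CLAW actually defines.

```python
from typing import List, Protocol


class AuthoritySource(Protocol):
    """Hypothetical contract: any backend that can suggest terms for a query."""

    def suggest(self, query: str, limit: int = 10) -> List[str]:
        ...


class InMemorySource:
    """One implementation: a fixed local list of labels."""

    def __init__(self, labels):
        self._labels = sorted(labels)

    def suggest(self, query, limit=10):
        q = query.casefold()
        return [l for l in self._labels if l.casefold().startswith(q)][:limit]


class MergedSource:
    """Another implementation: combine several sources behind the same contract."""

    def __init__(self, *sources):
        self._sources = sources

    def suggest(self, query, limit=10):
        seen = []
        for source in self._sources:
            for label in source.suggest(query, limit):
                if label not in seen:  # drop duplicates, keep first-seen order
                    seen.append(label)
        return seen[:limit]


def autocomplete(source: AuthoritySource, typed: str) -> List[str]:
    """Form code depends only on the contract, not on a particular backend."""
    return source.suggest(typed, limit=5)


local = InMemorySource(["Twain, Mark", "Tolstoy, Leo"])
other = InMemorySource(["Twain, Shania", "Twain, Mark"])
suggestions = autocomplete(MergedSource(local, other), "tw")
```

In this picture, swapping a Ruby service for a PHP one (or a Stanbol-backed one) would only mean adding another class that satisfies the same contract.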

seth-shaw-unlv commented 6 years ago

So, given the new content-model-overhaul separation between core functionality and demo content models, with its emphasis on local repositories defining their own models, does it make sense to keep this as an Islandora use case, or should there be a separate project/module to address this?

As a note, I have another module in early development that addresses this use case. It was my intention to have it support both the ArchivesSpace/Drupal 8 Integration project and my CLAW-based digital objects.