bryjbrown opened this issue 6 years ago
| Title (Goal) | Manage local authorities within Drupal |
| --- | --- |
| Primary Actor | Repository admin |
| Scope | Metadata |
| Level | High |
| Story | As a repository admin, I want to be able to manage local authority records (asset-less, purely informational digital objects) in Drupal. For example, I want to be able to create and update information about faculty members & students who may be authors of works in the repository. |
Further info: as discussed on the 3/7 CLAW call, this would basically entail creating Drupal content types for the different types of authorities we would like to manage, and deciding on the fields and mappings for each one. Once they are in Drupal, they can be indexed by Solr and autocompleted from forms.
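As a rough sketch of what "deciding on the fields and mappings" could look like in Drupal 8 config: the core `rdf` module lets each bundle declare an RDF mapping as exportable YAML. Everything below (the `person` bundle, the fields, the schema.org terms) is hypothetical, just to show the shape:

```yaml
# rdf.mapping.node.person.yml -- hypothetical mapping for a local
# "person" authority bundle; bundle and field names are illustrative
# only, not part of any shipped Islandora config.
langcode: en
status: true
dependencies:
  config:
    - node.type.person
id: node.person
targetEntityType: node
bundle: person
types:
  - 'schema:Person'
fieldMappings:
  title:
    properties:
      - 'schema:name'
  field_orcid:
    properties:
      - 'schema:identifier'
```

Because this is plain config, it can be exported and shared independently of any particular module.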
This is essentially a CLAW version of the 7.x Entities SP, but we need to think of a better name for this one; entities are already a thing in Drupal. "Authorities" might work, but that name might sound too specialized even though it's the most accurate, since these are records that don't represent an "item".
Examples of authorities that could be included for management in CLAW include (but are not restricted to):
Since you are generating these entities to have a URI for your linked data, do these items need to be synced to Fedora too?
@whikloj Yes, they should have a URI and a mapping so that they can be turned into RDF and pushed into Fedora & the triplestore.
@bryjbrown when you say asset-less, do you mean that they could only have RDF properties, or would they need any "notes" or anything beyond what you could express as RDF statements? What I'm thinking of here is the kinds of information that the Entities SP provides in its default metadata form.
Not to increase the scope of this use case, but given Drupal 8's RDF- and API-first design and capabilities, would it make sense to decouple the management of authorities in D8 completely from Fedora? In other words, maintaining authorities would not have Islandora as a dependency. Any custom bundles/other Drupal plumbing could coexist alongside Islandora (i.e., in the same Drupal instance), but could also live in an entirely separate Drupal and still be used by Islandora.
Maybe consider the ArchivesSpace data model, since it takes into account people and corporate entities. I imagine this would have to include multi-value fields, since there can be ambiguity among name forms, authorized name forms, and display names.
Does this discussion take into account subjects, too? Or is that in another thread?
A couple of us pulled together a generic authorities client (suitable for name forms and subjects) a couple of years ago that queried LC, VIAF, other sources, then parsed and piped data into custom Drupal entities. Don't mind the cowboy code:
https://github.com/dramonline/authorities
Obviously needs a lot of work but could be made to co-exist and produce various metadata for Islandora. Not much skin on the bone there but it's a start.
@mjordan The only thing that would tie any type of content modeling to the core islandora module is if you want the RDF sent to Fedora and the triplestore. And it's not really that hard a dependency either, since it's all config, which could be exported separately.
@jasloe Cool. That's a nice start. Thanks for bringing it to our attention.
I see you've already stubbed out autocomplete, which is going to be fun :man_shrugging:. BTW the Steely Dan reference just made my morning.
⬛️🐮
@mjordan @dannylamb I suppose that there's nothing saying that the data HAS to live in one place or the other, but I don't see why any of the data describing the authorities WOULDN'T be expressible in RDF. The data should definitely be available via the triplestore so we can query it via SPARQL, and even though that doesn't necessarily mean it would have to have a Drupal-side interface, I think having the ability to edit the RDF data through a GUI that's on the same system as our other assets would be a boon to repository administrators.
And to answer @jasloe's question about subjects: if my understanding is correct, subjects are usually controlled vocabularies, and there isn't an existing use case for managing these AFAIK, but perhaps there should be. Managing controlled vocabs in Drupal via taxonomies is something we talked about at the last CLAW call and something I've been looking into myself to see if it's viable. I'll share what I've learned at the next CLAW call.
Please keep in mind that there are lots of ways to store RDF short of a full-on triplestore that can make it useful for services like autocomplete. I used Apache Stanbol's Solr-backed EntityHub component years ago for just that purpose. It offered blazing fast term search and LDPath queries and that's all I needed to provide LCSH autocomplete in some forms.
I don't know that I would push Stanbol at anyone right now (it's a moribund project, development-wise) but I'm just trying to make the point that RDF can go into lots of different stores, and if autocomplete/suggestion/etc. is the functional need, a triplestore might be a little too much pork for the fork. A simpler index might be more performant for less work and management, especially given the kinds of "search for terms beginning with this prefix" kinds of searches that autocomplete implies.
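To illustrate the point about "search for terms beginning with this prefix": autocomplete only needs an ordered term lookup, which even a trivial in-memory structure can serve. This Python sketch stands in for what a Solr suggester handler does (the class and the sample terms are made up for illustration):

```python
import bisect

class PrefixIndex:
    """Minimal in-memory prefix index, standing in for a Solr
    suggester/terms handler. Illustrative only."""

    def __init__(self, terms):
        # normalize and sort once so each lookup is a binary search
        self.terms = sorted(t.lower() for t in terms)

    def complete(self, prefix, limit=10):
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:start + limit]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches

index = PrefixIndex(["Painting", "Paleontology", "Philosophy", "Photography"])
print(index.complete("pa"))  # ['painting', 'paleontology']
```

A dedicated index like this (or Solr's own suggester) serves exactly this access pattern with far less operational weight than a SPARQL endpoint.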
@acoburn I think the current plan is to have Drupal nodes (entities?) in Drupal, but push info to the triplestore for consumption by other applications. Having a copy of that info in Drupal should expose it to Solr allowing for autocompletes.
@bryjbrown I get that-- I'm warning that without actual use cases for those "other applications" you're going to put a lot of data into the triplestore speculatively. LCSH is probably 9M triples alone.
I'm not saying you can't do that (and certainly SPARQL is ultimately more flexible than almost any other exposure), I'm saying that making CLAW publish SPARQL endpoints for large vocabularies is a different functionality than making CLAW consume large vocabularies for autocomplete, and it should be considered as such (given different config, for example).
@bryjbrown my 2 cents. I have an idea in my head, and maybe we could explore that idea with an example we can share back to the community. I like the idea of Drupal Taxonomies. (my idea is to use that)
I think we can alter taxonomies a bit to make use of their more "lightweight" nature, compared to other content entities, while allowing them to play well with RDF. I already mentioned SKOS as a way of wrapping multiple external authorities into a single local URI, and I feel that could also help in daily maintenance. It means you are pointing at a certain authority record (and you may have 200,000 nodes already using that URI); if you need to change a deprecated authority URI or add new non-local ones, you have a single taxonomy term to update: https://www.drupal.org/docs/7/organizing-content-with-taxonomies/using-taxonomy-urls-to-display-sets-of-content
Also, if using the Solr integration, you can ask Drupal to index taxonomies into its own core (you can do that for every entity type). And as for SPARQL versus LD fragments versus a REST API: once you have that data defined and indexed on the Drupal side and reusable in your other nodes, where else you allow interaction with it (autocomplete, REST, Solr query) becomes no longer a data-definition concern but more of a UI concern, IMHO. I'm also thinking about small repositories, of course.
I found this and really liked it to show others how Getty, LoC, etc are empowering and reusing SKOS https://www.getty.edu/research/tools/vocabularies/7_itwg_kos_lod_udate_zeng.pdf
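To make the SKOS-wrapping idea above concrete: a single local taxonomy term could be published as a `skos:Concept` that points at several external authorities at once. All URIs below are invented for illustration:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://repo.example.org/authority/doe-jane>
    a skos:Concept ;
    skos:prefLabel "Doe, Jane"@en ;
    skos:altLabel "Jane Doe"@en ;
    # external authorities this local record wraps (hypothetical IDs)
    skos:exactMatch <http://id.loc.gov/authorities/names/nEXAMPLE> ;
    skos:exactMatch <http://viaf.org/viaf/EXAMPLE> .
```

If an external URI is deprecated, only this one local record changes; the nodes that reference it keep pointing at the stable local URI.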
I'm not sure I understand :100: of what @DiegoPino's talking about, but we're definitely under-utilizing taxonomies. They're definitely applicable here and in a lot of other scenarios.
At LDCX there were a couple of sessions on Questioning Authority, which will aggregate authorities from a variety of sources and cache them locally. Even though it's used by Samvera, Samvera is not a dependency, so it could be used by Islandora to fetch authority data. I think managing local authorities is in the works, but for now, QA only deals with authorities harvested from LoC, etc. Might be worth learning more about.
That Questioning Authority looks like a perfect fit to let us avoid pre-loading a bunch of authorities and instead fetch them on an as-needed basis.
intrigued
I may be missing something here, but isn't that a Ruby/Rails app?
@ajs6f yes but with an HTTP interface. See https://github.com/samvera/questioning_authority#examples.
QA doesn't mean that you'd bring in all of Samvera, but you will need some sort of Ruby-based runtime. That means either a full RoR application or at the very least Sinatra just to run this.
Edit: you need a full Rails application to run this.
@mjordan Yes, thanks, but that's not my concern at all. My concern is deployment footprint. I speak (of course) only for SI, but for us, Ruby is simply not on the table in any way shape or form. It's a hard blocker and that's unlikely to change within five years.
@acoburn Exactly, and that's a blocker for us.
I don't mean to discourage all interest in this tool-- I'm just pointing out that selecting a deployment tech that is outside the current CLAW footprint is necessarily going to drop some current potential adopters.
I'd much rather see CLAW written to some autocomplete/vocab management API with more than one implementation. Is there any alternative impl to QA with the same API? Is there a defined API for QA other than "what the current version of the impl does"?
For example, in the world of SKOS I think of SKOS Shuttle.
If QA were an entirely self-contained application (e.g. something that can be dockerized an run as an independent microservice) that would be more interesting, but as it is, you'd need to build and maintain a full rails application for this.
OK, good points.
@mjordan Maybe there's a fulcrum over which CLAW and Samvera can collaborate if an appropriate API can be adopted or defined.
> if an appropriate API can be adopted or defined
LDP?
Bumping https://github.com/Islandora-CLAW/CLAW/issues/815#issuecomment-371988442. It currently works with LC and VIAF, mapping results to entity fields. Other endpoints could be added.
@ajs6f @MarcusBarnes @mjordan @acoburn I spoke with some Samvera implementers and still, they ended up using self-managed SKOS concepts (we spoke about this some time ago). I would encourage here a mix of Apache Stanbol and Drupal taxonomies + the Solr 7 autocomplete handler. Ruby (and probably other dependencies) is not terrible to maintain, but it escapes many users' realities for sure.
Speaking from experience (as someone who built a Hydra-based RoR app several years ago -- one that uses QA, incidentally), I needed to update one line of that code last month. Testing and CI all passed, but it all crashed horribly in deployment because the QA dependency now required a newer Ruby runtime than I had available (oops!), and my limited experience with Gemfiles led to some pretty horrible on-the-fly debugging. While this ended up being resolved, the very real issue I have encountered with QA in particular is the rapid pace at which it develops; as a non-Ruby-focused shop, it is very difficult to keep up with these things while also trying to understand the magic that goes along with RoR apps. And so long as QA is a 'gem', you will need to manage the whole Rails environment around it.
That said, the QA code itself is very simple. It's really not much more than a caching service. Reimplementing it in PHP would not be hard. Nor would just using an existing, stand-alone solution like Apache Stanbol.
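A sketch of that "not much more than a caching service" shape, with the remote lookup injected so any backend (LC, VIAF, ...) could plug in. This illustrates the pattern only -- it is not QA's actual code, and a PHP port would have the same structure:

```python
class AuthorityCache:
    """Thin caching layer in front of remote vocabulary lookups --
    the core pattern behind a QA-style service. Illustrative only."""

    def __init__(self, fetch):
        self.fetch = fetch   # callable: query string -> list of result dicts
        self.cache = {}

    def search(self, query):
        # serve from the local cache; hit the remote authority only once
        if query not in self.cache:
            self.cache[query] = self.fetch(query)
        return self.cache[query]

# stand-in for an HTTP call to a real authority endpoint (fake data)
calls = []
def fake_loc_fetch(q):
    calls.append(q)
    return [{"label": q.title(), "uri": "http://id.loc.gov/fake/" + q}]

qa = AuthorityCache(fake_loc_fetch)
qa.search("history")
qa.search("history")   # second call is served from the cache
print(len(calls))      # 1
```

A real service would add cache expiry and result normalization, but the control flow stays this simple.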
@jasloe How about all of the Proteomics Standard Initiative ontologies? I'm not trying to be silly, those are real needs for us and I want to point out that any solution that isn't generic is going to have a lot of work loaded in the future.
Stanbol (really I think Stanbol EntityHub is what we mean?) is pretty dang strong stuff, but it's not very well maintained at this point. I'm not condemning it-- I'm more suggesting that if we think it might slot in nicely as a component in CLAW, we will want to think about CLAW's long-term relationship with that project.
EntityHub essentially provides LDPath query over your choice of Solr or triplestore backend. I wonder if we can take a look at how it would actually get used in CLAW and perhaps abstract over it a bit...?
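For reference, an LDPath program is just a set of named fields defined over RDF paths, so the surface being abstracted over is small; a sketch (prefix and field names chosen arbitrarily):

```
@prefix skos : <http://www.w3.org/2004/02/skos/core#> ;

label   = skos:prefLabel :: xsd:string ;
broader = skos:broader / skos:prefLabel :: xsd:string ;
```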
Another meta-point-- we've got at least two issues in hand in this part of the discussion: committing to a specific tool as a CLAW component, versus defining an API that more than one implementation (in different stacks) could satisfy.
So for example, if we used Stanbol EntityHub directly, that would be the former, and if we reimpl'd QA in PHP and gave it the same HTTP API so that loader tools would work with either the Ruby or PHP version, that would be the latter.
I think both are great values to uphold, but they are different and we should be sure about which one we are trying to do and how.
So, given the new content-model-overhaul separation between core functionality and demo content models, with its emphasis on local repositories defining their own models, does it make sense to keep this as an Islandora use case, or should there be a separate project/module to address this?
As a note, I have another module in early development that addresses this use case. It was my intention to have it support both the ArchivesSpace/Drupal 8 Integration project and my CLAW-based digital objects.