blunalucero / MODS-RDF

MODS RDF is an RDF ontology for MODS. As MODS is an XML schema for a bibliographic element set, MODS RDF is an expression of that element set in RDF.

Authorities: Should MODS RDF adopt the BIBFRAME approach (abstraction layer)? #5

Open melanieWacker opened 10 years ago

melanieWacker commented 10 years ago

For reference, see "MODS RDF Ontology: Authorities" (Ray Denenberg, Library of Congress, November 6, 2013).

melanieWacker commented 10 years ago

Related paper: On BIBFRAME Authority http://bibframe.org/documentation/bibframe-authority/

melanieWacker commented 10 years ago

Working group call 1/29/14: Ray volunteered to work on this topic with Jeff. Rebecca is also interested.

melanieWacker commented 10 years ago

Notes from 1.29.14 call: (RD) BIBFRAME mandates an abstraction layer; MODS RDF should not be so strict. Comment (RD) 2/10/14: The BIBFRAME abstraction layer decision seems to be in a state of flux at the moment. RD and RG will investigate further and will advise on the BIBFRAME approach with JM. (RD) Topics 13, 5, and 1 are all related to this one.

melanieWacker commented 10 years ago

See notes from 3/24/2014 call for discussion on BIBFRAME approach: https://github.com/blunalucero/MODS-RDF/wiki/MODS-RDF-Working-Group-Call-3.24.14

melanieWacker commented 10 years ago

Following up on our last working group call, Jeff Mixter developed a document illustrating some of the issues involved in choosing between the indirect approach (currently proposed for BIBFRAME) and the direct approach. See https://github.com/blunalucero/MODS-RDF/wiki/Authorities-in-MODS-RDF:-Direct-vs.-Indirect

raydAtLC commented 10 years ago

Forgot to mention a couple points.

· Draft 2 of the Authorities spec will correct the perception that blank nodes are required. It was not the intention, but we didn’t supply any examples of re-usable resources. We will include such an example in draft 2.

· We can relax the restriction against multiple labels.

Ray


infomnivore commented 10 years ago

Just wanted to ensure Ray's contribution to this thread got posted here as well:

I believe that there is a straightforward solution to the problem that Jeff raises.

Consider the following:

<!-- BIBFRAME Authority -->
<bf:Person>
    <bf:authorizedAccessPoint>Huxley, Aldous, 1894-1963</bf:authorizedAccessPoint>
    <bf:hasAuthority rdf:resource="http://viaf.org/viaf/71392434"/>
</bf:Person>

Jeff's complaint is that the AAP is invalid because VIAF doesn't have AAPs, and the AAP in a BIBFRAME Authority is supposed to be extracted from the primary authority, in this case a VIAF.

So suppose instead we have:

<!-- BIBFRAME Authority -->
<bf:Person>
    <bf:label>Huxley, Aldous, 1894-1963</bf:label>
    <bf:hasAuthority rdf:resource="http://viaf.org/viaf/71392434"/>
</bf:Person>

The problem goes away. But a new problem arises. The authority spec says:

"A BIBFRAME Authority … includes:

An AAP or Label. It should include one or the other …. . If a primary authority is supplied, then an AAP should be supplied."

That problem would go away if we simply add “if possible” at the end. Right? We can do that.

Ray

infomnivore commented 10 years ago

While Ray's suggestions above would open the door to having VIAF validate as an authority in MODS-RDF, I think Jeff raised some deeper philosophical and technical issues that bear further discussion.

The three that jump out at me at the moment are: (1) As Jeff notes, generating URIs would be better than relying on strings; similarly, it would be better if, rather than generating blank nodes, we systematically converted them into Skolem IRIs (per http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/) since this would facilitate future work to put the data on the Web or otherwise make it available for use by applications that consume linked data.
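To make point (1) concrete, here is a minimal sketch of Skolemization over triples held as plain string tuples. The base IRI `http://example.org/.well-known/genid/` and the `skolemize` helper are assumptions for illustration (RDF 1.1 Concepts suggests the `/.well-known/genid/` path, but the host is a placeholder), and the sketch assumes no literal begins with `_:`.

```python
import uuid

# Hypothetical base IRI; only the /.well-known/genid/ path comes from RDF 1.1 Concepts.
GENID_BASE = "http://example.org/.well-known/genid/"


def skolemize(triples, base=GENID_BASE):
    """Replace blank-node labels ('_:x') with Skolem IRIs, one IRI per label."""
    mapping = {}

    def convert(term):
        # Terms are plain strings here; anything starting with '_:' is a blank node.
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = base + uuid.uuid4().hex
            return mapping[term]
        return term

    # Predicates are never blank nodes, so only subject and object are converted.
    return [(convert(s), p, convert(o)) for s, p, o in triples]


triples = [
    ("ex:1234", "bf:creator", "_:b0"),
    ("_:b0", "rdf:type", "bf:Person"),
    ("_:b0", "bf:label", "Huxley, Aldous, 1894-1963"),
]
skolemized = skolemize(triples)
```

Every occurrence of `_:b0` maps to the same minted IRI, so the resource can be referenced from outside the original graph.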

(2) per Jeff:

Since the bf:Person is a sub-class of bf:Agent which itself is a sub-class of bf:Authority, a reasoner would infer that “Aldous Huxley” is an Authority. He is NOT, he is a Person. This is particularly problematic when using VIAF URIs because if they are dereferenced in a browser it will indicate that the thing being described is a foaf:Person (i.e. a real-world entity). To this end, VIAF is not an authority reference.

In other words, a reasoner (or a person) trying to move up and down the chain here is going to be confused by the way we've modeled these concepts. Clearly "Authority" should not be a parent of Agent: that's the weak link here, IMHO. On the other hand, by reconceiving the AuthorizedAccessPoint as a repeatable sub-class of label, and assigning it based on the appearance of that label in a recognized authority system, then you can retain the Libraryland concept of authority without losing the mojo that comes from making the data conform with expectations out in LODland.
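The inference Jeff describes can be sketched as a transitive walk over `rdfs:subClassOf`. This is a hand-rolled illustration, not a real reasoner; the `superclasses` helper is hypothetical, and the class hierarchy is taken only from the quoted passage.

```python
def superclasses(cls, subclass_of):
    """Return every class reachable from cls via rdfs:subClassOf (transitively)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hierarchy as described in the quoted passage.
subclass_of = {
    "bf:Person": ["bf:Agent"],
    "bf:Agent": ["bf:Authority"],
}

# Anything typed bf:Person is also inferred to have these types.
inferred_types = {"bf:Person"} | superclasses("bf:Person", subclass_of)
```

Under this hierarchy a resource typed `bf:Person` picks up `bf:Authority` as well, which is the conclusion being objected to.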

At least, that's my take on how to address this.

(3) again, per Jeff:

The current model seems to imply that creators have to come from an existing authority file. In principle this seems to make sense but it simply does not work in the real world. People should be able to use non-Authority URIs to describe things (i.e. from Freebase, DBpedia, GeoNames, VIAF, etc.)

I'm not sure if reversing the decision to make Authority the parent of Agent obviates this issue: if it does, great; if not, then it needs to be specified that creators can come from anywhere (or, at least, that they can come from any authority system and/or any LOD system such as those Jeff names).

Another possibility would be to flatten our distinction between authority systems and other LOD (something Jeff hints at when he says "In Linked Data a 'authorized access point' would be synonymous with a URI"), by recognizing DBpedia URIs (for example) as AuthorizedAccessPoints, although perhaps that waters down a valuable distinction between traditional bibliographic authorities and these new systems.

melanieWacker commented 10 years ago

To point (2) in the posting above: There is a lot of discussion right now in Libraryland centering on the relationship between researcher identifiers and authority files. For example, several institutions are talking about implementing (or have implemented) ORCID, which uses foaf:Person, so we will come up against this issue frequently. Having an approach that is recognized by and workable for both communities will be critical.

kefo commented 10 years ago

I'm mostly lurking, but I wanted to add a few notes about and to this discussion.

(1) As Jeff notes, generating URIs would be better than relying on strings; similarly, it would be better if, rather than generating blank nodes,

I read that and then I looked at Jeff's document and I'm still somewhat perplexed as to why there is so much discussion about blank nodes. If (and I need to look this up), in Ray's document, he said or even suggested blank nodes were mandatory, we'll have to get that changed. I hesitate to speak for anyone so let me just say that I certainly do not encourage the use of blank nodes, but they do need to be accepted as a reality. So, with respect to the quoted text above, the best you can do is advise not to use blank nodes, but I do not think you can actually forbid it (and, if you can technically forbid it, you shouldn't because that will be a problem at some point, especially to adoption).

I also do not understand this (from Jeff's notes):

This model still seems to rely primarily on strings to identify things.

About this (quoting Jeff):

Since the bf:Person is a sub-class of bf:Agent which itself is a sub-class of bf:Authority, a reasoner would infer that “Aldous Huxley” is an Authority.

Yep, a reasoner would come to that conclusion. How many here have turned a reasoner on smartly modelled bibliographic data? I have and I have to say I was underwhelmed. It basically tells you things you already knew and had long ago decided weren't too terribly important. The things you did care about, however, were so lost in a tsunami of inference-created data as to be relatively worthless. For example, on turning an inference engine against a pile of BIBFRAME data we would indeed learn that '“Aldous Huxley” is an Authority,' but I find myself asking "so what?". Now, maybe - maybe - some day a really big computer will seize up and fry its circuit board upon determining this "fact" during inferencing, but I am doubtful.

In my experience, the richest inferences we'll find in our data stem from the relationships between bibliographic resources, but even then it will take a very carefully constructed ontology to really exploit those. Bib data is not like People data. If I have an uncle and I have a father and they both have the same mother, then we can infer they are brothers. When it comes to those types of familial relationships from which you can infer additional information, that example is just the tip of the iceberg. But, to my knowledge, no one has ever sat down and tried to look at relationships between bib resources to see whether they can be exploited in quite the same way, with equally rich inferences. And, because no one has really done this, the information you will get out of an inference engine run against bib data will be lackluster. I've found it more efficient to pick and choose the inferences I want or need and extract them via programming versus pinning my hopes on a reasoner. This is also why things like SWRL (http://en.wikipedia.org/wiki/SWRL) have been proposed, though I've not looked into this recently (that and reasoners are often limited to only that which can be "concluded" and sometimes you know what you want from the data).
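As an illustration of picking one inference and extracting it in code rather than running a general reasoner, here is a minimal sketch of the family example. The `ex:hasMother` and `ex:hasFather` properties and the `siblings_by_mother` helper are hypothetical, and the sketch infers siblings only; concluding "brothers" would also need gender data.

```python
from itertools import combinations


def siblings_by_mother(triples):
    """Infer sibling pairs: two distinct people who share an ex:hasMother value."""
    children = {}
    for s, p, o in triples:
        if p == "ex:hasMother":
            children.setdefault(o, set()).add(s)
    pairs = set()
    for kids in children.values():
        # Sort so each pair is emitted once, in a stable order.
        for a, b in combinations(sorted(kids), 2):
            pairs.add((a, b))
    return pairs


triples = [
    ("ex:father", "ex:hasMother", "ex:grandmother"),
    ("ex:uncle", "ex:hasMother", "ex:grandmother"),
    ("ex:me", "ex:hasFather", "ex:father"),
]
pairs = siblings_by_mother(triples)
```

The function yields only the one relationship asked for and nothing else, which is the trade-off described above.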

One of the things I find so interesting about this issue is how, at the heart of it, a bf:Authority is a form of abstraction, and one that is meant to encapsulate a modelling pattern. bf:Authority, however, is often treated and seen as something much, much bigger. For example, bf:Agent is also an abstraction, as is foaf:Agent or dcterms:Agent, but those abstractions do not elicit the same reaction as bf:Authority does, yet no one is expected to actually use foaf:Agent. FOAF implementers use Person, Group, or Organization, but foaf:Agent exists because it neatly captures a design pattern about those other things.

The other aspect I find so fascinating is the reaction people have about the choice of "Authority" as the Class name. Their reaction is, ironically, for precisely the reason the name "Authority" was chosen. The name is indeed designed to elevate the Thing's importance but the message is not really meant for the professional library data modellers in the world. It's meant to elevate the Thing in the eyes of downstream librarians, libraries, and users (if the latter ever actually see "bf:Authority" in the data). It's meant to communicate "Hey, we [libraries] are authoritative about this Thing and we have authoritative information about it." Which raises another point: definitionally, we've gone out of our way to disassociate a bf:Authority from "traditional library authority efforts," or whatever the phrase was, yet everyone persists in seeing it this way. (And, yes, maybe that in and of itself is a message, but you come up with a name that encapsulates the meaning in the quote above and does not include the word "local," or a synonym).

In short, bf:Authority is a modelling pattern meant to say more about its use to downstream users; it is not meant to speak to model designers and inference engines.

From Jeff:

The current model seems to imply that creators have to come from an existing authority file.

How implementers choose to use bf:hasAuthority is their own business, meaning that, if an implementer uses vocabulary/dataset XYZ as its source, then to that implementer vocabulary/dataset XYZ is the implementer's authoritative source. That said, here's what I do not fully understand: If VIAF is setting itself up as an authoritative point of information about Things - such as People and Places - how is it not acting as a type of authoritative source? Is VIAF not promoting its URIs as the URIs everyone should be using? And, if it is, then why is it doing all that work if not to be the authoritative source for information about those Things? Or, based on the SPARQL pattern Jeff included at the end of his notes, would I better think of VIAF as a simple clearinghouse for a label?

Also, a general note about the direct/indirect business: The Direct approach makes a whole ton of sense if you are an organization that hosts or supports a large vocabulary. In many ways, it would make more sense for LC to adopt the direct approach since we do make very heavy use of the LC/NACO file and we have ID.LOC.GOV. This, I suspect, is also OCLC's view on the matter, since it supports VIAF and FAST. But in a bigger world, it just becomes unrealistic. Not every name (or every Person, however you want to look at these things) is in a published vocabulary. Since one of the aims behind BIBFRAME is to facilitate data exchange, the Abstraction Layer - the bf:Authority - provides one single way to communicate this information, whereas supporting both approaches potentially introduces a pretty big variable into the data exchange mechanism.

Additionally - and this is important - the Abstraction Layer approach was taken because people wanted to also be able to augment the data. If a bf:Authority has a local URI, then, locally, the library can say anything it wants to about the resource without risk of polluting the bigger LOD cloud. For example, with a bf:Authority approach, one could do this:

ex:1234   bf:creator  ex:abcd

ex:abcd   rdf:type bf:Person
ex:abcd   bf:note  "This person came here once in 1965.  It was cool."
ex:abcd   bf:hasAuthority viaf:abcd

If a VIAF URI were used, this happens:

ex:1234   bf:creator  viaf:abcd
viaf:abcd  bf:note  "This person came here once in 1965.  It was cool."

The world won't end if the second example gets into the wild, but it could get messy (imagine 50,000 libraries making crazy statements like the above about all types of things and assigning them to VIAF resources, and all of those getting into the wild).
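The difference between the two examples can be sketched in a few lines. The `describe_creator` and `subjects_with_notes` helpers are hypothetical; the prefixed names are the placeholder ones from the examples above.

```python
def describe_creator(work, authority_uri, note, local_uri=None):
    """Return triples linking a work to its creator.

    With local_uri (indirect approach) the note is attached to the local
    resource, which points at the external authority via bf:hasAuthority.
    Without it (direct approach) the note lands on the external URI itself.
    """
    if local_uri is None:
        return [
            (work, "bf:creator", authority_uri),
            (authority_uri, "bf:note", note),
        ]
    return [
        (work, "bf:creator", local_uri),
        (local_uri, "rdf:type", "bf:Person"),
        (local_uri, "bf:note", note),
        (local_uri, "bf:hasAuthority", authority_uri),
    ]


def subjects_with_notes(triples):
    """Return the resources that local notes have been asserted about."""
    return {s for s, p, _ in triples if p == "bf:note"}


note = "This person came here once in 1965.  It was cool."
indirect = describe_creator("ex:1234", "viaf:abcd", note, local_uri="ex:abcd")
direct = describe_creator("ex:1234", "viaf:abcd", note)
```

Comparing `subjects_with_notes(indirect)` with `subjects_with_notes(direct)` shows where the local statement ends up: on `ex:abcd` in the first case, on the shared `viaf:abcd` resource in the second.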

One of the things I would encourage this group to do is experiment with the abstraction layer versus the direct approach and talk to others. I was delighted to listen to Phil Schreur speak at ALA Midwinter about Stanford's experience with this very issue. He stood up and started talking about how unhappy he was about this whole Abstraction Layer thing. Until, that is, Stanford started looking at its data and realized the "Direct Approach" was problematic on a number of fronts. FWIW - and there are people here at LC and at Zepheira who'll vouch for this - I didn't like the Abstraction Layer when I first encountered it and I made a stink about it. Then, I started dealing with the data....

[Which makes me think of an aside: as you all develop MODS/RDF, write code as you go. Work on a transformation of MODS/XML data to MODS/RDF versus talking in the abstract. Trust me, it will surface a lot of things you'll have likely not considered or would consider working purely in the abstract. It's not enough to create examples. Take your MODS data and see how many "hits" on names you get with VIAF or ID.LOC.GOV. Perhaps you found a lot, which is all well and good, but what do you do with the ones that weren't found? Are you sure you got the right match, the right Person? Also, do you need to support the entire VIAF dataset - or the ability to query VIAF real time - in order to use the information you need for search and display?]
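A minimal sketch of the kind of experiment suggested in the aside: count how many names match and decide what happens to the rest. The `authority_index` dict is a hypothetical in-memory stand-in for a VIAF or ID.LOC.GOV lookup (no real API is called), the `reconcile` helper and the `http://example.org/agent/` base are assumptions, and "Doe, Jane" is a made-up unmatched name. Exact string matching after normalization says nothing about whether the match is the right Person.

```python
def reconcile(names, authority_index, local_base="http://example.org/agent/"):
    """Split names into matched (authority URI found) and unmatched (local URI minted)."""
    matched, unmatched = {}, {}
    for i, name in enumerate(names):
        # Normalize whitespace and case before lookup.
        key = " ".join(name.split()).casefold()
        uri = authority_index.get(key)
        if uri is not None:
            matched[name] = uri
        else:
            # No hit: mint a local URI so the name is still identified by a resource.
            unmatched[name] = f"{local_base}{i}"
    return matched, unmatched


# Hypothetical index keyed by normalized heading; the VIAF URI is the one cited in this thread.
authority_index = {
    "huxley, aldous, 1894-1963": "http://viaf.org/viaf/71392434",
}
names = ["Huxley, Aldous, 1894-1963", "Doe, Jane"]

matched, unmatched = reconcile(names, authority_index)
hit_rate = len(matched) / len(names)
```

Minting a local URI for each miss is only one possible policy; the point of the exercise is that some policy has to exist.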

Finally, two things. 1) A number of points Jeff makes in his notes build on themselves and I think some are predicated on a misunderstanding of, or assumptions about, what the BIBFRAME model/vocabulary is trying to do or say (or, importantly, what it does not say, in fact). One of these I've already mentioned: that for some reason bnodes are required. Another is the notion that people are going to use, and want to use, reasoners widely and, because people are going to do this, then encountering bf:Authority will be problematic. I'm just doubtful of this, but, my doubt aside, the BIBFRAME vocabulary has been mostly underspecified so a reasoner will in fact reveal very little worth anything. Lastly, a number of Jeff's assumptions/conclusions - including those he presents before he talks about the "direct approach," which he prefers - are predicated on the direct approach. So, this:

bf:creator rdfs:range  bf:Authority
ex:1234   bf:creator   viaf:1234567890

would lead to these inferences:

viaf:1234567890  rdf:type  bf:Authority
viaf:1234567890  rdf:type  foaf:Person
bf:Authority sameAs foaf:Person

(Usually, but not always, computers will self-destruct when they encounter those last three statements. I'm joking, sheesh. :) )

But we've actually expressly adopted the indirect approach, which means that the first bf:creator statement should never have been made! I'll readily concede that an inference engine would conclude the second set of triples, but we literally said that you should not do the first. You can't count the second among the problems with the vocabulary if you should have never done it in the first place.
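The range-based part of this argument can be sketched as follows. The `apply_range` helper is a hypothetical, hand-rolled version of the RDFS range rule, not a full reasoner, and `ex:abcd` is a placeholder local resource. It covers only the `rdf:type bf:Authority` inference; the `foaf:Person` typing would come from VIAF's own data.

```python
def apply_range(triples, ranges):
    """RDFS range rule: if (p rdfs:range C) and (s p o), infer (o rdf:type C)."""
    return {(o, "rdf:type", ranges[p]) for _, p, o in triples if p in ranges}


ranges = {"bf:creator": "bf:Authority"}

# Direct approach: the VIAF URI is the object of bf:creator.
direct = [("ex:1234", "bf:creator", "viaf:1234567890")]

# Indirect approach: a local resource sits between the work and the VIAF URI.
indirect = [
    ("ex:1234", "bf:creator", "ex:abcd"),
    ("ex:abcd", "bf:hasAuthority", "viaf:1234567890"),
]

inferred_direct = apply_range(direct, ranges)
inferred_indirect = apply_range(indirect, ranges)
```

With the indirect pattern the `bf:Authority` type is inferred for the local resource, and the VIAF resource is left untyped by this rule.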

and 2):

Thank you, Jeff, for posting the notes. I do think it is very good to have these discussions and I look forward to more of them. Please - oh please - do not construe any of the above as anything more than commentary on your thoughtful contribution to this discussion. I haven't closed the door to anything at all, and I'm really not trying to, but I've not been convinced yet of the merits of supporting only the Direct Approach or supporting both approaches simultaneously while trying to create a workable exchange mechanism to replace MARC (which is what BIBFRAME is doing; I'm not saying MODS/RDF is doing this or that it should be doing such a thing).

The parenthetical reminds me: most of everything I have commented on has been about BIBFRAME. I harbor no assumption that MODS/RDF should adopt the BF approach to these things, but I did want to offer some BIBFRAME clarifications.

raydAtLC commented 10 years ago

I think the main stumbling block with authorities has been that the VIAF proponents do not think that a VIAF resource can be the object of bf:hasAuthority, because of the clash between string and real world object.

I believe that the recent discussion on the BIBFRAME list might help change that view. Kevin Ford has proposed a new definitional approach.

For example: “An Authority represents a key Concept or Thing. Works and Instances, for example, have defined relationships /to/ these important Concepts and Things.”

And for bf:Person: A person, whether real, fictional, etc.

Not to say that this is settled yet, but it seemed that these revised (draft) definitions were well received in the discussion and go a long way to achieving resolution in the clash over whether these things are strings or real world objects.

raydAtLC commented 9 years ago

Note the BIBFRAME definition changes.

bf:Authority: A BIBFRAME Authority represents a key Concept or Thing. Works and Instances, for example, have defined relationships to these Concepts and Things.

bf:Person: Individual or identity established by an individual (either alone or in collaboration with one or more other individuals)