blunalucero / MODS-RDF

MODS RDF is an RDF ontology for MODS. As MODS is an XML schema for a bibliographic element set, MODS RDF is an expression of that element set in RDF.

Vocabulary term as object of a triple: Should MODS RDF use the same property for expressing a vocabulary term as literal or URI? #6

melanieWacker closed this issue 9 years ago

melanieWacker commented 10 years ago

For background, see "Development of a MODS RDF Ontology: Discussion Points" (Ray Denenberg, Library of Congress, November 4, 2013), discussion point 2.1.

mixterj commented 10 years ago

I think this is a great idea. I did something very similar when I was converting VRA/XML into RDF. Even though, in a perfect world, all of the languages (for example) would be coded using controlled vocabulary IDs, that is not always the case. We certainly do not want to throw out data. There is always the opportunity to run a mapReduce job after the fact and reconcile all of the languages that are coded using strings.

raydAtLC commented 10 years ago

The issue here is whether the same property (name) should be used when in one case the object is an RDF resource and in another case it is a literal:

(1) ModsResource12356 language http://id.loc.gov/vocabulary/iso639-2/fra/

(2) ModsResource12356 language “french”

Or should distinct properties be defined:

(1) ModsResource12356 language http://id.loc.gov/vocabulary/iso639-2/fra/

(2) ModsResource12356 languageLiteral “french”

mixterj commented 10 years ago

I would argue that you should only have one property (preferably an object property). Other than the existence of poor legacy data, there is no reason why one would want to have both languageOfResource and languageLiteral properties. They mean exactly the same thing with regard to how they are interpreted in relation to the item being described. This would be similar to having both an authorOfResource object property and an authorLiteral data property.

To accommodate legacy data, we could explore what Schema.org did and not set a domain and range on the property (and also not claim that it is an owl:ObjectProperty or an owl:DatatypeProperty). Or we could adjust for the variance in bib data by having the XSL stylesheet check whether a vocabulary is provided: if so, coin a URI based on the data, and if there is no vocabulary, create a blank node. This is what I did in my VRA stylesheet.
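The check described above can be sketched outside XSLT. The following Python is only an illustration, not the VRA stylesheet: the `VOCAB_BASES` table, the `language_object` function, the authority code `iso639-2b`, and the choice to carry the string on the blank node as an `rdfs:label` are all assumptions made for the example. The coined URI follows the pattern of the URI quoted earlier in this thread.

```python
from itertools import count
from urllib.parse import quote

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Hypothetical lookup from a MODS authority code to a URI prefix.
VOCAB_BASES = {
    "iso639-2b": "http://id.loc.gov/vocabulary/iso639-2/",
}

_bnode_ids = count(1)


def nt_literal(text):
    """Serialize a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def language_object(value, authority=None):
    """Return (object_term, extra_triples) for one language value.

    If a known vocabulary is named, coin a URI from the value.
    Otherwise mint a blank node and attach the string to it, so the
    same property can point at either kind of node.
    """
    base = VOCAB_BASES.get(authority)
    if base is not None:
        return f"<{base}{quote(value)}>", []
    node = f"_:b{next(_bnode_ids)}"
    return node, [f"{node} {RDFS_LABEL} {nt_literal(value)} ."]


coded, _ = language_object("fra", authority="iso639-2b")
uncoded, extra = language_object("french")
print(coded)
print(uncoded, extra)
```

Either way the object of the property is a node rather than a bare literal, which is what lets a single object property cover both the coded and the uncoded case.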

raydAtLC commented 10 years ago

I strongly dislike the dual property approach too, but it is the approach that BIBFRAME is taking (at least, for now). The argument is: on one hand there is overhead and complexity for dual properties, but on the other hand there is the processing cost of not knowing what the object of a given property is going to be, and the processing cost of the latter outweighs the complexity of the former. That's the argument, anyway; I can't say that I support it.

Sent by email on Tuesday, January 28, 2014, in reply to Jeff Mixter's comment above.
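To make the trade-off concrete, here is a small Python sketch of what a consumer has to do under each model. The `URIRef` and `Literal` classes and the two `split_*` functions are hypothetical stand-ins written for this example, not any library's API. The triples reuse the examples given earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


FRA = "http://id.loc.gov/vocabulary/iso639-2/fra/"

# One property whose object may be a resource or a literal.
single = [
    ("ModsResource12356", "language", URIRef(FRA)),
    ("ModsResource12356", "language", Literal("french")),
]

# Two properties, one per kind of object.
dual = [
    ("ModsResource12356", "language", URIRef(FRA)),
    ("ModsResource12356", "languageLiteral", Literal("french")),
]


def split_single(triples):
    """Single property: the consumer must inspect each object's type."""
    uris, strings = [], []
    for _subject, prop, obj in triples:
        if prop != "language":
            continue
        if isinstance(obj, URIRef):
            uris.append(obj.value)
        else:
            strings.append(obj.value)
    return uris, strings


def split_dual(triples):
    """Dual properties: the property name already says what the object is."""
    uris = [o.value for _s, p, o in triples if p == "language"]
    strings = [o.value for _s, p, o in triples if p == "languageLiteral"]
    return uris, strings


print(split_single(single))
print(split_dual(dual))
```

Both functions recover the same information. In this sketch the "processing cost" of the single-property model is one type check per object, while the dual-property model moves that distinction into the vocabulary itself.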

mjhan3 commented 10 years ago

I like what Jeff did in the VRA stylesheet: have the XSL stylesheet check whether a vocabulary is provided and, if so, coin a URI based on the data, and if there is no vocabulary, create a blank node. However, I think we have to decide whether we will take one approach across all vocabulary terms, or use different methods depending on the terms, for example geographic terms vs. language. I think some legacy data will contain literal values for geographic terms that we want to keep as is, but not so much for language.

infomnivore commented 10 years ago

Since this issue is still open, I will add my voice to the chorus in favor of the single property approach. As to the issue MJ raises, I'm not sure I see why/how geographic terms would substantively differ from language terms: are you saying there are cases where you would want the literal preserved even if a vocabulary was given such that it could be swapped out for a URI? If so I think I need an example to wrap my head around...

melanieWacker commented 10 years ago

I am wondering if classification may be an example of such a case. The current MODS XML guidelines state: "The DLF/Aquifer guidelines recommend that <classification> contain only classification numbers and call numbers whose authorities are referenced in Classification Scheme Source Codes maintained by the Library of Congress. It is left to the institution's discretion whether to truncate assigned call numbers to just the formal classification segment (for example, QA76.17), or to include the full call number (for example, QA76.17 .T55)." So we could encounter values that can be easily swapped out for a URI, since both Dewey and LC are available as linked data (http://dewey.info/ and http://id.loc.gov/authorities/classification.html), but I am not sure it makes sense if the value includes the full call number.

infomnivore commented 10 years ago

Hmm... the full call number could be systematically truncated to get that linked data mojo happening, but I appreciate the desire to retain maximum detail. However, I still think that the single property approach makes sense: you could preserve the literal as a string value and have the URI without needing to parse them out as different properties. In fact, having different but paired properties implies an equality that wouldn't be true in this case.

"BQ5542" = http://id.loc.gov/authorities/classification/BQ5542.html and "BQ5542 .C3813 2004" = http://lccn.loc.gov/2006345773, but http://id.loc.gov/authorities/classification/BQ5542.html != "BQ5542 .C3813 2004"
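A rough Python sketch of that idea: keep the full call number as the string value and derive the classification URI from its truncated leading segment. The regular expression is a deliberately simplified assumption about LC class numbers (one to three capital letters, digits, optional decimal) and will not cover every real call number. The `classification_values` function is hypothetical, and the URI pattern is copied from the example above.

```python
import re

# URI pattern taken from the example in this thread.
CLASS_BASE = "http://id.loc.gov/authorities/classification/"

# Simplified assumption: class letters, class digits, optional decimal part.
LC_CLASS = re.compile(r"^([A-Z]{1,3}\d{1,4}(?:\.\d+)?)")


def classification_values(call_number):
    """Keep the full call number and, where possible, coin a class URI."""
    match = LC_CLASS.match(call_number.strip())
    uri = f"{CLASS_BASE}{match.group(1)}.html" if match else None
    return {"literal": call_number, "uri": uri}


for example in ("BQ5542 .C3813 2004", "QA76.17 .T55", "unclassed"):
    print(classification_values(example))
```

The literal and the URI stay side by side under one property's node, so nothing asserts that the class URI is equivalent to the full call number.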

melanieWacker commented 10 years ago

I see your point. I opened up a separate (but related) issue https://github.com/blunalucero/MODS-RDF/issues/15 for MODS to make it easier to manage the discussion around this specific element and to mark it resolved once we decide that it is.

melanieWacker commented 10 years ago

Recommendation from the 05.20.2014 conference call: single property. The stylesheet should check whether a vocabulary is provided: if so, coin a URI based on the data, and if there is no vocabulary, create a blank node with a label. (This follows the suggestion made in the Jan. 28th posting.)