Closed melanieWacker closed 9 years ago
I think this is a great idea. I did something very similar when I was converting VRA/XML into RDF. In a perfect world, all of the languages (for example) would be coded using controlled vocabulary IDs, but that is not always the case, and we certainly do not want to throw out data. There is always the opportunity to run a MapReduce job after the fact and reconcile all of the languages that are coded as strings.
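As a rough illustration of that after-the-fact reconciliation, here is a minimal Python sketch. The lookup table, the tuple representation of triples, and the function name are all hypothetical; the URI pattern is copied from the example in this thread, and a real job would load a complete code list rather than two entries.

```python
# Hypothetical lookup table; a real job would load a complete language code list.
ISO639_2_BASE = "http://id.loc.gov/vocabulary/iso639-2/"
LANGUAGE_CODES = {"french": "fra", "english": "eng"}


def reconcile_language(obj):
    """Return a vocabulary URI for a known language string, else the input unchanged."""
    if obj.startswith(("http://", "https://")):
        return obj  # already a URI, nothing to reconcile
    code = LANGUAGE_CODES.get(obj.strip().lower())
    if code is None:
        return obj  # unknown string: keep the literal rather than throw out data
    return ISO639_2_BASE + code + "/"


# Triples as plain (subject, property, object) tuples, for illustration only.
triples = [
    ("ModsResource12356", "language", "french"),
    ("ModsResource12357", "language", "http://id.loc.gov/vocabulary/iso639-2/eng/"),
    ("ModsResource12358", "language", "Old Frankish"),
]

reconciled = [
    (s, p, reconcile_language(o)) if p == "language" else (s, p, o)
    for s, p, o in triples
]
```

Strings that cannot be matched stay as literals, so nothing is lost if the pass is run again later with a better table.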
The issue here is whether the same property (name) should be used when in one case the object is an RDF resource and in another case it is a literal:
(1) ModsResource12356 language http://id.loc.gov/vocabulary/iso639-2/fra/
(2) ModsResource12356 language “french”
or whether distinct properties should be defined:
(1) ModsResource12356 language http://id.loc.gov/vocabulary/iso639-2/fra/
(2) ModsResource12356 languageLiteral “french”
I would argue that you should only have one property (preferably an object property). Other than the existence of poor legacy data, there is no reason why one would want to have both languageOfResource and languageLiteral properties. They mean exactly the same thing with regard to how they are interpreted in relation to the item being described. This would be similar to having both an authorOfResource object property and an authorLiteral data property.
To accommodate legacy data, we could follow what Schema.org did and not set a domain and range on the property (and also not declare it an owl:ObjectProperty or owl:DatatypeProperty). Or we could adjust for the variance in bib data by having the XSL stylesheet check whether a vocabulary is provided: if so, coin a URI based on the data; if not, create a blank node. This is what I did in my VRA stylesheet.
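The stylesheet check described above could look roughly like the following. This is a Python stand-in for the XSLT logic, not the actual VRA stylesheet; the authority-to-base mapping, the function name, and the choice to hang the original string off the blank node as an rdfs:label are assumptions made for illustration.

```python
import itertools

# Hypothetical mapping from a source vocabulary/authority code to a URI base.
VOCABULARY_BASES = {"iso639-2b": "http://id.loc.gov/vocabulary/iso639-2/"}

_bnode_ids = itertools.count(1)


def language_object(value, authority=None):
    """Return (object, extra_triples) for a single `language` property.

    If a known vocabulary is given, coin a URI from the value.
    Otherwise mint a blank node and keep the original string as its label.
    """
    base = VOCABULARY_BASES.get(authority)
    if base is not None:
        return base + value + "/", []
    bnode = f"_:b{next(_bnode_ids)}"
    return bnode, [(bnode, "rdfs:label", value)]


# Vocabulary provided: the object is a coined URI.
coded_obj, coded_extra = language_object("fra", authority="iso639-2b")

# No vocabulary: the object is a blank node that carries the string.
plain_obj, plain_extra = language_object("french")
```

Either way the subject gets exactly one `language` property, and the object is always a node rather than sometimes a literal.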
I strongly dislike the dual property approach too, but it is the approach that BIBFRAME is taking (at least, for now). The argument is: on one hand there is overhead and complexity for dual properties, but on the other hand there is the processing cost of not knowing what the object of a given property is going to be, and the processing cost of the latter outweighs the complexity of the former. That's the argument, anyway; I can't say that I support it.
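To make the trade-off concrete, here is a small Python sketch of what a consumer has to do under each model. The tuple representation and the string-prefix test for URIs are simplifications assumed for illustration; an RDF library would normally expose the term type directly, which makes the check cheaper than it looks here.

```python
def is_uri(obj):
    """Crude URI test for this sketch; real RDF terms carry their type."""
    return obj.startswith(("http://", "https://"))


def languages_single(triples, subject):
    """Single-property model: one property to query, then branch on object type."""
    uris, literals = [], []
    for s, p, o in triples:
        if s == subject and p == "language":
            (uris if is_uri(o) else literals).append(o)
    return uris, literals


def languages_dual(triples, subject):
    """Dual-property model: no type check, but two properties to know about."""
    uris = [o for s, p, o in triples if s == subject and p == "language"]
    literals = [o for s, p, o in triples if s == subject and p == "languageLiteral"]
    return uris, literals


single_model = [
    ("ModsResource12356", "language", "http://id.loc.gov/vocabulary/iso639-2/fra/"),
    ("ModsResource12356", "language", "french"),
]
dual_model = [
    ("ModsResource12356", "language", "http://id.loc.gov/vocabulary/iso639-2/fra/"),
    ("ModsResource12356", "languageLiteral", "french"),
]
```

Both functions return the same answer for the same underlying data; the difference is whether the branching lives in the consumer's type check or in the ontology's property list.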
From: Jeff Mixter Sent: Tuesday, January 28, 2014 5:25 PM To: blunalucero/MODS-RDF Cc: Denenberg, Ray Subject: Re: [MODS-RDF] Vocabulary term as object of a triple: Should MODS RDF use the same property for expressing a vocabulary term as literal or URI? (#6)
I like what Jeff did in the VRA stylesheet: having the XSL stylesheet check whether a vocabulary is provided and, if so, coin a URI based on the data, and if there is no vocabulary, create a blank node. However, I think we have to decide whether we will take one approach across all vocabulary terms or use different methods depending on the term, for example geographic terms vs. language. I think there would be literal values from some legacy data that we want to keep as is for geographic terms, but not so much for language.
Since this issue is still open, I will add my voice to the chorus in favor of the single property approach. As to the issue MJ raises, I'm not sure I see why/how geographic terms would substantively differ from language terms: are you saying there are cases where you would want the literal preserved even if a vocabulary was given such that it could be swapped out for a URI? If so I think I need an example to wrap my head around...
I am wondering if classification may be an example of such a case? The current MODS XML guidelines state "The DLF/Aquifer guidelines recommend that
Hmm... the full call number could be systematically truncated to get that linked data mojo happening, but I appreciate the desire to retain maximum detail. However, I still think that the single property approach makes sense: you could preserve the literal as a string value and have the URI without needing to parse them out as different properties. In fact, having different but paired properties implies an equality that wouldn't be true in this case.
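A minimal sketch of that idea, assuming a simple pattern for the class-number portion of an LC call number (one to three capital letters, digits, and an optional decimal). The regular expression, the `classification` property name, and the function are hypothetical; the URI pattern is copied from the example in this thread.

```python
import re

CLASS_BASE = "http://id.loc.gov/authorities/classification/"

# Assumption: the class number is 1-3 capital letters, 1-4 digits, optional decimal.
CLASS_NUMBER = re.compile(r"^([A-Z]{1,3}\d{1,4}(?:\.\d+)?)")


def classification_triples(subject, call_number):
    """Keep the full call number as a literal and, when the class number
    can be extracted, add a URI value on the same property."""
    triples = [(subject, "classification", call_number)]
    match = CLASS_NUMBER.match(call_number.strip())
    if match:
        uri = CLASS_BASE + match.group(1) + ".html"
        triples.append((subject, "classification", uri))
    return triples


result = classification_triples("ModsResource12356", "BQ5542 .C3813 2004")
```

The literal and the URI sit side by side as two values of one property, so neither is asserted to be equivalent to the other.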
"BQ5542" = http://id.loc.gov/authorities/classification/BQ5542.html and "BQ5542 .C3813 2004" = http://lccn.loc.gov/2006345773, but http://id.loc.gov/authorities/classification/BQ5542.html != "BQ5542 .C3813 2004"
I see your point. I opened up a separate (but related) issue (https://github.com/blunalucero/MODS-RDF/issues/15) for MODS
Recommendation from the 05.20.2014 conference call: single property. The stylesheet should check whether a vocabulary is provided and, if so, coin a URI based on the data; if there is no vocabulary, it should create a blank node with a label (following the suggestion made in the Jan. 28th posting).
For background see: "Development of a MODS RDF Ontology: Discussion Points," Ray Denenberg, Library of Congress, November 4, 2013, discussion point 2.1.