blunalucero / MODS-RDF

MODS RDF is an RDF ontology for MODS. As MODS is an XML schema for a bibliographic element set, MODS RDF is an expression of that element set in RDF.

Should MODS RDF parse out subTitle, partNumber, partName, and nonSort, or is a single string combining them sufficient? #1

Closed melanieWacker closed 9 years ago

melanieWacker commented 10 years ago

The current MODS RDF ontology represents the title via MADS RDF.

melanieWacker commented 10 years ago

My feeling is that a single string is sufficient. I was trying to come up with use cases for parsing out this information, but could not think of any. Other RDF examples that I have been looking at handle it as a single string as well. (I had sent this comment previously to the listserv.)

melanieWacker commented 10 years ago

Conference call 1/29/14: Needs to be considered in relation to topics 13 (MADS) and 5 (Authorities)

kefo commented 10 years ago

I'd also say "no," but I've been outvoted before because I'd have done the same in BF.

What's the use case for individually identifying these data points?

For example, I've yet to see a case where I'd actually want the subtitle separated from the title (for giggles one should create a 'browse display' of just the main titles and you will see exactly how necessary the subtitle is for context). Ditto for partNumber or partName - I seem to need them as part of a long string and I've never needed them individually, but YMMV. So, when would these be used such that individually identifying them makes sense and is totally worth the overhead?

Now, nonSort is tricky. That's actually knowledge we frequently need and put to use. That said, there are ways to handle this in RDF: duplicate properties, so that one is for "sorting" and the other is more or less faithful to the original; a nonSort Unicode character in the string itself; or a specially crafted language tag on the literal that either encodes the information or tells you if the value represents a "sortable" literal.
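As a rough illustration of the first option (duplicate properties), here is a minimal Python sketch. The record shape and the key names `title` and `titleForSort` are assumptions made for the example, not anything defined by MODS RDF, and the non-sort text is assumed to be supplied by the cataloger rather than guessed.

```python
# Hypothetical record shape: "title" and "titleForSort" are illustrative names only.
def make_title_record(non_sort, title):
    """Keep the display form faithful and store a separate sortable form."""
    return {"title": f"{non_sort}{title}", "titleForSort": title}

records = [
    make_title_record("The ", "Journal of Library Metadata"),
    make_title_record("Les ", "Misérables"),
    make_title_record("", "Drive"),
    make_title_record("Die ", "Blechtrommel"),
]

# Sort on the sortable form, but display the faithful one.
browse = [r["title"] for r in sorted(records, key=lambda r: r["titleForSort"].casefold())]
print(browse)
# ['Die Blechtrommel', 'Drive', 'The Journal of Library Metadata', 'Les Misérables']
```

The cost is one extra literal per title; the benefit is that no consumer has to know article lists for every language.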

kefo commented 10 years ago

I was closing some open tabs in the browser and ran across this. I did think of a Use Case for distinguishing the subTitle: for UI rendering. I know this is a step closer to the do-we-or-don't-we-encode-data-for-display-in-metadata issue, but this might be one such area to make an exception. For example, someone might want to display title info across two lines, something like

Drive

Surviving when everyone else is texting

and there could be any number of syntax choices someone used to mark the boundary between the title and subtitle (a colon, a hyphen, a slash, etc). The granularity comes in handy in this case.

Now, to me, all the other business - partNumber, partName - doesn't require special treatment and alternate methods of capturing and displaying this information should be explored.

melanieWacker commented 10 years ago

Just for the fun of it I did a title browse in OCLC for "Drive" and ended up with 1180 titles. Really would love to have subtitles there ... But I see your point, and one can still display the subtitle or other information for context. data.bnf.fr adds dates and authors to the title browse instead of subtitles, for example.

kefo commented 10 years ago

I'm mostly OK with the idea that 'data is data first; we don't do display data,' but Rebecca and I were chatting and I did think of the use case. To play the devil's advocate, the UI layer could be the layer responsible for splitting a title into a "title/subtitle" combination based on the presence of a colon, for example (or, in the absence of a colon, a slash, or hyphen, etc). Works both ways.
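A minimal Python sketch of what that UI-layer split might look like. The function name `split_title` and the separator list (colon first, then slash, then hyphen) are assumptions for illustration; real data would need more care, since hyphens and slashes also occur inside ordinary titles.

```python
import re
from typing import Optional, Tuple

# Separators tried in order of preference; this ordering is an assumption.
SEPARATORS = [r"\s*:\s+", r"\s+/\s+", r"\s+-\s+"]

def split_title(full_title: str) -> Tuple[str, Optional[str]]:
    """Split a single title string into (title, subtitle) for display only."""
    for pattern in SEPARATORS:
        parts = re.split(pattern, full_title, maxsplit=1)
        if len(parts) == 2:
            return parts[0], parts[1]
    # No recognizable separator: treat the whole string as the title.
    return full_title, None

title, subtitle = split_title("Drive: Surviving when everyone else is texting")
print(title)     # Drive
print(subtitle)  # Surviving when everyone else is texting
```

This keeps the stored data as one string and pushes the display decision to the consumer, at the price of a heuristic that can misfire.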

I'd be in favor of dispensing with subtitle in no small part because I don't think 'titles' make sense without them:

Drive: Surviving when everyone else is texting

Drive: A beginner's guide to learning to drive

Drive: Getting ahead in the world

Drive: Moving the football

versus

Drive

This is slightly tangential, but .... like the BNF does with a browse list, I have found it necessary to add a lot of 'context' to lists of BF Instances - creators, titles, publication dates, publishers, place of publication. Without all of that information, how else do you know which Instance/Manifestation of Huckleberry Finn to attach the HeldItem/Item?

mixterj commented 10 years ago

I agree that Title and subTitle should be included in the same string, but it does not make sense to me to have partNumber included in the same string. Those values are properties of the thing being described, not part of the Title. For example, the Title of this journal (http://www.tandfonline.com/toc/wjlm20/current#.U2lIEvldV8E) is not "Journal of Library Metadata, Volume 14, Issue 1, 2014"; it is "Journal of Library Metadata". Properties of this entity could include mods:volume, mods:issue and mods:publicationDate, but the values (which are very important in and of themselves) should not be crammed into the title field. This practice is reflective of antiquated MARC standards, which need to be ignored when designing new data models.

As far as partName and nonSort, these seem inherently tied to MARC/record type things, which again I do not think we should use in the ontology. nonSort in particular is odd in that it seems like this type of information should be included in the services that process the data not the data itself. In particular this should not be included in an RDF description of the entity (what does it have to do with the entity being described?). If it is important, then it could be included as a property in the void:Dataset description that links to descriptions of all of the entities in a library collection, catalog etc.

The partName seems to make sense as a property, but the documentation on the LOC website (http://www.loc.gov/standards/mods/v3/mods-userguide-elements.html) does not provide many useful examples of how it is used. It seems like this should indicate another thing that is being described, which would have its own title and be connected to the original entity via some object property. Again, this last statement is made based on my very basic understanding of how partName is used.

kefo commented 10 years ago

This practice is reflective of antiquated MARC standards, which need to be ignored when designing new data models.

Careful: There are indeed design patterns one does not want to perpetuate, but this sentiment smacks of throwing the baby out with the bath water. Surely there is room for a more nuanced approach than this.

nonSort in particular is odd in that it seems like this type of information should be included in the services that process the data not the data itself. In particular this should not be included in an RDF description of the entity (what does it have to do with the entity being described?). If it is important, then it could be included as a property in the void:Dataset description that links to descriptions of all of the entities in a library collection, catalog etc.

Three thoughts here:

1) Generally, could you please clarify some of the above? I do not understand the last sentence of the quote. What exactly could be included in the suddenly mandatory void:Dataset?

2) I feel a close look at a lot of instance data is needed before dispensing with the concept behind nonSort.

And that's just the tip of the iceberg. It very, very quickly becomes difficult to deal with this variety. One's inclination is to say "then don't strip leading words from titles; just use the 'The'." This is fine, of course, but you end up with pages and pages and pages and pages of titles that begin with "The" and "A" and "An." That solution will survive up until the time reference librarians or collection administrators work with the system and encounter this, at which point they will come back and ask, "why?" And, because of cataloging practice, there will still be an entry half the time without the "The" because that is how Uniform Titles were/are created, and MODS cataloging is also a participant in that tradition. Indeed, skilled users often know to look for "Die Blechtrommel" under "B" and "Les Misérables" under "M."

3) The third thought has to do with the part after the first sentence. Is that level of complexity necessary to capture nonSort info? Maybe I'm misunderstanding but it seems like you are suggesting we shove that type of information "over there" because it's not sufficiently about the "thing." (A distinction, might I add, that seems like a pretty fine line.) If I understand correctly, then I feel the solution preferences a desire for a pristine and pre-eminent data model at the expense of usability.

mods:volume, mods:issue and mods:publicationDate

Are all better semantics for that information.

mixterj commented 10 years ago

I fully admit that I was a bit pointed in that first statement. Being a modeler first, my opinion is generally "burn it down and start again", but obviously that is not always realistic. My comment about void:Dataset has to do with separating the description of things from the description of datasets as a whole. Datasets carry with them provenance and authority (based on how people view them); data does not! I can say anything is "authoritative" or an "Authority", but who cares? This type of information should reside in the description of the dataset. Regardless, it does not seem very pertinent to this conversation.

As I mentioned about nonSort, my understanding of it is not complete (especially since the LOC documentation is so poor). Therefore, I was only commenting based on my extremely high-level understanding of the property.

Thanks,

Jeff Mixter


melanieWacker commented 10 years ago

I’ve been thinking about partName and partNumber. First I want to shed some light on their usage, or at least how we have used them. Following library content standards, we’ve been using partName for titles that are not meaningful enough to stand by themselves and therefore need to be preceded by the full title to have any context, e.g. “Title of book. Supplement”. For our digital objects there are cases where only part of a work was digitized. So to clarify things we used “Title of work. Introduction” (or whatever that part was). We did not want anybody to think they’d get the entire book when a few pages is all there is. This happens with patron orders, for example. We have not taken advantage of the granularity offered by MODS XML, though, and it all went into one title element. So generally I am fine with not identifying them individually. Just to play devil’s advocate: for display issues the granularity could have been helpful at times, and the separators aren’t always consistent. Probably not enough of a use case to justify the overhead, though.

raydAtLC commented 9 years ago

Let’s take the example: "Journal of Library Metadata, Volume 14, Issue 1, 2014"

And for the sake of making this even MORE interesting, pretend there is a “the” at the beginning so the title is really:

"The Journal of Library Metadata, Volume 14, Issue 1, 2014"

I agree with Jeff that there are parts of this that are important to preserve but should not be expressed as part of the title; they describe the resource. Specifically: volume, issue, and dateOfPublication.

Here is my radical suggestion (an idea I have pushed before, though never adopted).

http://www.example.org/xyz/modsResource/123
title “The Journal of Library Metadata”
volume "14"
issue "1"
dateOfPublication "2014"
titleForSort "Journal of Library Metadata, Volume 14, Issue 1, 2014"

So, the actual title contains the “The” but the “titleForSort” doesn’t. On the other hand, the titleForSort includes the non-title information (because it may be useful for sorting) but the actual title does not.
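To make the shape of this proposal concrete, here is a small Python sketch that builds those statements as plain (subject, property, value) tuples. The function name `describe` and its `non_sort` parameter are assumptions for illustration; the property names are the unprefixed ones from the example above, not terms from a published vocabulary.

```python
def describe(subject, title, volume, issue, date_of_publication, non_sort=""):
    """Return (subject, property, value) tuples in the shape of the proposal."""
    # Drop the leading non-sort text from the sort form only.
    if non_sort and title.startswith(non_sort):
        sortable = title[len(non_sort):]
    else:
        sortable = title
    # The sort form also carries the non-title data, since it may help ordering.
    title_for_sort = f"{sortable}, Volume {volume}, Issue {issue}, {date_of_publication}"
    return [
        (subject, "title", title),
        (subject, "volume", volume),
        (subject, "issue", issue),
        (subject, "dateOfPublication", date_of_publication),
        (subject, "titleForSort", title_for_sort),
    ]

statements = describe(
    "http://www.example.org/xyz/modsResource/123",
    "The Journal of Library Metadata", "14", "1", "2014", non_sort="The ",
)
for s, p, o in statements:
    print(f'<{s}> {p} "{o}"')
```

The title stays faithful to the resource, while titleForSort is a derived convenience value that can be regenerated whenever the other properties change.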

melanieWacker commented 9 years ago

Can you explain a little bit further? Would we then define subproperties for volume and issue?

raydAtLC commented 9 years ago

"Can you explain a little bit further? Would we then define subproperties for volume and issue? "

Yes, that's what I was trying to convey in the example:

title “The Journal of Library Metadata” volume "14" issue "1" dateOfPublication "2014"

So you have the properties title, volume, issue, and dateOfPublication.

cmharlow commented 9 years ago

See minutes on title model proposal, acceptance: https://github.com/blunalucero/MODS-RDF/wiki/MODS-RDF-Working-Group-Call-11.14.14

barmintor commented 9 years ago

The timing of our development team collecting our thoughts about this could not be worse, it seems.

melanieWacker commented 9 years ago

Do you want to re-open the issue?

barmintor commented 9 years ago

Reading the minutes, a subtyped property to support sorting where it differs from the complete title seems like an ideal solution for us. Thumbs up!