WGBH / PBCore2.0

Public Broadcasting Metadata Dictionary Project
http://www.pbcore.org

R19: Allow support for Linked Data #22

Closed pdpinch closed 10 years ago

pdpinch commented 13 years ago

The idea is you could have a Contributor named 'Hughes, Langston', or 'Smith, John' where the common name may or may not be ambiguous or unique. But if you add an actual UID to the name, it would be unambiguous and machine-readable. The Linked Data way of doing this is to include a URI for the person. This makes it possible to traverse the universe of objects containing this URI, and it links our PBCore object to that universe. We could optionally add URIs to place names, organizations, subjects, and relationships in the PBCore record. The way to allow this would be to add optional attributes to the PBCore schema, so you could include UIDs inside elements.
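As a rough sketch of what that could look like in practice: the snippet below assumes a hypothetical optional attribute named `ref` on the contributor element, and the example.org URI is made up for illustration. Neither is part of any published PBCore schema; the point is only that a consumer can tell an unambiguous name from an ambiguous one.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the `ref` attribute and the example.org URI are
# assumptions for illustration, not part of a published PBCore schema.
RECORD = """
<pbcoreDescriptionDocument>
  <pbcoreContributor>
    <contributor ref="http://example.org/people/langston-hughes">Hughes, Langston</contributor>
  </pbcoreContributor>
  <pbcoreContributor>
    <contributor>Smith, John</contributor>
  </pbcoreContributor>
</pbcoreDescriptionDocument>
"""


def contributors(xml_text):
    """Return (name, uri_or_None) pairs for every contributor element."""
    root = ET.fromstring(xml_text)
    return [(el.text.strip(), el.get("ref")) for el in root.iter("contributor")]


pairs = contributors(RECORD)
for name, uri in pairs:
    # A missing URI means the name is only as unambiguous as the string itself.
    print(name, "->", uri if uri else "(no URI, possibly ambiguous)")
```

Because the attribute is optional, records without URIs stay valid and simply come back with `None`.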

kvanmalssen commented 13 years ago

This is a very important issue, particularly as PBCore looks to become more semantic-web friendly. I see that a solution has been approached in a number of places, such as affiliatedStringType using linkedID and rightsLinkType using anyURI. I can also see source in sourceVersionStringType being used to express a URI (but since there is no documentation for this attribute yet, I'm not entirely sure what its intended use is; I'm just guessing). But I have a few concerns:

WeAreAVP commented 13 years ago

I'm confused here. I had been interpreting the source attribute as a value that identifies the authority from which the expression in the related element derives (similar to formatIdentifierSource or genreAuthorityUsed), but I do remember Peter mentioning that the source attribute could be used for URIs. If so, then I agree with Kara that the name of the linked identifier should be more consistent and clear (for instance, Chris uses the 'ref' attribute here).

If the source attribute is not meant for URIs then I think there are a lot more potential areas where this is needed beyond creator|contributor|publisher. For instance a linkedID attribute in coverage could point to the GPS URL or a subject like in Chris' example, or a series title to a central page for that series, like <title ref="http://www.will.uiuc.edu/tv/programs/prairiefire/">Prairie Fire</title>

Because the source attribute is related to several open issues, can we get clarity on its intent?

jackbrighton commented 13 years ago

I agree this is a very important issue to define clearly in 2.0. I'm thinking the source elements Dave refers to above (formatIdentifierSource, genreAuthorityUsed, and others) would not be URIs themselves, but human-readable language. But those elements could allow an attribute like 'ref' with a URI as a value, as in Chris Beer's example. And in the case of something like pbcoreSubject, both the subject and the subjectAuthorityUsed could have ref="URI" attributes. In Chris's example the subjectAuthorityUsed is wikipedia.org, which is unambiguous enough, but what if the subjectAuthorityUsed is Jim's Discount Archives?

So if the suggestion is to allow for a 'ref' attribute to many things, I totally agree. And for me, 'ref' works as well as anything.
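To make that concrete, here is a minimal sketch of a pbcoreSubject where both the subject and the subjectAuthorityUsed carry a `ref`. The attribute is still only a proposal, and the example.org URIs are placeholders I made up; the helper name `label_and_ref` is likewise just for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: `ref` is the attribute proposed in this thread and
# the example.org URIs are invented placeholders.
SUBJECT = """
<pbcoreSubject>
  <subject ref="http://example.org/subjects/prairie-fires">Prairie fires</subject>
  <subjectAuthorityUsed ref="http://example.org/authorities/jims">Jim's Discount Archives</subjectAuthorityUsed>
</pbcoreSubject>
"""


def label_and_ref(parent, tag):
    """Return the human-readable text and the optional URI of a child element."""
    el = parent.find(tag)
    return {"label": el.text.strip(), "ref": el.get("ref")}


node = ET.fromstring(SUBJECT)
subject = label_and_ref(node, "subject")
authority = label_and_ref(node, "subjectAuthorityUsed")
print(subject)
print(authority)
```

The element text stays human-readable in both places, and the URI removes the guesswork about which "Jim's Discount Archives" is meant.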

pdpinch commented 13 years ago

Can the group comment on Jack's response: is it useful to have a human-readable @source or xxAuthorityUsed value as well as a URI?

I've been using that distinction to explain the apparent conflict between @source and the xxAuthorityUsed elements (see issue 34).

WeAreAVP commented 13 years ago

I agree that formatIdentifierSource, genreAuthorityUsed and similar elements are better as human-readable values. It would be good to get guidance on some of the most relevant public broadcasting expressions here. Since the PBCore controlled vocabularies don't appear to have official titles, we've seen genreAuthorityUsed include dozens of different ways to say "PBCore Genre List". The same goes for "NOLA Code", to a lesser extent.

For URIs (URLs, URNs, etc) I think calling it 'source' is confusing (since we have formatIDsource, identifierSource, etc). Using the Beer-style 'ref' may make more sense as it doesn't compete with other meanings. Also the linkedID vs. source (if we use source as Peter writes) should be normalized to the same name.

Also note that this still leaves us stuck without relationIdentifierSource. I saw a record recently that said something like: relationType="Is Part Of" relationIdentifier="TL"

With this element we never know where to resolve the relationIdentifier. Even in the pbcore sample records the relationIdentifier is an alphanumeric string and not in context.

If the source attribute has a different semantic meaning than, say, identifierSource, then I think we still need relationIdentifierSource (just like the three other identifiers have).
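A small sketch of the problem, assuming a hypothetical relationIdentifierSource child element (it does not exist in the schema today) and an invented source label. Without it, a consumer has nothing to resolve "TL" against:

```python
import xml.etree.ElementTree as ET

# The second relation uses a hypothetical relationIdentifierSource element;
# "Example Station Series Codes" is an invented label for illustration.
RELATIONS = """
<pbcoreDescriptionDocument>
  <pbcoreRelation>
    <relationType>Is Part Of</relationType>
    <relationIdentifier>TL</relationIdentifier>
  </pbcoreRelation>
  <pbcoreRelation>
    <relationType>Is Part Of</relationType>
    <relationIdentifier>TL</relationIdentifier>
    <relationIdentifierSource>Example Station Series Codes</relationIdentifierSource>
  </pbcoreRelation>
</pbcoreDescriptionDocument>
"""


def describe(relation):
    """Say whether a relationIdentifier comes with enough context to resolve it."""
    ident = relation.findtext("relationIdentifier")
    source = relation.findtext("relationIdentifierSource")  # None if absent
    if source is None:
        return f"{ident} (source unknown, cannot be resolved)"
    return f"{ident} (as defined by {source})"


results = [describe(r) for r in ET.fromstring(RELATIONS).iter("pbcoreRelation")]
for line in results:
    print(line)
```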

Also I think the 'ref' or source attribute should describe the key value (such as identifier or subject) and not only the classifier (such as identifierSource or subjectAuthorityUsed), since the XML semantics become unclear when you're describing the source of the source.

MarcosSueiro commented 13 years ago

Ironically I could not open Chris Beer's link, but I like optionally suggesting a ref= sort of link. But I am not sure why we need to specify whether these kinds of "source" fields will be machine- or human-readable. All links, after all, hopefully lead to a human reading the information. I know LC authorities uses both: sometimes "article in Cosmopolitan", sometimes a link to a web site. Does this create problems? Most machine readers are smart enough to know when an http link is used. Although it would probably be important to be consistent internally (either "LC" or http://authorities.loc.gov, or http://www.amazon.com/Lady-Gaga/e/B001LH2W8E/ref=ntt_mus_dp_pel), of course.

Ditto the relations field should also allow for this (including among records), although in this and indeed all cases I worry about anything that is not a so-called permalink.

jackbrighton commented 13 years ago

What I meant is I want the source element to be both human- and machine-readable. So you could display "article in Cosmopolitan" on a web page and have it link to the article. The same concept applies to a person identified as a Creator or Contributor. You want the person's name to display, but also a URI for that person if possible so there's no ambiguity. Or am I missing your point?
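Something like the following is what I have in mind, as a sketch only. It assumes the proposed `ref` attribute, uses a made-up example.org URI, and the `render` helper is hypothetical: show the text, and turn it into a link when a URI is present.

```python
import xml.etree.ElementTree as ET
from html import escape


def render(el):
    """Render an element as HTML: linked text if it has a `ref`, plain text otherwise."""
    label = escape(el.text or "")
    ref = el.get("ref")  # hypothetical attribute proposed in this thread
    if ref is None:
        return label
    return f'<a href="{escape(ref, quote=True)}">{label}</a>'


# The example.org URI is an invented placeholder.
creator = ET.fromstring('<creator ref="http://example.org/people/jane-doe">Doe, Jane</creator>')
plain = ET.fromstring("<creator>Smith, John</creator>")

linked_html = render(creator)
plain_html = render(plain)
print(linked_html)
print(plain_html)
```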

MarcosSueiro commented 13 years ago

No, I think I missed yours. But I got it now! Thanks.

dmaccarn commented 13 years ago

add ref attribute to sourceVersionGroup