juhahakala closed this issue 2 months ago.
Oddly enough it is quite difficult to find a usage of affiliation in bibliographic metadata. I have learned that BIBFRAME, in spite of being a transformation of MARC, which does have the x00 $u subfield for affiliation, has not defined it. The reasons are not very clear but seem to have to do with the separation between bibliographic description and authorities. However, I was advised that madsrdf has "hasAffiliation".
As with other LC-based RDF vocabularies, MADS has many hierarchical levels that result in a rather awkward cascade of blank nodes (bnodes). I would assume that one would like to say:
AuthorID madsrdf:hasAffiliation "X University"
in fact the domain and range are:
Domain: madsrdf:RWO Range: madsrdf:Affiliation
which appears to mean that one would probably need to assign a blank node to the range, which would then link to properties that can take more concrete elements as objects. The apparent reasoning is that for MADS an affiliation is a suite of properties. An example I was given is:
:a a foaf:Agent ;
madsrdf:hasAffiliation _:affiliation .
_:affiliation a madsrdf:Affiliation ;
madsrdf:affiliationStart "2010" ;
madsrdf:affiliationEnd "2014" ;
madsrdf:natureOfAffiliation "Digital Project Coordinator" ;
madsrdf:organization "LC" .
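To make the difference in shape concrete, here is a minimal Python sketch that emits both patterns as plain (subject, predicate, object) tuples, without any RDF library. The helper names (`mads_affiliation`, `schema_affiliation`, `literal`) are hypothetical, and prefixed names stand in for full IRIs.

```python
from itertools import count
from typing import List, Optional, Tuple

# A triple is (subject, predicate, object); prefixed names stand in for full IRIs.
Triple = Tuple[str, str, str]

# Supplies a fresh number for each generated blank-node label.
_bnode_counter = count(1)


def literal(text: str) -> str:
    """Quote a plain string as a Turtle/N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def mads_affiliation(agent: str, organization: str,
                     start: Optional[str] = None,
                     end: Optional[str] = None,
                     nature: Optional[str] = None) -> List[Triple]:
    """MADS/RDF pattern: hasAffiliation points to a blank node carrying the details."""
    node = f"_:affiliation{next(_bnode_counter)}"
    triples = [
        (agent, "madsrdf:hasAffiliation", node),
        (node, "rdf:type", "madsrdf:Affiliation"),
        # Mirrors the example above, which gives the organization as a literal.
        (node, "madsrdf:organization", literal(organization)),
    ]
    optional = [
        ("madsrdf:affiliationStart", start),
        ("madsrdf:affiliationEnd", end),
        ("madsrdf:natureOfAffiliation", nature),
    ]
    # Only emit the qualifying properties that were actually supplied.
    triples += [(node, prop, literal(value)) for prop, value in optional if value]
    return triples


def schema_affiliation(person: str, organization: str) -> List[Triple]:
    """schema.org pattern: a single triple from the Person to the Organization."""
    return [(person, "schema:affiliation", organization)]


mads = mads_affiliation(":a", "LC", "2010", "2014", "Digital Project Coordinator")
flat = schema_affiliation(":a", ":lc")
print(len(mads), len(flat))  # 6 1
```

The MADS version needs an extra node with five triples hanging off it to say what the schema.org version says in one triple; in return it can carry dates and the nature of the affiliation.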
Schema.org has a simpler property for affiliation. The domain is Person and the range is Organization.
MARC Bibliographic allows simple description of a person's current affiliation in X00 $u. MARC authority format has tag 373 Associated group (i.e. affiliation) which enables a more complete, MADS-like description of multiple affiliations with date ranges.
100 1# $aSmith, John
373 ## $aUniversity of Leeds. Bragg Centre for Materials Research $s2000 $t2005 $0https://isni.org/isni/0000000475926051
373 ## $aUniversity of Manchester $s2005 $0https://isni.org/isni/0000000121662407
It is possible to use subfield $2 to identify the source of the organization name, and $0 to specify a standard identifier.
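The 373 fields above map quite directly onto the MADS-style affiliation pattern. As a rough illustration, this Python sketch reads a 373 field in the `$`-delimited display form used above and pulls out the organization, the date range and the identifier. `parse_373` and `Affiliation` are hypothetical names for this sketch, not part of any MARC library, and the sketch assumes `$` as the subfield marker (real MARC records use the 0x1F delimiter).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from 373 subfield codes to parts of an affiliation.
SUBFIELD_ROLES = {
    "a": "organization",  # the associated group (organization) name
    "s": "start",         # start of the affiliation
    "t": "end",           # end of the affiliation
    "0": "identifier",    # standard identifier, e.g. an ISNI URI
    "2": "source",        # source of the organization name
}


@dataclass
class Affiliation:
    organization: Optional[str] = None
    start: Optional[str] = None
    end: Optional[str] = None
    identifier: Optional[str] = None
    source: Optional[str] = None


def parse_373(field: str) -> Affiliation:
    """Parse a 373 field in display form, e.g. '373 ## $aName $s2000'.

    Keeps only the first occurrence of each subfield.
    """
    values = {}
    # Everything before the first '$' is the tag and indicators, so skip it.
    for chunk in field.split("$")[1:]:
        if not chunk:
            continue
        code, value = chunk[0], chunk[1:].strip()
        role = SUBFIELD_ROLES.get(code)
        if role is not None and role not in values:
            values[role] = value
    return Affiliation(**values)


leeds = parse_373("373 ## $aUniversity of Leeds. Bragg Centre for Materials Research "
                  "$s2000 $t2005 $0https://isni.org/isni/0000000475926051")
print(leeds.organization, leeds.start, leeds.end)
# University of Leeds. Bragg Centre for Materials Research 2000 2005
```

Each 373 field becomes one dated affiliation, which is the kind of career history that the SRAP approach proposes to leave to authority data.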
In SRAP, it is enough to follow the MARC Bibliographic approach and just describe the affiliation as it was at the time the scholarly resource was published or made available. We can leave it to authority data to describe the earlier or subsequent career moves of the author.
The Schema.org approach covers the bare minimum, but it is also important to specify an identifier for the affiliating organization. The source of the name would be useful as well, but not essential.
Unless I'm mistaken, DC terms are all terms that conceptually refer to a single subject, referred to as the resource in the definitions. That subject is usually a document or some other information resource. Affiliation is an attribute of the dct:creator, not of the resource itself. You can define affiliation with a domain of dct:Agent, but I don't know how you can create a dct:affiliation whose RDF subject is the value of dct:creator in the flat model of DC terms.
The 2005 DC document on using DC for bibliographic citations suggests using dc:contributor for the author's affiliation, but what that says, in essence, is that the institution contributed to the creation of the resource in some way. This does not look very precise to me, and it raises the question of resources with multiple creators and multiple affiliations.
We should probably look at early discussions of the Agents Working Group in 2002, where this was said:
Rough consensus emerged that there was a need in some cases to be able to describe agents (eg, affiliation, email address) completely external to resource descriptions, and in other cases to include some agent details in resource descriptions.
That group was looking to create an "agent core" to be able to include attributes of agents, such as affiliation, dates, identifiers, etc.
Obviously you can make the connection between an agent as a resource and other information in metadata using DC terms. My concern is with the model itself: how to define affiliation such that its subject is not the same as the subject of the other terms that are defined in relation to "the resource." So far it looks to me like the DC Agents Working Group did not come to a conclusion. I'll keep reading.
I had agreed to include how we address 'Affiliation' in DataCite:
Summary of how we define ‘affiliation’ in DataCite
• 2014-08-20 v3.1: introduction of new child element "affiliation" to "creator" and "contributor"
We realised that we needed to enable the use of identifiers, e.g. ORCID, ISNI or ROR.
• 2019-07-13 v4.3: Addition of new subproperties for Affiliation: "affiliationIdentifier", "affiliationIdentifierScheme", "schemeURI"
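For comparison with the MARC and RDF patterns in this thread, here is a minimal Python sketch (standard-library ElementTree only) of how a v4.3-style affiliation with an identifier might be serialized. It reuses the John Smith / University of Manchester ISNI from the MARC example; the `datacite_creator` helper is hypothetical, using "ISNI" as the scheme and `https://isni.org/isni/` as the schemeURI is an assumption, and the output is not validated against the DataCite schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def datacite_creator(name: str, affiliation: str,
                     affiliation_id: Optional[str] = None,
                     scheme: Optional[str] = None,
                     scheme_uri: Optional[str] = None) -> ET.Element:
    """Build a <creator> with one <affiliation> child in the DataCite 4.3 style.

    The kernel-4 XML namespace and the other mandatory record elements are
    omitted to keep the sketch short.
    """
    creator = ET.Element("creator")
    ET.SubElement(creator, "creatorName").text = name
    aff = ET.SubElement(creator, "affiliation")
    aff.text = affiliation
    # The identifier attributes added in v4.3 are optional; only set the ones given.
    if affiliation_id:
        aff.set("affiliationIdentifier", affiliation_id)
    if scheme:
        aff.set("affiliationIdentifierScheme", scheme)
    if scheme_uri:
        aff.set("schemeURI", scheme_uri)
    return creator


creator = datacite_creator("Smith, John", "University of Manchester",
                           "https://isni.org/isni/0000000121662407",
                           "ISNI", "https://isni.org/isni/")
print(ET.tostring(creator, encoding="unicode"))
```

Keeping the identifier in attributes means the affiliation name stays readable as element text even when no identifier is known.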
Since we discussed this at the SRAP meeting on 9 Nov, I've been thinking about how to model agents and their affiliations. I can see at least three ways:
The SRAP draft document includes this diagram (emphasis mine) showing how the "affiliation" property is applied on the Agent (e.g. creator/author), not as a property of the Scholarly Resource:
This would correspond to an RDF model something like this example. I've used the 2002 paper Dublin Core: Process and Principles as an example case (borrowing a little bit of FOAF here to express agent names). There are three authors (Shigeo Sugimoto, Thomas Baker and Stuart L. Weibel), each affiliated with a different institution. Stuart has two affiliations.
For brevity I didn't use any rdf:type class declarations except when I've coined new classes. All other entities are of type Document, Person or Organization, and it should be obvious which is which.
@prefix ex: <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct: <http://purl.org/dc/terms/>.
ex:dc_process_paper
dct:creator ex:shigeo, ex:tom, ex:stuart .
ex:shigeo
foaf:name "Shigeo Sugimoto" ;
dct:affiliation ex:ilis .
ex:ilis
foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .
ex:tom
foaf:name "Thomas Baker" ;
dct:affiliation ex:isb .
ex:isb
foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .
ex:stuart
foaf:name "Stuart L. Weibel" ;
dct:affiliation ex:dcmi, ex:oclc .
ex:dcmi
foaf:name "Dublin Core Metadata Initiative" .
ex:oclc
foaf:name "OCLC Office of Research" .
Here is the same information visualized using RDF Sketch:
Now this model is probably too simplistic - it does express the affiliations as they're printed on the paper itself, but it's pretty obvious that the information is no longer valid. As far as I know, none of the three authors are affiliated with those institutions any more, almost 20 years after publishing this paper. If I had used persistent identifiers for them such as ISNIs or ORCIDs, the information would be plain wrong by now. It should be qualified somehow.
This brings us to another possibility: qualifying the affiliations by start and end date. This is similar to how Wikidata and ORCID model affiliations of people. To do this we need to introduce a separate Affiliation class. (NB: I didn't check the actual history of the authors, so I've just made up some dates for this example.)
@prefix ex: <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct: <http://purl.org/dc/terms/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
ex:dc_process_paper
dct:creator ex:shigeo, ex:tom, ex:stuart .
ex:shigeo
foaf:name "Shigeo Sugimoto" ;
dct:affiliation ex:shigeo_ilis .
ex:shigeo_ilis a dct:Affiliation ;
dct:institution ex:ilis ;
dct:endDate "2010"^^xsd:gYear .
ex:ilis
foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .
ex:tom
foaf:name "Thomas Baker" ;
dct:affiliation ex:tom_isb .
ex:tom_isb a dct:Affiliation ;
dct:institution ex:isb ;
dct:startDate "1999"^^xsd:gYear ;
dct:endDate "2005"^^xsd:gYear .
ex:isb
foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .
ex:stuart
foaf:name "Stuart L. Weibel" ;
dct:affiliation ex:stuart_dcmi, ex:stuart_oclc .
ex:stuart_dcmi a dct:Affiliation ;
dct:institution ex:dcmi ;
dct:startDate "1995"^^xsd:gYear ;
dct:endDate "2018"^^xsd:gYear .
ex:dcmi
foaf:name "Dublin Core Metadata Initiative" .
ex:stuart_oclc a dct:Affiliation ;
dct:institution ex:oclc ;
dct:endDate "2011"^^xsd:gYear .
ex:oclc
foaf:name "OCLC Office of Research" .
And the same visualized:
The problem I see with this is that it requires knowledge that is not apparent in the publication: the affiliation history of the persons involved (which will change over time). It may be available in a database such as ISNI, ORCID or LinkedIn, but that's not always the case. It might be helpful, for example in document repositories, if the model included only information that is immediately available.
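The dependency on affiliation history can be made concrete with a small Python sketch: to recover which affiliations applied when a paper appeared, Option B data has to be filtered by publication year, and the result is only as good as the recorded dates. The class and function names are hypothetical, the treatment of missing dates is an assumption, and the dates are the made-up ones from the Option B example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatedAffiliation:
    institution: str
    start: Optional[int] = None  # None: start year not known
    end: Optional[int] = None    # None: ongoing, or end year not known


def affiliations_at(history: List[DatedAffiliation], year: int) -> List[str]:
    """Return the institutions whose date range covers the given year.

    Assumption: a missing start or end year is treated as open-ended, so an
    incomplete history makes the answer more permissive, not more precise.
    """
    return [
        a.institution
        for a in history
        if (a.start is None or a.start <= year) and (a.end is None or year <= a.end)
    ]


# Stuart's made-up dates from the Option B example.
stuart = [DatedAffiliation("ex:dcmi", 1995, 2018), DatedAffiliation("ex:oclc", end=2011)]
print(affiliations_at(stuart, 2002))  # ['ex:dcmi', 'ex:oclc']
print(affiliations_at(stuart, 2015))  # ['ex:dcmi']
```

Nothing in this data records which affiliation was actually printed on the paper; the filter can only say which ones were plausibly current at the time.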
If we only express what is said in the paper, we can use a different kind of entity which brings together the person and the institution, but only in the context of the publication. I will call it Authorship here. It is somewhat similar to the Contribution class in BIBFRAME although that class is typically used to express roles, not affiliations.
@prefix ex: <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct: <http://purl.org/dc/terms/>.
ex:dc_process_paper
dct:creator ex:shigeo_author, ex:tom_author, ex:stuart_author .
ex:shigeo_author a dct:Authorship ;
dct:agent ex:shigeo ;
dct:affiliation ex:ilis .
ex:shigeo
foaf:name "Shigeo Sugimoto" .
ex:ilis
foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .
ex:tom_author a dct:Authorship ;
dct:agent ex:tom ;
dct:affiliation ex:isb .
ex:tom
foaf:name "Thomas Baker" .
ex:isb
foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .
ex:stuart_author a dct:Authorship ;
dct:agent ex:stuart ;
dct:affiliation ex:dcmi, ex:oclc .
ex:stuart
foaf:name "Stuart L. Weibel" .
ex:dcmi
foaf:name "Dublin Core Metadata Initiative" .
ex:oclc
foaf:name "OCLC Office of Research" .
And the same visualized:
I like this one the most, as it's complex enough to express affiliations in a way that doesn't break even when people switch between institutions.
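The property that makes Option C robust can also be shown outside RDF. In this hypothetical Python model, a `Person` carries only a persistent identifier and a name, while each `Authorship` ties that person to the affiliations printed on one publication. The ISNI URI is the one for Shigeo Sugimoto quoted elsewhere in this discussion; `ex:later_paper` and `ex:other_institution` are invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Person:
    pid: str   # persistent identifier, e.g. an ISNI URI
    name: str  # note: no affiliation stored on the person itself


@dataclass
class Authorship:
    """One person's authorship of one publication, with the affiliations printed there."""
    person: Person
    affiliations: List[str] = field(default_factory=list)


@dataclass
class Publication:
    identifier: str
    creators: List[Authorship] = field(default_factory=list)


def affiliations_on(pub: Publication, person: Person) -> List[str]:
    """Affiliations stated for `person` on this publication (empty if not an author)."""
    for authorship in pub.creators:
        if authorship.person == person:
            return authorship.affiliations
    return []


shigeo = Person("https://isni.org/isni/0000000373878458", "Shigeo Sugimoto")
paper = Publication("ex:dc_process_paper", [Authorship(shigeo, ["ex:ilis"])])
# Hypothetical later publication with a different (invented) affiliation.
later = Publication("ex:later_paper", [Authorship(shigeo, ["ex:other_institution"])])

print(affiliations_on(paper, shigeo))  # ['ex:ilis']
print(affiliations_on(later, shigeo))  # ['ex:other_institution']
```

Because the two affiliations hang off two different Authorship objects, a person who changes institutions never produces conflicting statements about the same identifier.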
Thanks, Osma. This does make it all clearer, and I LOVE the diagrams.
I think we have 2 questions:
For 1, as you show, all three of these options are possible as long as the object of dct:creator is a thing, not a string. I don't know what would be best for SRAP, nor whether you could allow any one of these depending on the needs of the cataloging agency. I see the use of dates and multiple institutions as a step beyond the cataloging of individual articles. If you follow the "description" practice from libraries, the metadata encodes what is included on the article itself, for the purpose of identifying that article through its own contents. Other information, like multiple affiliations, would be about the person but not about that article. I believe that if SRAP wishes to provide information about the creator that is not on the article itself, then you have moved into the creation of a name authority file. (And you are moving into the territory that ORCID occupies.) But a name authority file does not tell you which affiliation is recorded on the article being cataloged. (Nor can you divine this from the dates, because an author may have moved to another institution by the time the article is published.)
For DCMI (2), the affiliation property would need a domain (or "domain includes"?) of dct:Agent and a range (includes) of foaf:Organization. Something like:
| Term Name: affiliation | |
|---|---|
| URI | http://purl.org/dc/terms/affiliation |
| Label | Affiliation |
| Definition | The institution or organization of which an Agent is a member or employee. |
| Comment | Recommended practice is ... |
| Type of Term | Property |
| Range Includes | http://xmlns.com/foaf/spec/#term_Organization |
| Has Domain | http://purl.org/dc/terms/Agent |
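The choice between "Has Domain" and "Domain includes" matters for the Option C model. A formal rdfs:domain lets a reasoner infer the type of anything used as the subject of the property, whereas an informal "domain includes" statement has no such entailment. This Python sketch applies just that one RDFS rule to a few Option C triples; `infer_domain_types` is a hypothetical helper, not a reasoner API.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def infer_domain_types(triples: Iterable[Triple], domains: Dict[str, str]) -> Set[Triple]:
    """Apply the RDFS domain rule: (p rdfs:domain C) and (x p y) entail (x rdf:type C).

    `domains` maps a property to its rdfs:domain class. Only newly entailed
    type triples are returned.
    """
    triples = set(triples)
    inferred = {(s, "rdf:type", domains[p]) for s, p, _ in triples if p in domains}
    return inferred - triples


# Option C data: the affiliation hangs off the Authorship node, not the person.
option_c = [
    ("ex:stuart_author", "rdf:type", "dct:Authorship"),
    ("ex:stuart_author", "dct:agent", "ex:stuart"),
    ("ex:stuart_author", "dct:affiliation", "ex:dcmi"),
]

# With a formal rdfs:domain of dct:Agent, the Authorship node becomes an Agent.
print(infer_domain_types(option_c, {"dct:affiliation": "dct:Agent"}))
# {('ex:stuart_author', 'rdf:type', 'dct:Agent')}

# "Domain includes" is documentation only, so nothing is entailed.
print(infer_domain_types(option_c, {}))  # set()
```

Whether classifying an Authorship node as a dct:Agent is desirable is a modeling decision; the point is that a formal domain makes that decision implicitly.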
What I do note about the DCMI terms and classes is that there is no term for "name" and no class for "organizations". I have seen dct:title used where your examples use "name", because the definition of dct:title is "A name given to the resource." In our case here, the node following the dct:creator property could be seen as a resource. However, I find that unintuitive. It seems that our choices are to use dct:title, use foaf:name (or some other name property, like schema.org's), or add "name" to DC terms with a domain of dct:Agent. The same is true for "organization". Is this something that DC terms needs? Or is it sufficient to direct people to other vocabularies?
I'm copying the March 7 message from @juhahakala that was posted to the DC-SRAP list but not here (so that I can reply below)
Hello,
based on the discussion on the list the main – or perhaps even the sole – topic of our next meeting will be Affiliation.
Regarding the options A-C below, we have to decide which one is the best. In my opinion, that is option A, documenting authors’ affiliations at the time when the described resource is published. IMO it should be possible to describe organizations either by name only (as written in the publication or known by the metadata provider), or by both the name and the identifier, eg. ISNI, ROR or RAiD.
It is far too laborious to maintain affiliations in bibliographic data. Not only do people change jobs; organizations change names, merge and disappear. As Karen noted, the best place for authors’ affiliation information is the authority record. Then, if a SRAP record contains authors’ ORCIDs and/or ISNIs as URIs, users can check an author’s affiliations from ORCID/ISNI databases where this information and other metadata about the author is centrally maintained.
The issue of retrospective updates to bibliographic data is not new. Some publishers are opposed to the semantics embedded in ISBNs. They argue that if the publisher of the book changes, the original ISBN provides erroneous information. From the library point of view this is not the case: the original publisher will never change, so the information embedded in the ISBN is still correct. My view on affiliation information is similar: it is OK (and important) to capture authors’ affiliations as of the time of publication, but we should not go any further. As an author, I do not want the information science textbook I wrote 35 years ago, while still a student at Tampere University, to be in any way linked in bibliographic data to the National Library of Finland via my current affiliation with the latter institution.
As regards Osma’s example, both Stu and Shigeo have retired. IMO this is not relevant for present-day readers; what they do need to know is the affiliations the authors had when Dublin Core: Process and Principles was published about 20 years ago.
Osma, I do not quite understand the problem you have with the use of ORCID and ISNI. Stu does not have an ISNI record, but Shigeo does. However, his record (https://isni.org/isni/0000000373878458) does not have affiliation information. But it would be a lot easier to add such information to Shigeo’s ISNI database record than to every bibliographic record describing his publications.
The problem with A is that it doesn't play well with persistent identifiers. In the case of the example paper, it states affiliations for Tom, Stuart and Shigeo as if they were universally true, even though they were only true at or around the time the paper was published. This works OK-ish as long as these resources are identified using ephemeral identifiers that are never reused elsewhere (I used e.g. http://example.org/tom above), but if we instead used an ISNI or ORCID to identify the author, the information would become invalid. Option C corrects this problem by adding a level of indirection in the form of the Authorship class; the Authorship resource is specific to the paper.
In MARC formats, the problem you describe is avoided because authors' affiliation information is provided in MARC authority records (tag 373 Associated group, https://www.loc.gov/marc/authority/ad373.html) instead of bibliographic records. MARC bibliographic format does not have this tag. Since Dublin Core does not have authority format, we can either ignore the problem created by the ephemeral nature of affiliation information, or try to solve it one way or another. As far as I can tell, your solution works technically, and if it does make cataloguing and DC metadata utilization more difficult, then only marginally.
Here's a new proposal for today's meeting. This is based on Option C, but with some adjustments in terminology.
• AffiliatedAgent, which represents an agent (typically a person, but could also be e.g. a working group) affiliated to an institution in the context of the described publication
• agent, which connects an AffiliatedAgent to the agent (typically a person)
• institution, which connects an AffiliatedAgent to the institution
Example usage (same as Option C above, just replaced Authorship with AffiliatedAgent):
@prefix ex: <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct: <http://purl.org/dc/terms/>.
ex:dc_process_paper
dct:creator ex:shigeo_author, ex:tom_author, ex:stuart_author .
ex:shigeo_author a dct:AffiliatedAgent ;
dct:agent ex:shigeo ;
dct:affiliation ex:ilis .
ex:shigeo
foaf:name "Shigeo Sugimoto" .
ex:ilis
foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .
ex:tom_author a dct:AffiliatedAgent ;
dct:agent ex:tom ;
dct:affiliation ex:isb .
ex:tom
foaf:name "Thomas Baker" .
ex:isb
foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .
ex:stuart_author a dct:AffiliatedAgent ;
dct:agent ex:stuart ;
dct:affiliation ex:dcmi, ex:oclc .
ex:stuart
foaf:name "Stuart L. Weibel" .
ex:dcmi
foaf:name "Dublin Core Metadata Initiative" .
ex:oclc
foaf:name "OCLC Office of Research" .
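Consuming this structure takes one extra hop compared with Option A. As a rough sketch, assuming the triples have already been parsed into plain Python tuples (no RDF library), the following reads off, per person, the affiliations stated on one paper. The function names are hypothetical, and the fallback for creators without dct:agent is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# A subset of the AffiliatedAgent example above, as plain tuples.
TRIPLES: List[Triple] = [
    ("ex:dc_process_paper", "dct:creator", "ex:shigeo_author"),
    ("ex:dc_process_paper", "dct:creator", "ex:stuart_author"),
    ("ex:shigeo_author", "rdf:type", "dct:AffiliatedAgent"),
    ("ex:shigeo_author", "dct:agent", "ex:shigeo"),
    ("ex:shigeo_author", "dct:affiliation", "ex:ilis"),
    ("ex:stuart_author", "rdf:type", "dct:AffiliatedAgent"),
    ("ex:stuart_author", "dct:agent", "ex:stuart"),
    ("ex:stuart_author", "dct:affiliation", "ex:dcmi"),
    ("ex:stuart_author", "dct:affiliation", "ex:oclc"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def stated_affiliations(triples: List[Triple], paper: str) -> Dict[str, List[str]]:
    """Map each creator of `paper` to the affiliations stated on that paper.

    A creator without dct:agent is treated as the agent itself (the plain
    Option A shape), so both shapes can be read the same way.
    """
    result: Dict[str, List[str]] = {}
    for node in objects(triples, paper, "dct:creator"):
        agents = objects(triples, node, "dct:agent") or [node]
        for agent in agents:
            result[agent] = objects(triples, node, "dct:affiliation")
    return result


print(stated_affiliations(TRIPLES, "ex:dc_process_paper"))
# {'ex:shigeo': ['ex:ilis'], 'ex:stuart': ['ex:dcmi', 'ex:oclc']}
```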
I've created an example of affiliation information in XML, based on the same 2002 paper example. For discussion in the next SRAP meeting.
The example relies in part on the idea of using XML attributes to express PIDs in DC, as discussed in the PIDs in DC session in Porto 2018. However, since the element name id is problematic, I've used pid instead.
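As a rough illustration of the pid-attribute idea (not a reconstruction of the example itself), here is a Python sketch using the standard-library ElementTree that emits creators and affiliations with any known PID carried in a `pid` attribute next to the human-readable name. The element names `record`, `creator`, `name` and `affiliation` are hypothetical, no namespaces are used, and Shigeo Sugimoto's ISNI is the one quoted earlier in the discussion.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence, Tuple


def creator_element(name: str, person_pid: Optional[str] = None,
                    affiliations: Sequence[Tuple[str, Optional[str]]] = ()) -> ET.Element:
    """Hypothetical layout: a creator with a name and affiliations, PIDs as 'pid' attributes."""
    creator = ET.Element("creator")
    if person_pid:
        creator.set("pid", person_pid)
    ET.SubElement(creator, "name").text = name
    for org_name, org_pid in affiliations:
        aff = ET.SubElement(creator, "affiliation")
        aff.text = org_name
        # The attribute is simply left out when no identifier is known.
        if org_pid:
            aff.set("pid", org_pid)
    return creator


record = ET.Element("record")
record.append(creator_element(
    "Shigeo Sugimoto", "https://isni.org/isni/0000000373878458",
    [("Institute of Library and Information Science, University of Tsukuba, Japan", None)]))
record.append(creator_element(
    "Stuart L. Weibel", None,
    [("Dublin Core Metadata Initiative", None), ("OCLC Office of Research", None)]))
print(ET.tostring(record, encoding="unicode"))
```

Keeping the PID in an attribute lets a name-only description and a name-plus-identifier description share the same element structure.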
Hi Osma, thank you for the XML example (posted 14 Apr 2022), which seems to work well.
I'm afraid I am on leave for the meeting, so apologies.
Jan
Next step is to document the new class and properties in a Markdown snippet similar to date_subproperties.md. I can do that.
Alasdair will call a meeting of the PIDs in DC group, with the goal of finishing and publishing a specification of how to express PIDs.
Still no Markdown snippet, but I'm working on it... Meanwhile, here's an updated SRAP domain model diagram with the new AffiliatedAgent class:
We now have a means to provide affiliations in the Person shape, so closing this issue.
Proposed DCMI Metadata Terms: http://purl.org/dc/terms/affiliation
Label: Affiliation
Definition: An organization to which an agent is or was affiliated.
Domain includes: dcterms:Agent
Recommended practice is to identify the affiliation with a URI. If this is not possible or feasible, a literal value that identifies the affiliated organization may be provided. It is also possible to give both the name and the URI.
If a name is given, it should be provided in full and in hierarchical order, starting from the largest organizational unit.
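The recommended practice boils down to a small decision rule. A minimal Python sketch, assuming a dict-based value with hypothetical keys `uri` and `name`, and assuming the ". " separator from the MARC 373 example earlier in this discussion for joining the organizational hierarchy, might look like this:

```python
from typing import Dict, Optional, Sequence


def affiliation_value(uri: Optional[str] = None,
                      name_parts: Sequence[str] = ()) -> Dict[str, str]:
    """Build an affiliation value: a URI, a name, or both.

    `name_parts` must already be ordered from the largest organizational unit
    to the smallest; this function does not reorder them.
    """
    if not uri and not name_parts:
        raise ValueError("an affiliation needs a URI, a name, or both")
    value: Dict[str, str] = {}
    if uri:
        value["uri"] = uri  # preferred: identify the organization with a URI
    if name_parts:
        # ". " as the separator is an assumption borrowed from MARC 373 style.
        value["name"] = ". ".join(name_parts)
    return value


print(affiliation_value(name_parts=["University of Leeds", "Bragg Centre for Materials Research"]))
# {'name': 'University of Leeds. Bragg Centre for Materials Research'}
```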
-- Discussion and comments --
If a name is not unique, it is essential to provide both the name and an identifier of the affiliation. The identifier should be given as a URI if possible.
Choosing the correct name is a non-trivial problem (Helsinki university or Helsingin yliopisto, Trinity College Dublin or Dublin University. Trinity College), but it is outside the SRAP scope, apart from the generic guidance provided above.
Providing both the name and the identifier of an organization (or person) in a single XML string is a separate issue that is already under discussion in DCMI. The current SRAP draft provides a possible solution, but the syntax may change pending the forthcoming DCMI UB decision.