dcmi / dc-srap

Scholarly Resources Application Profile working group

Affiliation #3

Closed juhahakala closed 2 months ago

juhahakala commented 3 years ago

Proposed DCMI Metadata Term: http://purl.org/dc/terms/affiliation

Label: Affiliation

An organization with which an agent is or was affiliated.

Domain includes: dcterms:Agent

Recommended practice is to identify the affiliation with a URI. If this is not possible or feasible, a literal value that identifies the affiliated organization may be provided. It is also possible to give both the name and the URI.

If a name is given, it should be provided in full and in hierarchical order, starting from the largest organizational unit.
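The recommended practice can be illustrated with a minimal Turtle sketch in the same style as the RDF examples later in this thread. It assumes the proposed dct:affiliation property. The ex: URIs are hypothetical placeholders for a real organization identifier such as an ISNI or ROR URI.

```turtle
@prefix ex:   <http://example.org/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Preferred: the affiliation is identified by a URI
# (ex:helsinki_theology is a hypothetical placeholder).
ex:author1 dct:affiliation ex:helsinki_theology .

# Fallback: a literal name, given in full and in hierarchical
# order, starting from the largest organizational unit.
ex:author2 dct:affiliation "University of Helsinki. Faculty of Theology" .

# Both: the URI is used, and its description carries the name.
ex:author3 dct:affiliation ex:helsinki_theology .
ex:helsinki_theology foaf:name "University of Helsinki. Faculty of Theology" .
```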

-- Discussion and comments --

If a name is not unique, it is essential to provide both the name and an identifier of the affiliation. The identifier should be presented as a URI if possible.

Choosing the correct name is a non-trivial problem (Helsinki University or Helsingin yliopisto; Trinity College Dublin or Dublin University. Trinity College), but it is outside the SRAP scope, apart from the generic guidance provided above.

Providing both the name and the identifier of an organization (or person) in a single XML string is a separate issue that is already under discussion in DCMI. The current SRAP draft provides a possible solution, but the syntax may change pending the forthcoming decision of the DCMI Usage Board (UB).

Example: University of Helsinki. Faculty of Theology

ISNI and ORCID shall always be provided as URIs, not as literals, since the literal form (e.g. 0000 0004 0624 6810) does not indicate whether the string is an ISNI or an ORCID, or even some other identifier that just happens to have the same syntax.
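To illustrate the difference between the literal and the URI form, here is a small Turtle sketch using Shigeo Sugimoto's ISNI, which is cited later in this thread. The ex: namespace and the use of dct:identifier for the bare literal are assumptions for illustration only.

```turtle
@prefix ex:   <http://example.org/> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Ambiguous: nothing in this string says which scheme it belongs to.
ex:shigeo dct:identifier "0000 0003 7387 8458" .

# Unambiguous: the ISNI URI names the scheme and identifies
# the agent, so it can be used directly as the node.
ex:dc_process_paper dct:creator <https://isni.org/isni/0000000373878458> .
<https://isni.org/isni/0000000373878458> foaf:name "Shigeo Sugimoto" .
```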
kcoyle commented 3 years ago

Oddly enough, it is quite difficult to find usages of affiliation in bibliographic metadata. I have learned that BIBFRAME, in spite of being a transformation of MARC (which does have the X00 $u subfield for affiliation), has not defined it. The reasons are not very clear but seem to have to do with the separation between bibliographic description and authorities. However, I was advised that madsrdf has "hasAffiliation".

As with other LC-based RDF vocabularies, MADS has many hierarchical levels that end up resulting in a rather awkward cascade of bnodes. Whereas I would assume that one would like to say:

AuthorID madsrdf:hasAffiliation "X University"

in fact the domain and range are:

Domain: madsrdf:RWO
Range: madsrdf:Affiliation

which appears to mean that one probably would need to assign a blank node to the range which would then link to properties that can take more concrete elements as objects. The seeming reasoning is that for MADS the affiliation is a suite of properties. An example I was given is:

# The default prefix is an example namespace.
@prefix :        <http://example.org/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .

:a a foaf:Agent ;
  madsrdf:hasAffiliation _:affiliation .

_:affiliation a madsrdf:Affiliation ;
  madsrdf:affiliationStart "2010" ;
  madsrdf:affiliationEnd "2014" ;
  madsrdf:natureOfAffiliation "Digital Project Coordinator" ;
  madsrdf:organization "LC" .

Schema.org has a simpler property for affiliation. The domain is Person and the range is Organization.
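For comparison with the MADS pattern above, a minimal schema.org version could look like the following sketch. The ex: resources and the name values are hypothetical.

```turtle
@prefix ex:     <http://example.org/> .
@prefix schema: <https://schema.org/> .

# schema:affiliation links a Person directly to an Organization,
# with no intermediate node.
ex:author a schema:Person ;
  schema:name "X Author" ;
  schema:affiliation ex:x_university .

ex:x_university a schema:Organization ;
  schema:name "X University" .
```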

juhahakala commented 3 years ago

MARC Bibliographic allows a simple description of a person's current affiliation in X00 $u. The MARC authority format has tag 373 Associated group (i.e. affiliation), which enables a more complete, MADS-like description of multiple affiliations with date ranges.

100 | 1#$aSmith, John
373 | ##$aUniversity of Leeds. Bragg Centre for Materials Research $s2000 $t2005 $0https://isni.org/isni/0000000475926051
373 | ##$aUniversity of Manchester $s2005 $0https://isni.org/isni/0000000121662407

It is possible to use subfield $2 to identify the source of the organization name, and $0 to specify a standard identifier.

In SRAP, it is enough to follow the MARC Bibliographic approach and just describe the affiliation as of the time the scholarly resource was published or made available. We can leave it to authority data to describe the earlier or subsequent career moves of the author.

The schema.org approach covers the bare minimum, but it is important to also specify an identifier of the affiliated organization. The source of the name would be useful as well, but not essential.
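One way to add the organization identifier to the schema.org pattern is to use the identifier URI as the organization node. This sketch reuses the John Smith and University of Manchester data from the MARC example above; using the ISNI URI as the node is an assumption, not something prescribed in this thread, and the source of the name is left out.

```turtle
@prefix ex:     <http://example.org/> .
@prefix schema: <https://schema.org/> .

ex:smith a schema:Person ;
  schema:name "Smith, John" ;
  schema:affiliation <https://isni.org/isni/0000000121662407> .

# The ISNI URI identifies the affiliated organization.
<https://isni.org/isni/0000000121662407> a schema:Organization ;
  schema:name "University of Manchester" .
```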

kcoyle commented 3 years ago

Unless I'm mistaken, DC terms are all terms that conceptually refer to a single subject, referred to as the resource in the definitions. That subject is usually a document or some other information resource. Affiliation is an attribute of the dct:creator, not of the resource itself. You can define affiliation with a domain of dct:Agent but I don't know how you can create a dct:affiliation such that it has dct:creator as its RDF subject in the flat model of DC terms.
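A small Turtle sketch of this modelling issue: in a flat DC description every statement has the resource as its subject, whereas an affiliation needs the creator as its subject, which only works if the creator is a node rather than a string. The resources are hypothetical, and dct:affiliation is the proposed term, not an existing one.

```turtle
@prefix ex:  <http://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .

# Flat description: the creator is a string, so there is
# nothing to attach an affiliation to.
ex:paper1 dct:creator "Smith, John" .

# Creator as a node: it can now be the subject of its own
# statements, including an affiliation.
ex:paper2 dct:creator ex:smith .
ex:smith dct:affiliation ex:x_university .
```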

The 2005 DC document on using DC for bibliographic citations suggests using dc:contributor for the author's affiliation, but what that says, in essence, is that the institution contributed to the creation of the resource in some way. This does not look very precise to me, and it raises the question of resources with multiple creators and multiple affiliations.
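The multiple-creator problem shows up in a minimal sketch of the contributor-based approach. The names below are hypothetical.

```turtle
@prefix ex: <http://example.org/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# Two creators and two affiliations, all attached to the
# resource: nothing records which author belongs where.
ex:paper
  dc:creator "Author A", "Author B" ;
  dc:contributor "University X", "University Y" .
```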

We should probably look at early discussions of the Agents Working Group in 2002, where this was said:

Rough consensus emerged that there was a need in some cases to be able to describe agents (eg, affiliation, email address) completely external to resource descriptions, and in other cases to include some agent details in resource descriptions.

That group was looking to create an "agent core" to be able to include attributes of agents, such as affiliation, dates, identifiers, etc.

Obviously you can make the connection between an agent as a resource and other information in metadata using DC terms. My concern is with the model itself - how to define affiliation such that the subject is not the same as the subject of the other terms that are defined in relation to "the resource." So far it looks to me like the DC Agents Working Group did not come to a conclusion. I'll keep reading.

Jashton123 commented 3 years ago

I had agreed to include how we address 'Affiliation' in DataCite:

Summary of how we define ‘affiliation’ in DataCite

• 2014-08-20 v3.1: introduction of new child element "affiliation" to "creator" and "contributor"

We realised that we needed to enable the use of identifiers, e.g. ORCID, ISNI or ROR.

• 2019-07-13 v4.3: Addition of new subproperties for Affiliation: "affiliationIdentifier", "affiliationIdentifierScheme", "schemeURI"

Uniquely identifies an affiliation, according to various identifier schemes.

Example of how we map affiliation on contributor to dcterms:

• 2.5 Affiliation: dcterms:contributor
• 2.5.a affiliationIdentifier: dcterms:identifier
• 2.5.b affiliationIdentifierScheme: not present in Dublin Core
• 2.5.c schemeURI: not present in Dublin Core

This is where we have the problem that, with multiple contributors, the affiliations are not nested under their respective contributors.

Note: We are currently proposing adding identifiers to the Publisher element to make it more efficient for DataCite users to record a Publisher as an affiliation. This may be a special case for DataCite, because we allow the Publisher element to be used for the hosting institution, and there may be a use case where the researcher is affiliated with it. It is likely that we will approve adding the identifier elements to Publisher. However, there will be more discussion and recommendations within the DataCite Metadata Working Group on applying affiliations to Publishers.
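A sketch of what the flat mapping could produce for one contributor: because affiliationIdentifier is mapped to dcterms:identifier on the resource itself, it can no longer be told apart from an identifier of the dataset. The resource and contributor name are hypothetical; the ISNI is the University of Manchester one from the MARC example above.

```turtle
@prefix ex:  <http://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .

ex:dataset
  dct:contributor "Contributor A", "University of Manchester" ;
  # Mapped from affiliationIdentifier; on the resource it reads
  # as an identifier of the dataset, not of the affiliation.
  # affiliationIdentifierScheme has no Dublin Core counterpart.
  dct:identifier "https://isni.org/isni/0000000121662407" .
```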
osma commented 2 years ago

Since we discussed this at the SRAP meeting on 9 Nov, I've been thinking about how to model agents and their affiliations. I can see at least three ways:

Option A

The SRAP draft document includes this diagram (emphasis mine) showing how the "affiliation" property is applied on the Agent (e.g. creator/author), not as a property of the Scholarly Resource:

[Image: diagram from the SRAP draft document]

This would correspond to an RDF model like the following example. I've used the 2002 paper Dublin Core: Process and Principles as an example case (borrowing a little bit of FOAF here to express agent names). There are three authors (Shigeo Sugimoto, Thomas Baker and Stuart L. Weibel), each affiliated with a different institution. Stuart has two affiliations.

For brevity I didn't use any rdf:type class declarations except when I've coined new classes. All other entities are of type Document, Person or Organization and it should be obvious which is which.

@prefix ex:   <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct:  <http://purl.org/dc/terms/>.

ex:dc_process_paper
  dct:creator ex:shigeo, ex:tom, ex:stuart .

ex:shigeo
  foaf:name "Shigeo Sugimoto" ;
  dct:affiliation ex:ilis .

ex:ilis
  foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .

ex:tom
  foaf:name "Thomas Baker" ;
  dct:affiliation ex:isb .

ex:isb
  foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .

ex:stuart
  foaf:name "Stuart L. Weibel" ;
  dct:affiliation ex:dcmi, ex:oclc .

ex:dcmi
  foaf:name "Dublin Core Metadata Initiative" .

ex:oclc
  foaf:name "OCLC Office of Research" .

Here is the same information visualized using RDF Sketch:

[Image: RDF Sketch visualization of the Option A example]

Now this model is probably too simplistic - it does express the affiliations as they're printed on the paper itself, but it's pretty obvious that the information is no longer valid. As far as I know, none of the three authors are affiliated with those institutions any more, almost 20 years after publishing this paper. If I had used persistent identifiers for them such as ISNIs or ORCIDs, the information would be plain wrong by now. It should be qualified somehow.

Option B

This brings us to another possibility: qualifying the affiliations by start and end date. This is similar to how Wikidata and ORCID model affiliations of people. To do this we need to introduce a separate Affiliation class. (NB: I didn't check the actual history of the authors, so I've just made up some dates for this example.)

@prefix ex:   <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct:  <http://purl.org/dc/terms/>.
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#>.

ex:dc_process_paper
  dct:creator ex:shigeo, ex:tom, ex:stuart .

ex:shigeo
  foaf:name "Shigeo Sugimoto" ;
  dct:affiliation ex:shigeo_ilis .

ex:shigeo_ilis a dct:Affiliation ;
  dct:institution ex:ilis ;
  dct:endDate "2010"^^xsd:gYear .

ex:ilis
  foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .

ex:tom
  foaf:name "Thomas Baker" ;
  dct:affiliation ex:tom_isb .

ex:tom_isb a dct:Affiliation ;
  dct:institution ex:isb ;
  dct:startDate "1999"^^xsd:gYear ;
  dct:endDate "2005"^^xsd:gYear .

ex:isb
  foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .

ex:stuart
  foaf:name "Stuart L. Weibel" ;
  dct:affiliation ex:stuart_dcmi, ex:stuart_oclc .

ex:stuart_dcmi a dct:Affiliation ;
  dct:institution ex:dcmi ;
  dct:startDate "1995"^^xsd:gYear ;
  dct:endDate "2018"^^xsd:gYear .

ex:dcmi
  foaf:name "Dublin Core Metadata Initiative" .

ex:stuart_oclc a dct:Affiliation ;
  dct:institution ex:oclc ;
  dct:endDate "2011"^^xsd:gYear .

ex:oclc
  foaf:name "OCLC Office of Research" .

And the same visualized:

[Image: visualization of the Option B example]

The problem I see with this is that it requires knowledge that is not apparent in the publication: the affiliation history of the persons involved (which will change over time). It may be available in a database such as ISNI, ORCID or LinkedIn, but that's not always the case. It might be helpful, for example in document repositories, if the model only included information that is immediately available.

Option C

If we only express what is said in the paper, we can use a different kind of entity that brings together the person and the institution, but only in the context of the publication. I will call it Authorship here. It is somewhat similar to the Contribution class in BIBFRAME, although that class is typically used to express roles, not affiliations.

@prefix ex:   <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct:  <http://purl.org/dc/terms/>.

ex:dc_process_paper
  dct:creator ex:shigeo_author, ex:tom_author, ex:stuart_author .

ex:shigeo_author a dct:Authorship ;
  dct:agent ex:shigeo ;
  dct:affiliation ex:ilis .

ex:shigeo
  foaf:name "Shigeo Sugimoto" .

ex:ilis
  foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .

ex:tom_author a dct:Authorship ;
  dct:agent ex:tom ;
  dct:affiliation ex:isb .

ex:tom
  foaf:name "Thomas Baker" .

ex:isb
  foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .

ex:stuart_author a dct:Authorship ;
  dct:agent ex:stuart ;
  dct:affiliation ex:dcmi, ex:oclc .

ex:stuart
  foaf:name "Stuart L. Weibel" .

ex:dcmi
  foaf:name "Dublin Core Metadata Initiative" .

ex:oclc
  foaf:name "OCLC Office of Research" .

And the same visualized:

[Image: visualization of the Option C example]

I like this one the most, as it's complex enough to express affiliations in a way that doesn't break even when people switch between institutions.

kcoyle commented 2 years ago

Thanks, Osma. This does make it all clearer, and I LOVE the diagrams.

I think we have 2 questions:

  1. What does SRAP want to say about authors and affiliations?
  2. How could DCMI define an affiliation property?

For 1, as you show, all three of these options are possible as long as the object of dct:creator is a thing, not a string. I don't know what would be best for SRAP, nor whether you could allow any one of these depending on the needs of the cataloging agency. I see the use of dates and multiple institutions as a step beyond the cataloging of individual articles. If you use the "description" practice from libraries, the metadata encodes what is included on the article itself, and that is for the purpose of identifying that article through its own contents. Other information, like multiple affiliations, would be about the person but not about that article. I believe that if SRAP wishes to provide information about the creator that is not on the article itself, then you have moved into the creation of a name authority file. (And you are moving into the territory that ORCID occupies.) But a name authority file does not tell you which affiliation is recorded on the article being cataloged. (Nor can you divine this from the dates, because an author may have moved to another institution by the time the article is published.)

For DCMI (2), the affiliation property would need a domain (includes?) of dct:Agent and a range (includes) of foaf:Organization. Something like:

Term Name: affiliation
URI: http://purl.org/dc/terms/affiliation
Label: Affiliation
Definition: The institution or organization of which an Agent is a member or employee.
Comment: Recommended practice is ...
Type of Term: Property
Range Includes: http://xmlns.com/foaf/spec/#term_Organization
Has Domain: http://purl.org/dc/terms/Agent
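The definition above could be expressed in RDF roughly as follows. This is a sketch only: dct:affiliation is a proposed term that does not exist in DCMI Metadata Terms, schema:rangeIncludes is assumed here to carry the "Range Includes" semantics, and the FOAF class is given by its namespace IRI rather than the specification page URL.

```turtle
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix schema: <https://schema.org/> .

# Proposed term; not part of DCMI Metadata Terms.
dct:affiliation a rdf:Property ;
  rdfs:label "Affiliation"@en ;
  rdfs:comment "The institution or organization of which an Agent is a member or employee."@en ;
  rdfs:domain dct:Agent ;
  schema:rangeIncludes foaf:Organization .
```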

What I do note about the DCMI terms and classes is the lack of a term for "name" and of a class for "organization". I have seen dct:title used where your examples use "name", because the definition of dct:title is "A name given to the resource." In our case here, the node following the dct:creator property could be seen as a resource. However, I find that unintuitive. It seems that our choices are to use dct:title, use foaf:name (or some other name property, like schema.org's), or add "name" to DC terms with a domain of dct:Agent. The same is true for "organization". Is this something that DC terms needs? Or is it sufficient to direct people to other vocabularies?
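The naming options listed above, applied to the same organization node, would look like this in Turtle. The ex: resource is hypothetical, and normally only one of these statements would be used.

```turtle
@prefix ex:     <http://example.org/> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix schema: <https://schema.org/> .

# Option 1: reuse dct:title ("A name given to the resource").
ex:oclc dct:title "OCLC Office of Research" .

# Option 2: borrow a name property from another vocabulary.
ex:oclc foaf:name "OCLC Office of Research" .
ex:oclc schema:name "OCLC Office of Research" .

# Option 3 would need a new DC term such as a hypothetical
# dct:name, which does not exist today.
```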

osma commented 2 years ago

I'm copying the March 7 message from @juhahakala that was posted to the DC-SRAP list but not here (so that I can reply below)

Hello,

Based on the discussion on the list, the main – or perhaps even the sole – topic of our next meeting will be Affiliation.

Regarding the options A-C below, we have to decide which one is the best. In my opinion, that is option A, documenting authors’ affiliations at the time when the described resource is published. IMO it should be possible to describe organizations either by name only (as written in the publication or known by the metadata provider), or by both the name and the identifier, e.g. ISNI, ROR or RAiD.

It is far too laborious to maintain affiliations in bibliographic data. Not only do people change jobs; organizations change names, merge and disappear. As Karen noted, the best place for authors’ affiliation information is the authority record. Then, if a SRAP record contains authors’ ORCIDs and/or ISNIs as URIs, users can check an author’s affiliations from ORCID/ISNI databases where this information and other metadata about the author is centrally maintained.

The issue of retrospective updates to bibliographic data is not new. Some publishers are opposed to the semantics embedded in ISBNs. They argue that if the publisher of the book changes, the original ISBN provides erroneous information. From the library point of view this is not the case: the original publisher will never change, so the information embedded in the ISBN is still correct. My view on affiliation information is similar; it is OK (and important) to capture authors’ affiliations as of the time of publication, but we should not go any further. As an author, I do not want the information science textbook I wrote 35 years ago, while still a student at Tampere University, to be in any way linked in bibliographic data to the National Library of Finland via my current affiliation with the latter institution.

As regards Osma’s example, both Stu and Shigeo have retired. IMO this is not relevant for the present day readers; what they do need to know is the affiliations they had when Dublin Core: Process and Principles was published about 20 years ago.

Osma, I do not quite understand the problem you have with the use of ORCID and ISNI. Stu does not have an ISNI record, but Shigeo does. However, his record (https://isni.org/isni/0000000373878458) does not have affiliation information. But it would be a lot easier to add such information to Shigeo’s ISNI database record than to every bibliographic record describing his publications.

osma commented 2 years ago

Regarding the options A-C below, we have to decide which one is the best. In my opinion, that is option A, documenting authors’ affiliations at the time when the described resource is published. IMO it should be possible to describe organizations either by name only (as written in the publication or known by the metadata provider), or by both the name and the identifier, e.g. ISNI, ROR or RAiD.

The problem with A is that it doesn't play well with persistent identifiers. In the case of the example paper, it states affiliations for Tom, Stuart and Shigeo as if they were universally true, even though they were only true at or around the time the paper was published. This works OK-ish as long as these resources are identified using ephemeral identifiers that are never reused elsewhere (I used e.g. http://example.org/tom above) but if instead we would use an ISNI or ORCID to identify the author, the information becomes invalid. Option C corrects this problem, by adding a level of indirection in the form of the Authorship class; the Authorship resource is specific to the paper.
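The problem can be shown with a short sketch: if two bibliographic records use the same persistent identifier for an author and state affiliations in the Option A style, merging them leaves the author with two unqualified affiliations and no link to the paper each one came from. ex:author_pid is a hypothetical stand-in for an ORCID or ISNI URI, and the papers and organizations are made up.

```turtle
@prefix ex:  <http://example.org/> .
@prefix dct: <http://purl.org/dc/terms/> .

# From a record describing a 2002 paper
ex:paper_2002 dct:creator ex:author_pid .
ex:author_pid dct:affiliation ex:org_a .

# From a record describing a later paper
ex:paper_later dct:creator ex:author_pid .
ex:author_pid dct:affiliation ex:org_b .

# Merged graph: ex:author_pid now has both affiliations, and
# nothing says which one appeared on which paper.
```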

juhahakala commented 2 years ago

In MARC formats, the problem you describe is avoided because authors' affiliation information is provided in MARC authority records (tag 373 Associated group, https://www.loc.gov/marc/authority/ad373.html) instead of bibliographic records. The MARC bibliographic format does not have this tag. Since Dublin Core does not have an authority format, we can either ignore the problem created by the ephemeral nature of affiliation information, or try to solve it one way or another. As far as I can tell, your solution works technically, and if it does make cataloguing and DC metadata utilization more difficult, it does so only marginally.

osma commented 2 years ago

Here's a new proposal for today's meeting. This is based on Option C, but with some adjustments in terminology.

Class AffiliatedAgent

Property agent

Property institution

Example usage (same as Option C above, just replaced Authorship with AffiliatedAgent):

@prefix ex:   <http://example.org/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix dct:  <http://purl.org/dc/terms/>.

ex:dc_process_paper
  dct:creator ex:shigeo_author, ex:tom_author, ex:stuart_author .

ex:shigeo_author a dct:AffiliatedAgent ;
  dct:agent ex:shigeo ;
  dct:institution ex:ilis .

ex:shigeo
  foaf:name "Shigeo Sugimoto" .

ex:ilis
  foaf:name "Institute of Library and Information Science, University of Tsukuba, Japan" .

ex:tom_author a dct:AffiliatedAgent ;
  dct:agent ex:tom ;
  dct:institution ex:isb .

ex:tom
  foaf:name "Thomas Baker" .

ex:isb
  foaf:name "Institutszentrum Schloss Birlinghoven, Fraunhofer-Gesellschaft, Germany" .

ex:stuart_author a dct:AffiliatedAgent ;
  dct:agent ex:stuart ;
  dct:institution ex:dcmi, ex:oclc .

ex:stuart
  foaf:name "Stuart L. Weibel" .

ex:dcmi
  foaf:name "Dublin Core Metadata Initiative" .

ex:oclc
  foaf:name "OCLC Office of Research" .
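The proposed class and properties could be declared along these lines. This is a hypothetical sketch: none of these terms exist in DCMI Metadata Terms, and the labels and comments are paraphrased from this discussion rather than agreed definitions.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical declarations for the proposal above.
dct:AffiliatedAgent a rdfs:Class ;
  rdfs:label "Affiliated Agent"@en ;
  rdfs:comment "An agent together with its affiliation, in the context of a single described resource."@en .

dct:agent a rdf:Property ;
  rdfs:label "Agent"@en ;
  rdfs:domain dct:AffiliatedAgent ;
  rdfs:range dct:Agent .

dct:institution a rdf:Property ;
  rdfs:label "Institution"@en ;
  rdfs:domain dct:AffiliatedAgent .
```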
osma commented 2 years ago

I've created an example of affiliation information in XML, based on the same 2002 paper example. For discussion in the next SRAP meeting.

The example relies in part on the idea of using XML attributes to express PIDs in DC, as discussed in the PIDs in DC session in Porto 2018. However, since the element name id is problematic, I've used pid instead.

Jashton123 commented 2 years ago

Hi Osma, thank you for the XML example, which seems to work well.

I’m afraid I am on leave for the meeting so apologies,

Jan


(Reply by email to Osma Suominen's comment of 14 Apr 2022, 18:55.)


osma commented 2 years ago

The next step is to document the new class and properties in a Markdown snippet similar to date_subproperties.md. I can do that.

osma commented 2 years ago

Alasdair will call a meeting of the PIDs in DC group, with the goal of finishing and publishing a specification for how to express PIDs.

osma commented 2 years ago

Still no Markdown snippet, but I'm working on it... Meanwhile, here's an updated SRAP domain model diagram with the new AffiliatedAgent class:

[Image: SRAP domain model diagram with the AffiliatedAgent class]

osma commented 2 months ago

We now have a means to provide affiliations in the Person shape, so closing this issue.
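The thread does not show the Person shape itself. Purely as an illustration, if such a shape were written in SHACL, an optional, repeatable affiliation constraint might look like the sketch below; the use of SHACL, the shape name, the ex: namespace and the property path are all assumptions, not the SRAP definition.

```turtle
@prefix ex:   <http://example.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical Person shape; not the SRAP definition.
ex:PersonShape a sh:NodeShape ;
  sh:targetClass foaf:Person ;
  sh:property [
    sh:path dct:affiliation ;
    # Optional and repeatable; accepts an IRI or a literal name.
    sh:minCount 0 ;
    sh:nodeKind sh:IRIOrLiteral
  ] .
```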