IQSS / dataverse

Open source research data repository software
http://dataverse.org

Support Research Organization Registry (ROR) IDs #6640

Closed mcuthill closed 6 months ago

mcuthill commented 4 years ago

As a data steward for an organization producing and publishing data, we would like to see the Research Organization Registry ID option added to the Citation metadata block. Perhaps as an addition to the list of Identifier Schemes for authors, or attached to the Affiliation, Producer, Distributor, or similar. As can be seen here, a respectable list of supporters and signatories have already committed to the adoption and use of RORs going forward.

pdurbin commented 4 years ago

@mcuthill hi! Two weeks ago I heard all about ROR at this event in Lisbon the day before PIDapalooza 2020: https://www.eventbrite.com/e/the-ror-community-meeting-lisbon-registration-82814758171

It was a fun group! Here's a pic from https://twitter.com/ResearchOrgs/status/1222159655377473539

[Screenshot]

Here are my main takeaways from that event:

I don't think ROR IDs make sense in the list of author identifier schemes (ORCID, etc.) (or does it?!?) but yes, ROR IDs could be tied to Affiliation and other fields you mentioned. (Would it make sense to re-title this issue to something like "Support Research Organization Registry (ROR) IDs"?) Off the top of my head I'm not sure how much work this would take.

mcuthill commented 4 years ago

@pdurbin Thanks for sharing all the materials from that workshop! ROR definitely seems to be gaining momentum. You're right that it wouldn't generally fit in the Author identifier category, except in edge cases like ours (Ocean Networks Canada) where the data mostly isn't directly associated with a single PI so the organization serves as the author. It might be good to have it as an option in the Identifier Scheme list for situations like that, but also added to other field/s where organizations are normally identified.

pdurbin commented 4 years ago

@mcuthill sure. As has been discussed extensively in #5029, Dataverse doesn't currently have a way to express the difference between a person and an organization in the "Author" fields and subfields, but I see what you mean. If there were a checkbox or something for "organization", perhaps we could prompt for a ROR ID. Something like this (you have to imagine the checkbox):

[Screenshot]

mfenner commented 4 years ago

DataCite maybe two years ago added the optional property nameType that can be either "Personal" or "Organizational" for exactly this reason. We also separate out personal names into givenName and familyName fields. These details are important for properly formatting metadata into a citation in one of the many citation styles. We support ROR (or other organizational identifiers) for names that are for organizations.

DataCite uses a set of rules to "guess" whether an author is a person or organization. The most effective seems to be the list of common givenNames that we check against every author name on DOI registration.
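A minimal sketch of the kind of name-type guessing described above, for illustration only. The given-name list, the organization keywords, and the function name `guess_name_type` are all hypothetical; DataCite's actual rule set is not shown in this thread and is certainly larger.

```python
from typing import Optional

# Hypothetical, tiny lookup tables; a real list of common given names
# would contain thousands of entries.
COMMON_GIVEN_NAMES = {"anna", "john", "maria", "martin", "philipp", "wei"}
ORG_KEYWORDS = {"university", "institute", "networks", "library", "center", "centre"}


def guess_name_type(name: str) -> Optional[str]:
    """Guess a DataCite nameType ("Personal" or "Organizational").

    Returns None when neither rule matches, so the caller can decide
    whether to leave nameType unset.
    """
    tokens = [t.strip(",.").lower() for t in name.split()]
    # Rule 1 (assumed): typical organization words win over name matches.
    if any(t in ORG_KEYWORDS for t in tokens):
        return "Organizational"
    # Rule 2: any token that is a common given name suggests a person.
    if any(t in COMMON_GIVEN_NAMES for t in tokens):
        return "Personal"
    return None


print(guess_name_type("Fenner, Martin"))
print(guess_name_type("Ocean Networks Canada"))
```

Returning `None` for unmatched names is a design choice of this sketch: it avoids mislabeling people with uncommon names as organizations.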

pdurbin commented 4 years ago

@mfenner thanks for the reminder about nameType. I see we have tests for it here:

However, these are used for a specific "export" format (OpenAIRE) rather than what Dataverse sends over the wire to DataCite.

As @jggautier has noted at https://github.com/IQSS/dataverse/issues/2917#issuecomment-421468212 and https://github.com/IQSS/dataverse/issues/6492#issuecomment-572251082, we already use your rules in that export format. Thanks!

philippconzett commented 4 years ago

We are definitely in favor of implementing ROR in Dataverse. In a recent report (https://doi.org/10.29242/report.effectivedatapractices2020), the Association of Research Libraries (ARL) recommends wide adoption of these 5 core PIDs to power findability of research data, including ROR:

image

@mcuthill already mentioned some fields in the Citation Metadata schema where ROR would fit in. Here is my list of relevant fields:

mfenner commented 4 years ago

@philippconzett here is how we currently connect ROR IDs to DOIs at DataCite:

The relative numbers as of today are as follows:

[Screenshot, 2020-09-26]

This is data on all DataCite DOIs and 8 million Crossref DOIs in DataCite Commons. Crossref doesn't yet support ROR IDs in their schema, but we can link ROR ID and DOI via the Crossref Funder ID in funding information.

Affiliation is the classic use case for ROR; in addition, we have a small number of DOIs with organizations as creator or contributor. But by far the largest group is "hosted": DOIs in a repository run by a particular organization identified by its ROR ID. This is of course one big reason why institutional repositories exist. For domain repositories that linkage is also useful, but with a different kind of information. For a repository that hosts content contributed by researchers from many different organizations, linking by affiliation is crucial.

For the 273,601 DataCite DOIs with at least one ROR ID as affiliation identifier, more than 220K are in the "institutional repository" category. Dryad is currently the implementation in the domain repository category with the biggest uptake.

When you look at a particular organization identified by ROR ID in DataCite Commons, e.g. UiT, you see these different sources aggregated in one place, e.g. Dryad datasets and publications from Crossref with funding: https://commons.datacite.org/ror.org/00wge5k78

Not yet all DOIs from DataverseNO, as this needs the new DataCite consortium organization structure to be in place to uniquely associate the repository with UiT. An organization where this transition has already happened is for example the University of Cambridge: https://commons.datacite.org/ror.org/013meh722

philippconzett commented 4 years ago

Thanks, @mfenner! This was useful information. And I guess the last section answers the question which I have had on my to-do list since August 27, namely "Why are there only 64 records [as of 2020-08-27] for UiT in the DataCite Commons overview?" So, once the DataCite consortium organization structure is in place, the numbers for UiT will be more correct. But will these numbers be based on the fact that UiT is running DataverseNO? In that case, will all the datasets published by other partner institutions of DataverseNO, e.g. NTNU (https://ror.org/05xg72x27), UiB (https://ror.org/03zga2b32) etc., also be associated with UiT? In terms of the Dataverse metadata schema, I think the correct association would be through the metadata field producer, ideally through ROR.

mfenner commented 4 years ago

@philippconzett Mapping ROR ID and DOI via the repository as a "shortcut" only works reliably if it is an "institutional repository". If multiple institutions are behind a repository, as I can see for DataverseNO for example at https://www.re3data.org/repository/r3d100012538, things get more complicated. The safest way is of course to add the ROR ID to every single DOI, but I would suggest thinking about how this can also be done at the repository level in Dataverse, for example by defining "collections" for each repository partner institution.

The "contributed" group in my visualizations above includes contributors with a ROR ID as nameIdentifier, and if you use that for example with contributorTypes "producer", it would work with DataCite Commons today without additional work needed on our end. You can see this for the California Digital Library in this query (where they use contributor type "producer" for data management plans, some very recent work where DataCite helped): https://commons.datacite.org/ror.org/03yrm5c26?query=contributors.contributorType%3AProducer
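As a rough illustration of the contributor shape described above, here is a sketch that builds a contributor entry in the style of DataCite's JSON metadata, with a ROR ID as `nameIdentifier` and `contributorType` "Producer". The helper name `ror_contributor` is hypothetical, and this is not what Dataverse currently sends to DataCite; the organization name and ROR ID are the ones mentioned in this thread.

```python
import json


def ror_contributor(name: str, ror_id: str, contributor_type: str = "Producer") -> dict:
    """Build a DataCite-style contributor entry for an organization.

    The ROR ID is carried as a nameIdentifier, which is how the
    "contributed" group described above is matched.
    """
    return {
        "name": name,
        "nameType": "Organizational",
        "contributorType": contributor_type,
        "nameIdentifiers": [
            {
                "nameIdentifier": ror_id,
                "nameIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


producer = ror_contributor("California Digital Library", "https://ror.org/03yrm5c26")
print(json.dumps(producer, indent=2))
```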

philippconzett commented 4 years ago

@mfenner Once ROR support is in place in Dataverse, we will add RORs to each dataset. We would simply add these RORs in the dataset/metadata templates for each partner institution. The ROR will then automatically be included in the Producer field (and if necessary other fields, e.g. Author Affiliation) of each published dataset.

You suggest we should also consider "defining "collections" for each repository partner institution". Each DataverseNO partner institution already has its own institutional collection (= sub-dataverse), e.g. UiB: https://dataverse.no/dataverse/uib. But currently, such collections do not get their own DOI in the Dataverse software. However, at the request of a research group, DataverseNO has recently minted a collection DOI (through DataCite Fabrica) for a sub-sub-collection; see https://doi.org/10.18710/AJ4S-X394. Would minting such collection DOIs be helpful to associate datasets with organizations in DataCite Commons?

mfenner commented 4 years ago

If ROR IDs can be automatically included in the producer field, then maybe using collections is not needed. For repositories with content from multiple organizations, using ROR IDs per DOI is probably the "safest" way to associate content with an organization.

Something that would help then, and we have heard this in other contexts, is the ability to "bulk update", so that this information can also be added retroactively without too much trouble.

philippconzett commented 3 years ago

This blog post may be of interest for the discussion in this issue thread: https://www.pidforum.org/t/organizational-identifier-adoption-in-datacite-metadata/1279.

philippconzett commented 3 years ago

I just noticed that support for PIDs for institutions is set out as a desired characteristic in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:

1.9 The repository supports PIDs for authors, funders, funding programmes and grants, institutions, and other relevant entities.

doigl commented 3 years ago

Just to support the issue: we (University of Stuttgart) would also be very interested in having ROR IDs integrated with all the affiliation fields (Author, Contact, Producer, Distributor), ideally in the form of an external controlled vocabulary as the backend of an auto-fill field, with the human-readable label visible and the ROR ID stored behind it and added to the DataCite metadata sent when getting a DOI.

In our repository, we have several datasets with authors from different organizations, so it would really be good if the ROR ID could be attached not only at the dataset level, but at the author-affiliation level. And we still need to attach an ORCID to the author, so it should really be an identification of the affiliation of an author/contact and not an identification of the author itself.
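To make the distinction concrete, here is a sketch of a DataCite-style creator entry in which the ORCID identifies the person and the ROR ID identifies only the affiliation. The author and ORCID are the well-known fictitious test record (Josiah Carberry), the pairing with UiT is invented for illustration, and the helper name `creator_with_affiliation` is hypothetical.

```python
def creator_with_affiliation(family: str, given: str, orcid: str,
                             org_name: str, ror_id: str) -> dict:
    """Build a DataCite-style creator: ORCID on the person, ROR on the affiliation."""
    return {
        "name": f"{family}, {given}",
        "nameType": "Personal",
        "givenName": given,
        "familyName": family,
        # Identifies the author.
        "nameIdentifiers": [
            {
                "nameIdentifier": orcid,
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "https://orcid.org",
            }
        ],
        # Identifies the author's organization, not the author.
        "affiliation": [
            {
                "name": org_name,
                "affiliationIdentifier": ror_id,
                "affiliationIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


creator = creator_with_affiliation(
    "Carberry", "Josiah",
    "https://orcid.org/0000-0002-1825-0097",   # fictitious test ORCID
    "UiT The Arctic University of Norway",
    "https://ror.org/00wge5k78",               # UiT's ROR ID as cited in this thread
)
```

Because `affiliation` is a list on each creator, a dataset with authors from several organizations can carry a different ROR ID per author.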

lmaylein commented 3 years ago

Heidelberg University would also appreciate this.

stevenmce commented 3 years ago

And +1 for ADA as well - we are looking at RORs for our CADRE project (https://cadre5safes.org.au/)

pdurbin commented 2 years ago

With help from @Kris-LIBIS and the code in https://github.com/gdcc/dataverse-external-vocab-support/pull/9 I was just able to search for "ucla" under Author Affiliation and see a list of organizations in ROR to select from. Here's a screenshot:

[Screenshot]

pdurbin commented 2 years ago

@landreev recently configured https://demo.dataverse.org with the same external controlled vocabulary example: Author Affiliation can be populated from ROR.

He put some nice screenshots at https://github.com/IQSS/dataverse/issues/8571#issuecomment-1118058256

philippconzett commented 2 years ago

Great! Just tested it. Works fine. Would it make sense to expand the search configuration to include non-initial positions, so that when searching, e.g., for "California" you would also get results where "California" is in the middle or at the end of the name, e.g., "University of California", "University of California, Berkeley"?

Kris-LIBIS commented 2 years ago

@philippconzett That depends on the search API of ROR. But as far as I can tell from the docs and the screenshot above, that should already work.

Please note that this has been a quick proof of concept implementation. The ROR search API only returns the first 20 results. In order to retrieve more, support for pagination should be added. Then again, you can narrow your search by entering multiple words like "berk calif".
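For illustration, pagination over the ROR search results could look roughly like the sketch below. It is written in Python rather than the JavaScript of the proof of concept, and it assumes the endpoint `https://api.ror.org/organizations` with `query` and `page` parameters and a response containing `items` and `number_of_results`; check the ROR API documentation before relying on any of these. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ROR_API = "https://api.ror.org/organizations"  # assumed endpoint


def fetch_page_http(query: str, page: int) -> dict:
    """Fetch one page of search results (assumed to hold up to 20 items)."""
    url = f"{ROR_API}?{urlencode({'query': query, 'page': page})}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def search_all(query: str, fetch_page=fetch_page_http, max_results: int = 100) -> list:
    """Collect results across pages until exhausted or max_results is reached."""
    results = []
    page = 1
    while len(results) < max_results:
        data = fetch_page(query, page)
        items = data.get("items", [])
        if not items:
            break  # no more pages
        results.extend(items)
        if len(results) >= data.get("number_of_results", 0):
            break  # everything the API reported has been collected
        page += 1
    return results[:max_results]
```

Calling `search_all("california")` would issue one HTTP request per page; the `fetch_page` parameter exists so the paging logic can be exercised with a stub instead of the network.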

For instance, we use pagination in our author lookup: [image]

philippconzett commented 2 years ago

Thanks, @Kris-LIBIS! It seems that the pagination configuration was the reason why I didn't see relevant results when searching for, e.g., "California". I guess pagination would be a configurable feature?

Kris-LIBIS commented 2 years ago

No, it is not configurable. Support for more pages is just not there because I wanted to keep it simple. If you want, you can add support for it in the JavaScript code.

I think you may have the wrong idea about this feature. It is not part of Dataverse, but something you must add yourselves. It is part of an unmerged PR that adds an example of a light-weight implementation of external vocabularies configuration.

I call it light weight because it does not store IDs or URLs for an external vocabulary service as with the SKOSMOS examples. Instead, a lookup is performed in the browser and the data is copied and pasted into the metadata fields. There is no link to the external vocabulary maintained in Dataverse. It is a fill-in aid, nothing more. We created it because it does not require a change to the metadata block or Dataverse and the vocabulary does not have to worry about link rot and historical data.

philippconzett commented 2 years ago

Thanks for clarifying! We'll have a closer look at it once we've migrated our instance to the cloud and upgraded to 5.10+

mreekie commented 1 year ago

Priority:

cmbz commented 1 year ago

Update:

jggautier commented 1 year ago

I added this in the broader related issue at https://github.com/IQSS/dataverse-pm/issues/19 and realized I should also mention here that in a Google Slide at https://docs.google.com/presentation/d/1PtqmEzAamuM2__V8psOIetgNODPQxjqSEOuxL3kAV-Y I've tried to summarize what support means and which types of metadata are and aren't supported in some way. I'm hoping this helps scope the work.

sbarbosadataverse commented 1 year ago

Most recent update to this issue:

NIH Task 2.5.3* | Task 2.5.3: Participate in GREI ROR Working Group and define and scope Dataverse ROR support (New for Year 2) | Proposed: Membership in ROR WG and document and related issues (e.g., https://github.com/IQSS/dataverse/issues/6640) describing how Dataverse will support ROR and technical work needed to provide this support

cmbz commented 1 year ago

Updated AIM labels to reflect relationship to Aim 2.5.3 rather than 1.5.1 and 1.5.2

amandafrench commented 1 year ago

Amanda French, Technical Community Manager for ROR, here. Just a note that I'm available to answer any questions you might have as you integrate ROR. And regarding the discussion from 2020 about individuals vs. institutions as authors, you might take a look at the slides at https://doi.org/10.5281/zenodo.8074996 where @zzacharo showed how InvenioRDM handles that in the interface.

pdurbin commented 1 year ago

Thanks @amandafrench! Much appreciated! 🎉❤️

jggautier commented 6 months ago

We'll be working on this as part of NIH-GREI funded work. @cmbz and I agreed to list this issue in https://github.com/IQSS/dataverse-pm/issues/127, where related issues are listed, and close this GitHub issue.