EDIorg / data-package-best-practices

Best Practices for data packages. a gh-pages website, with sections for metadata concepts and aspects of data packaging
https://ediorg.github.io/data-package-best-practices/

identifier for organizations #71

Open mbjones opened 3 years ago

mbjones commented 3 years ago

In section 7, the statement is made:

ORCID identifiers are not yet available for organizations, so <address>, <phone>, and <onlineURL> elements should be included for them. In the examples, these elements are included for completeness.

At this point, there are several good identifier systems for organizations. I would recommend that people use either ROR (https://ror.org), GRID (https://grid.ac), or ISNI identifiers for organizations. Here's the ROR for NCEAS for example: https://ror.org/0146z4r19

I propose the recommended structure for this would be to include the ROR in the userId field (which is admittedly poorly named) in an analogous way to an ORCID when an organization is listed. Here's a revised version of your example with a ROR for an NCEAS creator:

<creator id="https://ror.org/0146z4r19" system="https://ror.org" scope="system">
   <organizationName>National Center for Ecological Analysis and Synthesis</organizationName>
   <electronicMailAddress>info@nceas.ucsb.edu</electronicMailAddress>
   <onlineUrl>http://www.nceas.ucsb.edu/</onlineUrl>
   <userId directory="https://ror.org">https://ror.org/0146z4r19</userId>
</creator>
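
As a rough sketch of how a consumer of such a record might read the organizational identifier back out, the Python below parses the creator example above with the standard library and lists each userId with its directory. The function name `organization_ids` is made up for illustration and is not part of any EML tooling; the snippet also ignores XML namespaces, which a real EML document would declare on its root element.

```python
import xml.etree.ElementTree as ET

# The creator example from this comment, embedded as a string.
CREATOR_XML = """
<creator id="https://ror.org/0146z4r19" system="https://ror.org" scope="system">
   <organizationName>National Center for Ecological Analysis and Synthesis</organizationName>
   <electronicMailAddress>info@nceas.ucsb.edu</electronicMailAddress>
   <onlineUrl>http://www.nceas.ucsb.edu/</onlineUrl>
   <userId directory="https://ror.org">https://ror.org/0146z4r19</userId>
</creator>
"""


def organization_ids(party_xml):
    """Hypothetical helper: return (organizationName, [(directory, identifier), ...])."""
    party = ET.fromstring(party_xml.strip())
    name = party.findtext("organizationName")
    # Each userId carries the identifier as text and the issuing registry in "directory".
    ids = [(u.get("directory"), (u.text or "").strip()) for u in party.findall("userId")]
    return name, ids


name, ids = organization_ids(CREATOR_XML)
print(name, ids)
```

Matching on the directory value rather than on the organization name string is what makes the disambiguation possible, since the same organization can be spelled many ways.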

We are starting to incorporate these where possible into our records to assist with disambiguating organizational references.

Update: edited example to correct attribute name as pointed out by @twhiteaker below.

atn38 commented 3 years ago

Hi Matt, late last year we were wondering the same and reached out to ROR to obtain an entry for BLE LTER. ROR seemed like the most appropriate available PID provider for something like an LTER site, although I had my doubts, since bigger orgs like NCEAS seemed to be more their thing.

We have been corresponding back-and-forth with Maria Gould at ROR with EML examples and links to explain what we are trying to do, so perhaps they haven't seen this particular use case. Our proposed usage in EML is similar to yours.

(P.S. We haven't heard back from them in a couple of months though. How did the process go for NCEAS?)

cgries commented 3 years ago

Hi Matt and An, I did not contact them, but the last time I looked they seemed to be working on IDs for universities and departments and hadn't reached the Center for Limnology within UW yet. Nor did they seem to support field stations, much less LTER projects that may encompass many university departments, field stations, and various funders. I think this will take a little more sorting out before we can recommend anything.

mbjones commented 3 years ago

Yeah, I totally agree it can be hard to obtain a ROR, and we hit the same barriers for NCEAS. They are still working out their processes for that. On the other hand, it is relatively easy to obtain a GRID through a simple request on their site. You simply have to provide a rationale as to why the organization has a role in the scholarly research ecosystem for attribution purposes. ROR imports all GRID entries, so you will automatically be included in ROR once you get a GRID. We also went to the extra step of creating a Wikidata ID.

Regardless of which of these types is used, I think it's tremendously useful and a best practice to provide some organizational identifier in the userId field for organizations if one exists. We struggle with disambiguating organization strings across the DataONE network holdings, and this would help soooo much. We've done some work on trying to harmonize organization strings across all DataONE repos, and I've thought it might be interesting to see if ROR would add all of these in a batch operation, as they all meet their core use cases of scholarly attribution.

marty-downs commented 3 years ago

I definitely would support getting GRIDs for all LTER programs and would be happy to reach out and see if we can batch create them.

mbjones commented 3 years ago

GRID has a ticket system for making requests. For reference, here's what I provided for NCEAS, and shortly thereafter they created the GRID without any followup. I did take the time to ensure there was a Wikidata entry, and I looked up our ISNI which also already existed, making it more of a cross-reference:

On Wed, 4 Dec at 5:32 AM, Matthew Jones jones@nceas.ucsb.edu wrote:

Please add NCEAS to GRID, as it is affiliated with decades of research publications, and needs an identifier for citations.

Name: National Center for Ecological Analysis and Synthesis
Acronym: NCEAS
type: nonprofit research institute
inception: 1995
Wikidata: Q6971323
ISNI: 0000000121907056
NAAN: 85063
URL: https://nceas.ucsb.edu
Administered by: University of California Santa Barbara

twhiteaker commented 3 years ago

@marty-downs I would appreciate you reaching out to get some GRIDs for LTER sites! :)

marty-downs commented 3 years ago

Inquiring. I'll let you know what I hear.

mbjones commented 3 years ago

One other related tidbit. Google Dataset Search pays attention to the schema.org entry for data catalogs, so another thing that we have been doing is to be sure that our schema.org entry for our repository follows the science-on-schema.org guidelines on providing information about the repository. This includes the ROR/GRID/Wikidata identifiers for our repository. Here's the snippet from our repo SO entry that is relevant:

{
    "@context": {
        "@vocab": "https://schema.org/"
    },
    "@type": ["Service", "Organization", "ResearchProject"],
    "@id": "https://arcticdata.io",
    "identifier": [
        {
            "@type": "PropertyValue",
            "name": "Re3data DOI: 10.17616/R37P98",
            "propertyID": "https://registry.identifiers.org/registry/doi",
            "value": "doi:10.17616/R37P98",
            "url": "https://doi.org/10.17616/R37P98"
        },
        {
            "@type": "PropertyValue",
            "name": "wikidata:Q77285095",
            "propertyID": "https://registry.identifiers.org/registry/wikidata",
            "value": "wikidata:Q77285095",
            "url": "https://www.wikidata.org/wiki/Q77285095"
        },
        {
            "@type": "PropertyValue",
            "name": "grid:grid.507882.0",
            "propertyID": "https://registry.identifiers.org/registry/grid",
            "value": "grid:grid.507882.0",
            "url": "https://www.grid.ac/institutes/grid.507882.0"
        }
    ],
    "name": "Arctic Data Center",
    "legalName": "Arctic Data Center",
    "logo": "https://arcticdata.io/wp-content/themes/aurora/library/images/logo_.png",
    "url": "https://arcticdata.io",
    "description": "The Arctic Data Center is the primary data and software repository for the Arctic section of NSF Polar Programs.",
    "sameAs": [
        "https://ror.org/055hrh286",
        "https://www.grid.ac/institutes/grid.507882.0",
        "https://www.wikidata.org/wiki/Q77285095",
        "https://www.re3data.org/repository/r3d100011973",
        "http://doi.org/10.17616/R37P98",
        "urn:node:ARCTIC"
    ]
}

I couldn't find an identifiers.org registry entry for the ROR registry for the propertyID field, so that one is just listed in sameAs. I think doing this will help create cross-linkages between EML datasets in their various locations and the repositories and catalogs they are included in.
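
To illustrate how such cross-linkages might be harvested, here is a minimal Python sketch that gathers every identifier URL from a trimmed copy of the entry above, combining the `identifier` entries with the `sameAs` links and dropping duplicates. The helper name `collect_identifier_urls` is an assumption for illustration only, and the snippet treats the entry as plain JSON rather than doing any real JSON-LD processing.

```python
import json

# Trimmed copy of the repository entry above (two identifiers, three sameAs links).
REPO_JSONLD = """
{
    "@context": {"@vocab": "https://schema.org/"},
    "@id": "https://arcticdata.io",
    "identifier": [
        {"@type": "PropertyValue",
         "propertyID": "https://registry.identifiers.org/registry/wikidata",
         "value": "wikidata:Q77285095",
         "url": "https://www.wikidata.org/wiki/Q77285095"},
        {"@type": "PropertyValue",
         "propertyID": "https://registry.identifiers.org/registry/grid",
         "value": "grid:grid.507882.0",
         "url": "https://www.grid.ac/institutes/grid.507882.0"}
    ],
    "sameAs": [
        "https://ror.org/055hrh286",
        "https://www.grid.ac/institutes/grid.507882.0",
        "https://www.wikidata.org/wiki/Q77285095"
    ]
}
"""


def collect_identifier_urls(entry):
    """Hypothetical helper: ordered, de-duplicated URLs from identifier[] and sameAs[]."""
    urls = []
    for item in entry.get("identifier", []):
        url = item.get("url")
        if url and url not in urls:
            urls.append(url)
    for url in entry.get("sameAs", []):
        if url not in urls:
            urls.append(url)
    return urls


entry = json.loads(REPO_JSONLD)
identifier_urls = collect_identifier_urls(entry)
# The ROR only appears in sameAs, so it is picked up in the second loop.
ror_links = [u for u in identifier_urls if u.startswith("https://ror.org/")]
print(identifier_urls)
print(ror_links)
```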

twhiteaker commented 3 years ago

LTER is pursuing IDs for LTER sites. That's a separate issue from the OP, so I suggest interested parties discuss LTER IDs outside of this Issue.

For the OP, I think that if an organization has an ID, the metadata should include it. The best practice could suggest identifiers to use, such as ROR.

Should the attribute of userId be directory instead of system?

As far as providing examples for the best practices goes, would it be OK to leave out id, system, and scope? These are optional attributes and mostly redundant to what's in userId. So the example would be:

<creator>
   <organizationName>National Center for Ecological Analysis and Synthesis</organizationName>
   <electronicMailAddress>info@nceas.ucsb.edu</electronicMailAddress>
   <onlineUrl>http://www.nceas.ucsb.edu/</onlineUrl>
   <userId directory="https://ror.org">https://ror.org/0146z4r19</userId>
</creator>

mbjones commented 3 years ago

Hey @twhiteaker good catch -- the attribute should be directory. I am going to edit my example above as well just so that people don't accidentally copy the erroneous example.

As with the rest of EML, it is fine to leave out id, system, and scope, as they are optional. However, I included them to show how the id can be used to point to a globally dereferenceable id, which then becomes a good way to reuse that org in other elements in the EML document via references. But it is optional.
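
A small sketch of what reuse via references can look like: the fragment below repeats the creator with its id and adds a contact that points back to it with a `references` element. The surrounding `dataset` and `contact` wrapper is a made-up, stripped-down fragment for illustration (not a complete or schema-valid EML document), and `resolve_references` is a hypothetical helper, not part of any EML library.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: one creator carrying an id, one contact referencing it.
EML_FRAGMENT = """
<dataset>
   <creator id="https://ror.org/0146z4r19" system="https://ror.org" scope="system">
      <organizationName>National Center for Ecological Analysis and Synthesis</organizationName>
      <userId directory="https://ror.org">https://ror.org/0146z4r19</userId>
   </creator>
   <contact>
      <references>https://ror.org/0146z4r19</references>
   </contact>
</dataset>
"""


def resolve_references(root):
    """Hypothetical helper: map each referencing element's tag to the element it points at."""
    # Index every element that declares an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    resolved = {}
    for parent in root.iter():
        ref = parent.find("references")
        if ref is not None:
            # Unknown ids resolve to None instead of raising.
            resolved[parent.tag] = by_id.get((ref.text or "").strip())
    return resolved


root = ET.fromstring(EML_FRAGMENT.strip())
resolved = resolve_references(root)
contact_org = resolved["contact"].findtext("organizationName")
print(contact_org)
```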