DataONEorg / dataone

DataONE information and general-purpose issue tracking
Apache License 2.0

Re-visit Identifiers.org registration #6

Open amoeba opened 3 years ago

amoeba commented 3 years ago

@mbjones and I were looking at our Identifiers.org (https://registry.identifiers.org/) registration for the d1id prefix (https://registry.identifiers.org/registry/d1id). This is a great thing to have. A couple of questions came up:

Thoughts or other ideas? Ping @datadavev, @mbjones, and all dev team folks.

amoeba commented 3 years ago

@mbjones pointed out that our ROR ID is https://ror.org/00hr5y405 so we should probably use that in our Identifiers.org registration

mbjones commented 3 years ago

Here's what I propose we should register as a new resource:

[Image: proposed new resource registration]

datadavev commented 3 years ago

IIRC this registration was created very early in the life of identifiers.org and never utilized beyond a few test cases.

Considering identifiers.org is really a registration of identifier types, I'm not sure it's a good fit for DataONE since we don't have an identifier scheme, but rather run a resolution service.

Do we really want to promote a "d1id" identifier type?

Otherwise, this change makes sense if the overall goal is to eventually deprecate the existing service interfaces.

(Sent by email in reply to Bryce Mecum's comment of Wed, Feb 17, 2021 at 8:47 PM.)

amoeba commented 3 years ago

Hrm, not sure I totally follow you, @datadavev.

since we don't have an identifier scheme, but rather run a resolution service

We established an identifier scheme under https://dataone.org/datasets/ last year. Do we still like that direction? One of the big limitations of how we do identifiers in DataONE is that our bare identifiers are not HTTP-resolvable. The PIRI space fixes that, but it does mean we have two identifiers for every object.

this change makes sense if the overall goal is to eventually deprecate the existing service interfaces.

I see our identifier scheme as being a mostly separate thing from the service endpoints and I'm not sure how that relates to deprecating existing endpoints. Could you elaborate?

datadavev commented 3 years ago

Is DataONE generating the identifier value or using an identifier provided by the content creator?

My impression of the PIRI service was as an alternative endpoint to the resolve endpoint.

(Sent by email in reply to Bryce Mecum's comment of Wed, Feb 17, 2021 at 9:18 PM.)

mbjones commented 3 years ago

The reason I am pursuing this is that I think we need a location-independent resolution scheme for DataONE identifiers that don't have another resolution scheme. If dataone:{pid} were to resolve to https://dataone.org/datasets/{pid}, it would provide a nice, compact way of citing non-DOI identifiers with a stable resolution URL. So I view this as a way to formalize our resolver and tie it into the well-established identifiers.org system of resolvers.
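A minimal sketch of the proposed mapping, assuming the compact form is `dataone:{pid}` and the resolver target is `https://dataone.org/datasets/{pid}` as described above. Percent-encoding the whole PID (so that `/` or `:` inside a PID cannot change the URL path) is an assumption, not something the registration is known to specify, and the function names are hypothetical.

```python
from urllib.parse import quote

# Prefix and landing-page pattern as proposed in this thread; exact
# Identifiers.org registration details are assumptions here.
PREFIX = "dataone"
LANDING_PATTERN = "https://dataone.org/datasets/{pid}"


def to_compact(pid: str) -> str:
    """Build a compact identifier such as 'dataone:some-pid'."""
    return f"{PREFIX}:{pid}"


def resolve_compact(compact_id: str) -> str:
    """Map 'dataone:{pid}' to the DataONE landing-page URL.

    Only the first ':' separates prefix from PID, because DataONE
    PIDs may themselves contain colons.
    """
    prefix, sep, pid = compact_id.partition(":")
    if not sep or prefix != PREFIX or not pid:
        raise ValueError(f"not a {PREFIX}: compact identifier: {compact_id!r}")
    # Assumption: encode every reserved character (safe="") so the PID
    # occupies exactly one path segment.
    return LANDING_PATTERN.format(pid=quote(pid, safe=""))
```

For example, `resolve_compact("dataone:urn:uuid:1234")` yields `https://dataone.org/datasets/urn%3Auuid%3A1234`, while a `d1id:` identifier is rejected rather than silently rewritten.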

I think d1id was a strategically poor choice: it's difficult to remember or identify. So I thought that adding another 'Resource' at the dataone prefix was a chance to update our resolution URL as well. But I also think we should update the d1id endpoint URL on the off chance that people have used it somewhere in a publication.

This would also make resolution independent of the particular service API version that is installed, which would improve resolution stability.

datadavev commented 3 years ago

I agree that d1id was a poor choice for an identifier scheme label; at the time there was encouragement for brevity.

In any case, I don't doubt the benefit of a DataONE-specific scheme. It is necessary for practical recognition of the many identifiers that can't be resolved except through the DataONE service. I'm just not sure how it's going to work, particularly with existing identifiers in DataONE. Will they all be prefixed with "dataone:"? Perhaps only some? What about a DataONE PID that looks like a DOI but doesn't resolve using the DOI infrastructure? Do we check whether an identifier resolves according to its apparent scheme and apply a dataone: prefix if not? Perhaps the PIRI service should delegate to known resolvers, or at least advertise their availability for known identifier schemes (HTTP Link headers are good for this)?
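One possible shape for the "check its apparent scheme, else prefix with dataone:" idea, sketched under stated assumptions: the DOI and ARK patterns below are deliberately simplified, and whether an identifier actually resolves is left to a caller-supplied `resolves` callback (in practice perhaps an HTTP request to that scheme's resolver). `apparent_scheme` and `choose_compact_id` are hypothetical names.

```python
import re
from typing import Callable, Optional

# Simplified patterns for two well-known schemes; the real syntax rules
# for DOIs and ARKs are more permissive than these.
SCHEME_PATTERNS = {
    "doi": re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE),
    "ark": re.compile(r"^ark:/?(\d{5,}/\S+)$", re.IGNORECASE),
}


def apparent_scheme(pid: str) -> Optional[tuple[str, str]]:
    """Return (scheme, local_id) if the PID looks like a known scheme."""
    for scheme, pattern in SCHEME_PATTERNS.items():
        match = pattern.match(pid)
        if match:
            return scheme, match.group(1)
    return None


def choose_compact_id(pid: str, resolves: Callable[[str, str], bool]) -> str:
    """Prefer the PID's own scheme; fall back to the 'dataone:' prefix.

    `resolves(scheme, local_id)` is a hypothetical caller-supplied check,
    e.g. an HTTP request against that scheme's resolver.
    """
    detected = apparent_scheme(pid)
    if detected is not None:
        scheme, local_id = detected
        if resolves(scheme, local_id):
            return f"{scheme}:{local_id}"
    # Either no recognizable scheme, or it looks like one but does not resolve.
    return f"dataone:{pid}"
```

In this sketch a PID that merely looks like a DOI but fails the check falls through to `dataone:`, which is one answer to the question above; delegating to known resolvers, or advertising them in HTTP `Link` headers, would be alternatives.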

(Sent by email in reply to Matt Jones's comment of Wed, Feb 17, 2021 at 9:39 PM.)

mbjones commented 3 years ago

For DOI PIDs, we currently link the citation to the DOI resolver, but for everything else we link to the /cn/v2/view service. I was proposing that any PID in DataONE would be resolvable as https://identifiers.org/dataone:{pid}, and that our citation display would switch to using that link instead of the view service URI currently used for non-DOI PIDs. This does raise the issue of where our resolver redirects to: for non-DOIs, people are redirected to the view service, which is the DataONE-hosted landing page for the dataset, whereas for DOIs they are redirected to the repository-provided dataset landing page, which is typically not on DataONE. So this is an issue we should discuss for various resolvable identifier types like ARKs and handles in addition to DOIs.
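The current and proposed citation-link behavior can be contrasted in a short sketch. Only the `/cn/v2/view` path and the `https://identifiers.org/dataone:{pid}` form come from this discussion; the `https://doi.org/` resolver, the `https://cn.dataone.org` host, the simplified DOI pattern, and percent-encoding the PID are assumptions, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified DOI detection (assumption): optional 'doi:' then '10.NNNN/...'.
DOI_RE = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)

# Assumed hosts; see the lead-in for which parts come from the thread.
DOI_RESOLVER = "https://doi.org/"
CN_VIEW = "https://cn.dataone.org/cn/v2/view/"
IDENTIFIERS_ORG = "https://identifiers.org/"


def current_citation_link(pid: str) -> str:
    """Today: DOIs go to the DOI resolver, everything else to the view service."""
    match = DOI_RE.match(pid)
    if match:
        return DOI_RESOLVER + match.group(1)
    return CN_VIEW + quote(pid, safe="")


def proposed_citation_link(pid: str) -> str:
    """Proposal: every PID links via identifiers.org under the dataone prefix."""
    return IDENTIFIERS_ORG + "dataone:" + quote(pid, safe="")
```

The sketch leaves the landing-page question open: under the proposal, where a reader ends up depends entirely on where the dataone resolver redirects, which today differs between DOIs (repository page) and other PIDs (DataONE view service).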

mbjones commented 3 years ago

Oh, and as a side comment: we could switch to linking to the identifiers.org resolver for all PIDs, including DOIs and ARKs, because identifiers.org does know how to resolve those identifier types as well. So our citations might link out to: