IQSS / dataverse

Open source research data repository software
http://dataverse.org

Handle.net PIDs - identifier globally visible even for unpublished datasets #8881

Open vkush opened 2 years ago

vkush commented 2 years ago

Dear Dataverse team, while testing Handles I found an issue: every new Handle is stored as globally visible (with the permission "public read"), so it can be found right away on the http://hdl.handle.net/ server even if the corresponding dataset is not yet published (is only a draft).

A complete description of the issue and a proposed solution (thanks Jim!) can be found in the user group: https://groups.google.com/g/dataverse-community/c/wzoRh-LAgTc

As discussed in the user group, this is not a serious problem and the issue can be ignored. But it might be better for Handles to behave the same way as DOIs: new Handles for a draft dataset (and for the files inside it) should not be visible to everyone (permission "1100": "admin read = 1", "admin write = 1", "public read = 0", "public write = 0"; see "permission set" in the Create Handle Batch format), and only when the dataset is published should these Handles become globally visible ("public read = 1", permission "1110"). Otherwise we could end up in a situation (this is only an idea, without proof) where visible Handles of not-yet-published datasets are indexed by search engines; if the draft dataset is then deleted (together with its Handles), the search engines will link to a Handle that no longer exists, which contradicts the idea of a PID.
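To make the two permission strings above concrete, here is a minimal Python sketch that decodes them. It assumes only the field order quoted in this issue (admin read, admin write, public read, public write); the function names are hypothetical and are not part of Dataverse or the Handle software.

```python
from typing import Dict

# Field order as quoted in this issue; treat it as an assumption
# and check it against the Handle batch format documentation.
FIELDS = ("admin_read", "admin_write", "public_read", "public_write")


def decode_permissions(perm: str) -> Dict[str, bool]:
    """Turn a four-character permission string such as "1100" into named flags."""
    if len(perm) != len(FIELDS) or set(perm) - {"0", "1"}:
        raise ValueError(f"expected four characters of 0/1, got {perm!r}")
    return {name: bit == "1" for name, bit in zip(FIELDS, perm)}


def is_publicly_readable(perm: str) -> bool:
    """True if the 'public read' flag is set."""
    return decode_permissions(perm)["public_read"]


# Proposed state for a draft dataset: only the admin can read and write.
print(is_publicly_readable("1100"))  # False
# Proposed state after publication: public read is switched on.
print(is_publicly_readable("1110"))  # True
```

Under this reading, publishing a dataset would only flip the third flag from 0 to 1.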

Tested Dataverse version - 5.10.1

Many thanks Vladimir

pdurbin commented 2 years ago

@vkush thanks. A quick thought on this... before changing the behavior we might want to check with at least a couple Dataverse installations that use Handle rather than DOI.

The ones that definitely use DOIs have the field "doi_authority" populated at https://iqss.github.io/dataverse-installations/data/data.json with "10.something".

Here are some Handle installations:

If you'd like, please feel free to reach out to them ("contact_email" in the JSON) to see if others using Handles support this change.

vkush commented 2 years ago

Philip, many thanks!

I have parsed your list:

wget <your-link-above-data-json>
jq -r '.installations | .[] | select(.doi_authority != null) | select(.doi_authority | startswith("10.") | not) | .hostname' data.json | less

and I will contact the repositories.
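For anyone without jq at hand, the same filter can be sketched in Python. This assumes the structure implied by the jq command above (a top-level "installations" list whose entries carry "doi_authority" and "hostname"); the function name and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def handle_hostnames(data: Dict[str, Any]) -> List[str]:
    """Return hostnames whose doi_authority is set but does not start with "10."."""
    hosts = []
    for inst in data.get("installations", []):
        authority = inst.get("doi_authority")
        # Skip entries without an authority, and entries that clearly use DOIs.
        if authority is not None and not authority.startswith("10."):
            hosts.append(inst.get("hostname"))
    return hosts


# Invented sample data in the assumed shape of data.json.
sample = {
    "installations": [
        {"hostname": "doi.example.org", "doi_authority": "10.12345"},
        {"hostname": "handle.example.org", "doi_authority": "20.500.12345"},
        {"hostname": "unset.example.org", "doi_authority": None},
    ]
}

print(handle_hostnames(sample))  # ['handle.example.org']
```

With the real file, load it first with `json.load(open("data.json"))` and pass the result to the function.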

@repositories: I will tick the corresponding task if you tell me that you agree with the implementation. Or you can just write a few words below on whether this is relevant for you or not. I don't have a lot of experience with Handles, so it could be that this issue is simply not relevant for the community.

Let's see ;)

plesubc commented 2 years ago

From UBC's perspective, this isn't really a big issue. If it's a new study that's as yet unpublished, the odds of a random user having the handle are small. And even if they do, they'll be presented with the login page of the Dataverse installation.

We have a study languishing in unpublished purgatory, and it's not findable by Google (and thus with my sample size of 1, all search engines) using either the title or the handle. Yes, I suppose you could laboriously find it on the handle server, but the number of people so inconvenienced by its transitory state is probably approaching zero. Most, probably all of them, are staff who interact administratively with the Dataverse installation, so institutional support is already available.

Having handles have the same behaviour as DOIs would be nice, but I suspect no one would even notice.

Paul

LauraHuisintveld commented 2 years ago

Hi @vkush,

We at DataverseNL switched from handles to DOIs in 2020. (The old handles still resolve.) I agree with the comment above: it is not a big issue, but it would indeed be more correct if the handle were not findable on the handle server while the dataset is still unpublished.

vkush commented 1 year ago

Many thanks to the repositories for your replies!

@pdurbin, all the responses are here; I have not received any more. It looks like this is not a critical issue for the community, but it is also clear that changing the Handle behaviour to match that of DOIs would be nice to have. So maybe it could be marked as a low-priority issue and solved together with other non-critical issues.