NCEAS / metacatui

MetacatUI: A client-side web interface for DataONE data repositories
https://nceas.github.io/metacatui
Apache License 2.0

Add ability to create seriesId for every dataset #1400

Open laurenwalker opened 4 years ago

laurenwalker commented 4 years ago

There should be a MetacatUI config for creating a seriesId for each metadata doc that is created. I believe ESS-DIVE did this by extending the DataONEObject or EML model, so perhaps that code could be merged in and maintained in the MetacatUI codebase.

mbjones commented 4 years ago

While I think this would be a good option to have in MetacatUI, I don't think we would want it enabled by default in our repositories. In general, for the Arctic Data Center, the KNB, and our hosted repositories, it would be better not to assign series IDs to our data sets. Series IDs give users multiple identifiers to cite and don't provide clear guidance that they should be citing the PID (following point 7, Specificity and Verifiability, of the FORCE11 Data Citation Principles). Is there a specific use case you are trying to enable with this feature?

laurenwalker commented 4 years ago

This came up in a discussion with Rani from OPC and Jeanette, when we were talking about the to-be-developed AccessPolicyView in the EML editor. Here's why:

Both OPC and Arctic Data users may want to make edits to a data package after it has been published, but they want these new edits to remain private until they are done. So a user might click edit on their published dataset, make some changes, and use the new AccessPolicyView to make their new version private. They may make several iterations before they're ready to publish a new version.

When this new version is made public again, the MetadataView will not be able to show the "There is a newer version..." link without a seriesId. This is because the MetadataView traverses the pid chain one-by-one until it finds the latest accessible version. When there are two or more consecutive inaccessible versions in that chain, the client can't find the newest version. Example:

Scenario 1 - Client has no issues following the public version chain to find the newest version:

A (public) -> obsoletedBy B (public) -> obsoletedBy C (public)

Scenario 2 - Client can find newest version by searching Solr for the version that obsoletes B:

A (public) -> obsoletedBy B (private) -> obsoletedBy C (public)

Scenario 3 - Client cannot find newest version!

A (public) -> obsoletedBy B (private) -> obsoletedBy C (private) -> obsoletedBy D (public)
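
To make the three scenarios concrete, here is a minimal sketch of the traversal in Python. It is a simulation, not MetacatUI code: `Version`, `make_chain`, `search_obsoletes`, and `newest_reachable` are hypothetical names, and the Solr lookup is modelled as a function that only ever returns public documents.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Version(NamedTuple):
    obsoleted_by: Optional[str]  # pid of the next version, if any
    public: bool                 # whether the client may read this version


def make_chain(*versions: Tuple[str, bool]) -> Dict[str, Version]:
    """Build a chain from (pid, public) pairs listed oldest to newest."""
    chain = {}
    for i, (pid, public) in enumerate(versions):
        nxt = versions[i + 1][0] if i + 1 < len(versions) else None
        chain[pid] = Version(nxt, public)
    return chain


def search_obsoletes(chain: Dict[str, Version], target: str) -> Optional[str]:
    """Stand-in for a Solr query: the version that obsoletes `target`.

    The index knows the full chain but only returns public documents.
    """
    successor = chain[target].obsoleted_by
    if successor is not None and chain[successor].public:
        return successor
    return None


def newest_reachable(chain: Dict[str, Version], start: str) -> str:
    """Walk the chain from an accessible `start` as far as the client can."""
    current = start
    while True:
        # `current` is accessible, so its obsoletedBy field can be read.
        next_pid = chain[current].obsoleted_by
        if next_pid is None:
            return current
        if chain[next_pid].public:
            current = next_pid
            continue
        # The next version is private: its pid is known, but its own
        # obsoletedBy field is not readable. Ask the index instead.
        found = search_obsoletes(chain, next_pid)
        if found is None:
            return current  # stuck: cannot see past two private versions
        current = found


scenario_1 = make_chain(("A", True), ("B", True), ("C", True))
scenario_2 = make_chain(("A", True), ("B", False), ("C", True))
scenario_3 = make_chain(("A", True), ("B", False), ("C", False), ("D", True))

print(newest_reachable(scenario_1, "A"))  # C
print(newest_reachable(scenario_2, "A"))  # C
print(newest_reachable(scenario_3, "A"))  # A (D is never discovered)
```

In Scenario 3 the client knows B's pid from A, but the search for "what obsoletes B" returns nothing because C is private, and without C's pid there is nothing left to search on.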

Scenario 3 could be resolved if there were a seriesId.
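
A rough sketch of why a seriesId helps, under some assumptions: every version in the chain shares one seriesId, the index exposes that value along with an upload date, and the index only returns documents the client may read. `IndexDoc` and `latest_public_in_series` are hypothetical names for illustration, and the identifiers and dates are made up.

```python
from typing import List, NamedTuple, Optional


class IndexDoc(NamedTuple):
    pid: str
    series_id: str
    date_uploaded: str  # ISO 8601 date, so string order is chronological
    public: bool


def latest_public_in_series(index: List[IndexDoc], series_id: str) -> Optional[str]:
    """Return the pid of the newest public version in a series, if any.

    This is a single lookup by seriesId, so private versions in the middle
    of the chain do not block it.
    """
    visible = [d for d in index if d.series_id == series_id and d.public]
    if not visible:
        return None
    return max(visible, key=lambda d: d.date_uploaded).pid


# Scenario 3, with every version sharing the (made-up) series "S1".
index = [
    IndexDoc("A", "S1", "2020-01-01", True),
    IndexDoc("B", "S1", "2020-02-01", False),
    IndexDoc("C", "S1", "2020-03-01", False),
    IndexDoc("D", "S1", "2020-04-01", True),
]

print(latest_public_in_series(index, "S1"))  # D
```

Because the lookup does not depend on reading each version's obsoletedBy field in turn, the number of private versions between A and D no longer matters.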

This hasn't been a huge issue so far because users haven't been able to make their datasets private in the UI. But as soon as we give them the ability to do so, this will likely be a common issue.

Regarding the citation confusion, we could choose to hide the seriesId in the citation at the top of the page. We could show the seriesId somewhere else, or hide it altogether.

mbjones commented 4 years ago

Thanks for the additional context -- really helpful in understanding the use case. There may be other ways to provide the version chain info that could solve your issue too. The problem is that we are not indexing versions very well right now, so the step-by-step traversal of the version chain is needed. I proposed that we provide much more convenient access to version chain info from any document in the chain here: https://github.com/NCEAS/metacat/issues/1429

laurenwalker commented 4 years ago

Thanks Matt, the API outlined in that Metacat ticket would be very useful and would solve this issue. I'll bring it up at our dev team meeting tomorrow so we can discuss timelines.