datadryad / dryad-product-roadmap

Repository of issues for Dryad project boards
https://github.com/orgs/datadryad/projects

Merritt Cleanup of bad versions #2147

Closed sfisher closed 1 year ago

sfisher commented 1 year ago

Merritt let us know that some objects didn't correctly get local ids (DOIs), so some of these datasets were split across multiple objects in Merritt.

# --- NOW CLEANED UP IN MERRITT and version info reset for latest version ---
doi:10.5061/dryad.4qrfj6qdv
identifier_id: 100309

resource_ids & Merritt arks:
207841  https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm52p2cnq <-- submitted on 11/23, big outage day
207854  https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5452xt1
# --- NOW CLEANED UP
doi:10.5061/dryad.rn8pk0pfm
identifier_id: 98356

resource_ids and Merritt arks:
202020  https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5c60x75 <-- submitted on 11/23, big outage day
207839  https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5c60x75 <-- submitted on 11/23, big outage day
208187  https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5k71vg2
# --- NOW CLEANED UP
doi:10.25349/D9XW4D

Only the last resource in each of these series (v2 and v3, respectively) has a correct DOI in Merritt. When a submission went to a new object (ark) in Merritt, its version numbering started over at version 1 there.

Resetting them to version 1 allows the files that are present to be downloaded, but not all files are present because each dataset was fractured into two objects.
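As an illustration of how the split shows up in the listing above, here is a minimal Python sketch that decodes the percent-encoded ark from each Merritt URL and groups resource_ids by ark. The function names (`ark_from_merritt_url`, `group_by_ark`) are hypothetical and not part of Dryad or Merritt; the data is copied from the doi:10.5061/dryad.rn8pk0pfm entry in this issue.

```python
from collections import defaultdict
from urllib.parse import unquote, urlsplit


def ark_from_merritt_url(url):
    """Return the decoded ark from a Merritt object URL (hypothetical helper)."""
    # The path looks like "/d/ark%3A%2F13030%2Fm5c60x75"; the last segment is the encoded ark.
    encoded_ark = urlsplit(url).path.rsplit("/", 1)[-1]
    return unquote(encoded_ark)


def group_by_ark(resources):
    """Map each ark to the list of resource_ids stored under it (hypothetical helper)."""
    groups = defaultdict(list)
    for resource_id, url in resources.items():
        groups[ark_from_merritt_url(url)].append(resource_id)
    return dict(groups)


# resource_id -> Merritt URL, as listed for doi:10.5061/dryad.rn8pk0pfm
RESOURCES = {
    202020: "https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5c60x75",
    207839: "https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5c60x75",
    208187: "https://merritt.cdlib.org/d/ark%3A%2F13030%2Fm5k71vg2",
}

if __name__ == "__main__":
    for ark, resource_ids in group_by_ark(RESOURCES).items():
        print(ark, resource_ids)
```

More than one ark for a single DOI indicates that the dataset's versions ended up in separate Merritt objects.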

Note: we found more objects that are missing local ids: https://docs.google.com/spreadsheets/d/1woeqF4Jw8YNr2Cf5129w-6uDSmxrCOJh/edit#gid=164076146


From Terry, about some additional ones:

, we introduced a new report highlighting objects missing localids. The following 3 objects were added more than a year ago. I plan to assign the following 3 localids to the corresponding arks:

ark:/13030/m5479pg0 ; doi:10.5061/dryad.vq83bk3qv
ark:/13030/m5jf08bm ; doi:10.5061/dryad.zpc866t6w
ark:/13030/m5xq2hj4 ; doi:10.5061/dryad.ht76hdrcv

Let me know if you see any reason not to do this.

sfisher commented 1 year ago

Hi @jleighherzog

As a result of the Merritt database troubles last week, about 14 items were missing some correct information in Merritt after their database restore, and any such dataset would get split into 2 objects when an additional version was submitted. The new object didn't contain all the files that were deposited in the previous version(s).

They were able to correct the problem without any side effects for most of the new datasets that were deposited during the outage and hadn't had any additional versions created.

However, there were three that we had to correct manually: doi:10.5061/dryad.4qrfj6qdv, doi:10.5061/dryad.rn8pk0pfm and doi:10.25349/D9XW4D.

None of these had been published yet. The latest versions of these datasets are now corrected to contain all the correct files, and downloads are working for the latest versions. There may be a little bit of weirdness in earlier versions of these datasets, as the files and storage may not match up correctly. Users probably won't notice or be concerned about it, and the public will never see it since these earlier versions weren't published.

Let me know if you have questions or concerns or if you want to discuss. I can talk with you and/or bring the Merritt team in if you need additional information.

Thanks.

sfisher commented 1 year ago

See also https://github.com/CDLUC3/mrt-doc/issues/1294 which is the Merritt ticket for this cleanup.

jleighherzog commented 1 year ago

@sfisher: Thanks, Scott. One of these datasets is in PPR and the files look acceptable, one has been published, and the other is in AAR status.