CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Sunset Merritt Member Node #437

Open marisastrong opened 4 years ago

marisastrong commented 4 years ago

To turn off the Merritt/CDL Member Node: the https://merritt.cdlib.org/m/ucd_ice_swap collection is stored on Node 5001 (S3), so we already serve it up from there.

Content:

Infrastructure:

mreyescdl commented 4 years ago

Archiving of DataONE Merritt MN complete. https://merritt.cdlib.org/m/ark%253A%252F13030%252Fm5z94qw1

mreyescdl commented 4 years ago

Removed all DataONE references from profiles (Stage and Production)

Note: The following collections seem to be specific to DataONE. Are these still needed @elopatin @marisastrong?

STAGE:
oneshare_ark_only
oneshare_ark_only.orig
oneshare_dataup_content
oneshare_dataup_content.orig
dataone_dash_content
dataone_dcxl_content
dataone_demo_content

PROD:
dataone_dash_content
demo_dataone_content
oneshare_dataup_content

elopatin-uc3 commented 4 years ago

Thanks for the update @mreyescdl.

Stage:

Prod: @marisastrong please chime in on these (and Stage). I'm only noting LDAP and object status:

Based on Stage results, I'm comfortable with removing all of these collections except dataone_dash_content (check in with Scott). With the exception of demo_dataone_content, I assume we'll want to hang onto the production collections for a little while.

marisastrong commented 4 years ago

My understanding is that collections with the _content suffix are how collections are referred to within Merritt, but how they are referred to elsewhere is unknown to me. Given that the following

have dataone_dash_submitter associated with them and have seen fairly recent activity, I would keep the production and stage collections around until it's understood how we are referring to them.

marisastrong commented 3 years ago

Here are some notes from Matt Jones/DataONE on the steps that need to occur before we decommission the Member Node hosted at CDL:

As DataONE is concerned about long-term access to data and persistence of published identifiers, we don't delete the old records and associated identifiers, so if people cited or bookmarked a particular identifier, it will continue to resolve. What this means for the repo is that best practice would involve:

  1. Make a copy of all existing objects from Merritt to Dryad, maintaining the same identifiers and object checksums.
  2. For each object, change the authoritativeMN field to point to Dryad, which gives Dryad admin control over everything.
  3. If you plan to replace the existing data and metadata with new versions that follow Dryad metadata practices, mark the new data and metadata objects as replacements of the old ones (via the obsoletes field), so that DataONE knows that you have published new versions and will surface only the new versions in search.

At that point, we can mark the old Merritt node as down, and any new requests for that data would point at Dryad.
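
For reference, a minimal Python sketch of step 3, assuming the DataONE v2 CN REST API (CNCore.setObsoletedBy maps to PUT /obsoletedBy/{pid} per the DataONE architecture docs) and an authorized client certificate. The PID pairs and certificate paths are placeholders; the real migration would use whatever tooling DataONE recommends:

    import re
    from urllib.parse import quote
    import requests

    CN_BASE = "https://cn.dataone.org/cn/v2"
    CERT = ("client-cert.pem", "client-key.pem")  # placeholder admin credentials

    def serial_version(pid):
        # Read serialVersion out of the object's current system metadata.
        xml = requests.get(f"{CN_BASE}/meta/{quote(pid, safe='')}", cert=CERT).text
        return re.search(r"<serialVersion>(\d+)</serialVersion>", xml).group(1)

    def set_obsoleted_by(old_pid, new_pid):
        # CNCore.setObsoletedBy: marks old_pid as obsoleted by new_pid.
        resp = requests.put(
            f"{CN_BASE}/obsoletedBy/{quote(old_pid, safe='')}",
            files={"obsoletedByPid": (None, new_pid),
                   "serialVersion": (None, serial_version(old_pid))},
            cert=CERT,
        )
        resp.raise_for_status()

    # Hypothetical old-to-new pairs; the real list would cover every object.
    for old, new in [("<old Merritt PID>", "<replacement Dryad PID>")]:
        set_obsoleted_by(old, new)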

Dryad itself is not currently being harvested by DataONE after the switch off of DSpace; harvesting of Dryad via its schema.org entries is not yet working. Dryad is now publishing the schema.org metadata, but DataONE still needs to implement the harvest. Dryad also has this same issue of correctly obsoleting the old content.

marisastrong commented 3 years ago

Dryad has updated everything for DataONE to start harvesting from us. The work now needs to be prioritized on DataONE's end to finalize it.

marisastrong commented 3 years ago

Met with Daniella and Eric to discuss next steps. The plan is to zip up all files in the ICE collection, along with a mapping file listing the ARKs contained in the zip file and brief informational text describing the contents and noting that all content is still preserved in the Merritt repository. This zip file will be deposited into Dryad and issued a DOI.
EZID will update all the ARKs in the collection to resolve to the Dryad DOI.
Once the Dryad MN begins harvesting again, any existing ARKs cited or bookmarked will resolve to the DOI containing all ICE objects.
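
A minimal sketch of that packaging step with Python's standard library, assuming the per-ARK object directories have already been downloaded and extracted; all file and directory names here are illustrative:

    import csv
    import zipfile
    from pathlib import Path

    SOURCE = Path("ice_arks")  # one extracted directory per ARK (illustrative)

    with zipfile.ZipFile("ice_collection.zip", "w", zipfile.ZIP_DEFLATED) as bundle:
        # Mapping file: one row per file, keyed by the ARK it belongs to.
        with open("ark_mapping.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["ark", "path_in_zip"])
            for ark_dir in sorted(SOURCE.iterdir()):
                # Reconstruct "ark:/13030/..." from a directory name like
                # "ark_13030_m5765f2s" (assumed naming convention).
                ark = ark_dir.name.replace("ark_", "ark:/", 1).replace("_", "/")
                for member in sorted(ark_dir.rglob("*")):
                    if member.is_file():
                        bundle.write(member, member.relative_to(SOURCE))
                        writer.writerow([ark, str(member.relative_to(SOURCE))])
        bundle.write("ark_mapping.csv")  # include the mapping file itself
        bundle.write("README.txt")       # brief informational text described above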

marisastrong commented 3 years ago

The content for the datasets is served up from the member node itself. So if the member node is taken down, the coordinating node at DataONE would not be able to serve up the content.

The content / objects that will be deposited into Dryad should provide the mechanism for serving that content up to the coordinating node.

elopatin-uc3 commented 3 years ago

Script to download all Davis ICE objects is now in place and ready to run in four batches of ARKs. It's up on the second Docker dev box, here: ingest-stg-shared/dataone/ice_arks

I will kick off the first batch tomorrow morning. These will be post-processed with a new routine in Terry's File Analyzer:
https://confluence.ucop.edu/display/~tbrady/UC+Davis+Object+Prep
https://confluence.ucop.edu/pages/viewpage.action?spaceKey=~tbrady&title=File+Analyzer+for+Iterative+Metadata+Preparation
https://confluence.ucop.edu/display/~tbrady/Run+X11+File+Analyzer
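
For the record, a hypothetical stand-in for the download script (the real one lives on the dev box at the path above); the Merritt endpoint in URL_TEMPLATE is a placeholder, not an actual API path:

    import sys
    from pathlib import Path
    from urllib.parse import quote
    import requests

    URL_TEMPLATE = "https://merritt.cdlib.org/<object-download-endpoint>/{ark}"  # placeholder

    def fetch_batch(ark_list, out_dir):
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for ark in Path(ark_list).read_text().split():
            target = out / (ark.replace("ark:/", "ark_").replace("/", "_") + ".zip")
            if target.exists():
                continue  # resume-friendly: the dev box shuts down at 7pm
            resp = requests.get(URL_TEMPLATE.format(ark=quote(ark, safe="")), stream=True)
            resp.raise_for_status()
            with open(target, "wb") as f:
                for chunk in resp.iter_content(1 << 20):
                    f.write(chunk)

    if __name__ == "__main__":
        fetch_batch(sys.argv[1], sys.argv[2])  # e.g. batch1.txt downloads/batch1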

elopatin-uc3 commented 3 years ago

First batch is downloading now.

elopatin-uc3 commented 3 years ago

All 4204 zips from first batch downloaded successfully. I'll start the second batch tomorrow morning, as it probably wouldn't complete by 7pm, when the dev box will automatically shut down.

elopatin-uc3 commented 3 years ago

Batch 2 done. I'll start the third tomorrow morning.

elopatin-uc3 commented 3 years ago

Batch 3 done.

elopatin-uc3 commented 3 years ago

Batch 4 done. We now have all 16,804 objects downloaded to ingest-stg-shared. I will start processing these with the new routine in Terry's File Analyzer on Tuesday.

elopatin-uc3 commented 3 years ago

All zip batches post-processed with the File Analyzer to remove system files and conform to the directory structure and file lists we've discussed for the submission to Dryad. For example:

  ark_13030_m5765f2s/1/mrt-dataone-map.rdf
  ark_13030_m5765f2s/1/mrt-dataone-manifest.txt
  ark_13030_m5765f2s/1/cadwsap-s3610008-005.xml
  ark_13030_m5765f2s/1/cadwsap-s3610008-005-main.csv
  ark_13030_m5765f2s/1/mrt-erc.txt
  ark_13030_m5765f2s/1/cadwsap-s3610008-005-vuln.csv
  ark_13030_m5765f2s/1/cadwsap-s3610008-005.pdf
  ark_13030_m5765f2s/3/mrt-dataone-map.rdf
  ark_13030_m5765f2s/3/mrt-dataone-manifest.txt
  ark_13030_m5765f2s/3/cadwsap-s3610008-005.xml
  ark_13030_m5765f2s/3/cadwsap-s3610008-005-main.csv
  ark_13030_m5765f2s/3/mrt-erc.txt
  ark_13030_m5765f2s/3/cadwsap-s3610008-005-vuln.csv
  ark_13030_m5765f2s/3/cadwsap-s3610008-005.pdf
  ark_13030_m5765f2s/2/mrt-dataone-map.rdf
  ark_13030_m5765f2s/2/mrt-dataone-manifest.txt
  ark_13030_m5765f2s/2/cadwsap-s3610008-005.xml
  ark_13030_m5765f2s/2/cadwsap-s3610008-005-main.csv
  ark_13030_m5765f2s/2/mrt-erc.txt
  ark_13030_m5765f2s/2/cadwsap-s3610008-005-vuln.csv
  ark_13030_m5765f2s/2/cadwsap-s3610008-005.pdf
  ark_13030_m5765f2s/manifest.xml
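
For illustration, a rough Python equivalent of the system-file cleanup (the actual processing used the File Analyzer routines linked above); the set of files treated as cruft is an assumption:

    import zipfile
    from pathlib import Path

    SYSTEM_FILES = {".DS_Store", "Thumbs.db", "desktop.ini"}  # assumed cruft list

    def clean_zip(path):
        path = Path(path)
        tmp = path.with_name(path.name + ".tmp")
        with zipfile.ZipFile(path) as src, \
             zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                name = Path(info.filename).name
                if name in SYSTEM_FILES or "__MACOSX" in info.filename:
                    continue  # drop system files, keep everything else
                dst.writestr(info, src.read(info))
        tmp.replace(path)  # swap the cleaned zip into place

    for z in Path("ice_arks").glob("*.zip"):
        clean_zip(z)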

Next up will be prepping the .csv with object-level metadata per ARK.

elopatin-uc3 commented 3 years ago

CSV created with ARK, Title, Creator, and Filename columns. Sorted by Creator, as this includes the District information per dataset.
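
A sketch of how such a CSV could be assembled, assuming Title and Creator can be read from each object's mrt-erc.txt ("what:" / "who:" lines); the actual prep used a File Analyzer routine, and the directory layout here is illustrative:

    import csv
    from pathlib import Path

    rows = []
    for ark_dir in sorted(Path("ice_arks_extracted").iterdir()):  # illustrative
        erc = {}
        erc_file = next(ark_dir.rglob("mrt-erc.txt"), None)
        if erc_file:
            for line in erc_file.read_text().splitlines():
                key, sep, value = line.partition(":")
                if sep:
                    erc[key.strip()] = value.strip()
        rows.append({
            "ARK": ark_dir.name,
            "Title": erc.get("what", ""),
            "Creator": erc.get("who", ""),   # carries the District information
            "Filename": ark_dir.name + ".zip",
        })

    rows.sort(key=lambda r: r["Creator"])    # sorted by Creator, as noted above
    with open("ice_metadata.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["ARK", "Title", "Creator", "Filename"])
        writer.writeheader()
        writer.writerows(rows)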

elopatin-uc3 commented 3 years ago

README file created: https://docs.google.com/document/d/114b1ipt677jozzCWHM7yxAwCgizdUjuMT2GyFcPynf8/edit?usp=sharing

This will be ported to a .txt file for inclusion with the submission.

elopatin-uc3 commented 3 years ago

Talked to Scott and he confirmed it's possible to change ownership after Dryad submission, as well as add authors if needed. He noted that creating a new owner account would be necessary, and in this case, we should consider if future dataset updates are a possibility (as the individual making the update would need knowledge of the ownership account).

elopatin-uc3 commented 3 years ago

@marisastrong I've filed a Dryad ticket for the dataset ownership change: https://github.com/CDL-Dryad/dryad-product-roadmap/issues/1158

marisastrong commented 3 years ago

Recap of call with DaveV: Merritt MN content will need to be archived on the DataONE CN side so the content is no longer discoverable. Merritt MN content will be submitted to Dryad as a single tarball along with a metadata file listing all the ARKs in the tarball. This metadata will be translated into schema.org format, which can then be harvested by DataONE.
DaveV is following up with DataONE devs to see how they can make the old content discoverable as the new content; mapping multiple ARKs to a single tarball file is something they haven't supported before.
We do not need to update EZID to have the ARKs redirect anywhere.

elopatin-uc3 commented 3 years ago

Test submission on Dryad stage: https://dryad-stg.cdlib.org/stash/dataset/doi:10.7959/dryad.3xsj3tx9r

elopatin-uc3 commented 3 years ago

@mreyescdl the ICE dataset zips are staged here: /apps/ingest-stg-shared/dataone/ice_arks/

elopatin-uc3 commented 3 years ago

@marisastrong The submission to Dryad production is complete. https://doi.org/10.18737/D7H30S Daniella has notified Dryad curators and Scott will need to adjust the ownership tomorrow.

elopatin-uc3 commented 3 years ago

Unfortunately the above DOI was associated with UCOP rather than Davis. Scott and I are going to start over, delete this one and resubmit from scratch.

elopatin-uc3 commented 3 years ago

New submission complete: https://datadryad.org/stash/dataset/doi:10.25338/B8CH02

elopatin-uc3 commented 3 years ago

Dataset authors updated to LW and PC. I'm still owner of it in case we need to make any updates. Scott will check in with Daniella to confirm this is moved through curation.

datadavev commented 3 years ago

wrt the Merritt MN deprecation, a goal of DataONE is to ensure ongoing access to content once it has been registered. In this case, the Merritt content already registered in DataONE will be retained as replicas on other nodes participating in DataONE, and the content will still be accessible through the existing identifiers.

Shutting down the Merritt node can be straightforward, and basically involves:

  1. Change authoritative MN to another node in DataONE where replicas are located
  2. Optionally archive the content so that it no longer appears in searches (though it is still downloadable)

Both of these steps can be performed within DataONE.
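
A minimal sketch of step 2, assuming the v2 CN REST API (CNCore.archive maps to PUT /archive/{pid}) and an authorized client certificate; step 1 would go through CNCore.updateSystemMetadata in the same fashion. PIDs and paths are placeholders:

    from urllib.parse import quote
    import requests

    CN_BASE = "https://cn.dataone.org/cn/v2"
    CERT = ("client-cert.pem", "client-key.pem")  # placeholder credentials

    def archive(pid):
        # Hide the object from searches while keeping it downloadable.
        resp = requests.put(f"{CN_BASE}/archive/{quote(pid, safe='')}", cert=CERT)
        resp.raise_for_status()

    # Placeholder list of the Merritt MN's PIDs, one per line.
    for pid in open("merritt_pids.txt").read().split():
        archive(pid)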

But ...

It can be beneficial to users to indicate that the content is replaced by an aggregation of what were previously individual records. This can be achieved through prov:wasDerivedFrom and its inverse, prov:hadDerivation. The aggregated content is to be made available through Dryad, and Dryad is soon to be harvested by DataONE through the schema.org metadata published by Dryad. DataONE maps the schema.org/isBasedOn property to prov:wasDerivedFrom.

Hence, adding the property schema.org/isBasedOn with the value being a list of identifiers of the contained content would enable a user who finds the resource on Dryad (e.g. via DataONE search) to determine that the Dataset is composed of the other content originally available from Merritt. A potential concern is that the number of identifiers appearing in the value of the schema.org/isBasedOn property may be quite large.
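
For concreteness, an illustrative construction of such schema.org markup; all names and identifiers here are examples, and the real isBasedOn list would carry one entry per contained object:

    import json

    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": "https://doi.org/10.xxxx/example",  # the aggregate's DOI
        "name": "Aggregated Merritt MN content (example)",
        "isBasedOn": [
            "ark:/13030/m5765f2s",                 # example Merritt PIDs
            "ark:/13030/m0000000",
            # ...one entry per object in the aggregate, so potentially
            # thousands of identifiers
        ],
    }
    print(json.dumps(dataset, indent=2))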

Adding the inverse relationship to the existing content (i.e. the resource maps already published to DataONE) would be cumbersome, since it would require generating new identifiers for all the existing resource maps; however, the size of each entry would be quite small (a single identifier referring to the aggregated dataset).

Completing the deprecation of the Merritt MN while retaining existing content and cross-referencing the existing content with the new aggregate involves:

marisastrong commented 3 years ago

Thank you @datadavev for this write-up of next steps and alternatives. A couple questions regarding the less straightforward option:

  1. For retaining existing content and cross-referencing with the new aggregate, regarding the step to "Investigate updating all resource maps for Merritt": is this updating of the resource maps performed by the Merritt MN? And do these resource maps need to be included with the object stored in the Dryad MN?

  2. It is not clear what "the Merritt content already registered in DataONE will be retained as replicas on other nodes participating in DataONE, and the content will still be accessible through the existing identifiers" means in practice. What are the existing identifiers pointing to: the Merritt MN or the DataONE CN?

I'm trying to confirm whether all these steps can be performed if the Merritt MN is no longer running on our end.

datadavev commented 3 years ago

Item 1: The resource maps are used to identify aggregates of information, such as the data files and metadata of a data set. The usual method for updating the resource maps is through the MN, but this is not strictly necessary. If this update is not performed on the MN, what will be needed is a list of identifiers for the old objects that have been bundled and placed on Dryad, together with the identifier for the newly created bundle. This will enable the mapping from the old to the new to be determined.
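
As a sketch of that list, reusing the hypothetical ark_mapping.csv from the packaging sketch earlier in this thread: one row per bundled object, each pointing at the single bundle identifier.

    import csv

    BUNDLE_PID = "<identifier of the Dryad bundle>"  # placeholder

    with open("ark_mapping.csv") as src, \
         open("old_to_new.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["old_pid", "new_pid"])
        seen = set()
        for row in csv.DictReader(src):
            if row["ark"] not in seen:  # one row per object, not per file
                seen.add(row["ark"])
                writer.writerow([row["ark"], BUNDLE_PID])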

For item 2: Search results in the DataONE search UI use the CNs to resolve the location of identifiers. Similarly, any other systems that specifically use the DataONE CNs to resolve identifiers will be pointed to content within the DataONE environment. Resolving the same identifier with another system such as N2T or EZID will resolve to wherever that metadata is directing the client.