Mark orphaned datasets on dataset pages

MattBlissett commented 6 years ago

I'm starting the export of orphan datasets, beginning with those that have not been crawled since the ingestion process was rewritten (before 6 November 2013).

Verbatim data will be exported as a GBIF download (kept forever) then adjusted to be in a Darwin Core Archive suitable for import. These archives will be kept on https://orphans.gbif.org/, until they are "adopted" by a node.

The existing Endpoint will be removed, and a record kept in a machine tag in case the publisher enquires in the future. A new HTTP endpoint will be added. A record of the GBIF download used is in another machine tag.

If a node wants to adopt a dataset, the export script can be rerun (it will use the original GBIF download rather than the adjusted archive for re-import) and a suitable structure for import into an IPT can be produced.

This needs to be explained, so I think we need an FAQ entry which can be linked from the dataset page.

We should make minimal changes to the dataset page; just an additional entry under "Endpoints" explaining that the dataset was orphaned and is now hosted by GBIF.

The first dataset is this one; Morten can see the machine tags in the API call: https://www.gbif.org/dataset/857bce66-f762-11e1-a439-00145eb45e9a — https://api.gbif.org/v1/dataset/857bce66-f762-11e1-a439-00145eb45e9a
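For anyone who wants to inspect the machine tags outside a browser, here is a minimal Python sketch. It assumes the dataset record returned by the API carries a `machineTags` array whose entries have `namespace`, `name` and `value` fields. The helper names `fetch_dataset` and `machine_tags` are made up for illustration and are not part of any GBIF client.

```python
import json
from typing import List, Optional, Tuple
from urllib.request import urlopen

# Base URL taken from the API link above.
API_BASE = "https://api.gbif.org/v1/dataset/"


def fetch_dataset(key: str) -> dict:
    """Fetch a dataset record from the registry API (needs network access)."""
    with urlopen(API_BASE + key) as response:
        return json.load(response)


def machine_tags(dataset: dict, namespace: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, value) triples, optionally limited to one namespace.

    Assumes the record has a "machineTags" list; a missing list is treated as empty.
    """
    tags = dataset.get("machineTags") or []
    return [
        (tag.get("namespace"), tag.get("name"), tag.get("value"))
        for tag in tags
        if namespace is None or tag.get("namespace") == namespace
    ]
```

Calling `machine_tags(fetch_dataset("857bce66-f762-11e1-a439-00145eb45e9a"))` would list whatever tags are currently set on the first dataset.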

Assigning Andrea, Morten and Daniel to work out what comms and website edits need to be done.

dnoesgaard commented 6 years ago

First draft ready for comments: https://www.gbif.org/faq?question=what-is-an-orphan-dataset

MortenHofft commented 6 years ago

I can add a short text under the registration section with a link to an FAQ item describing it in detail. Something along the lines of: This dataset has been adopted by GBIF. [What is an adopted dataset?]

Which field should I look for, @MattBlissett? machineTags with namespace=orphans.gbif.org AND name=orphanStatus AND value=AWAITING_ADOPTION? Or how do I recognise datasets that should have this description?

dnoesgaard commented 6 years ago

Just a note on terminology: let's agree on when to use the terms orphan(ed), rescue(d), adopt(ion). I thought adoption was only when a publisher agrees to take over the dataset, but looking at the GitHub Wiki, it seems a bit fuzzy. My suggestion would be:

- a dataset that we haven't been able to ingest for a long time is considered an orphan
- a dataset is rescued when the data is recovered, e.g. by export from GBIF
- a dataset is adopted when a publisher agrees to re-publish from a reliable endpoint

@ahahn-gbif, @kcopas - comments?

ahahn-gbif commented 6 years ago

+1 @dnoesgaard

MattBlissett commented 6 years ago

Going too far with the analogy, the datasets we export but host ourselves could be considered fostered.

I agree that adoption is only once a publisher takes over. Morten, you can recognize the RESCUED value. We may later use extra values like ORPHANED and ADOPTED, but only RESCUED needs anything for the moment.
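As a rough sketch of the check the dataset page could make (written in Python for brevity, not in the portal's own code): the tag namespace `orphans.gbif.org` and name `orphanStatus` are taken from Morten's question and are assumptions, since only the `RESCUED` value is confirmed here. The function names and the notice wording are hypothetical.

```python
from typing import Optional

# Assumed tag coordinates (from the question above); only RESCUED is confirmed.
ORPHAN_NAMESPACE = "orphans.gbif.org"
ORPHAN_STATUS_NAME = "orphanStatus"

# Hypothetical wording for the extra entry under "Endpoints".
RESCUED_NOTICE = "This dataset was orphaned and is now hosted by GBIF."


def orphan_status(dataset: dict) -> Optional[str]:
    """Return the orphan status value from the dataset's machine tags, if any."""
    for tag in dataset.get("machineTags") or []:
        if (tag.get("namespace") == ORPHAN_NAMESPACE
                and tag.get("name") == ORPHAN_STATUS_NAME):
            return tag.get("value")
    return None


def endpoint_notice(dataset: dict) -> Optional[str]:
    """Return the notice for rescued datasets, or None for everything else.

    Other values such as ORPHANED or ADOPTED may be added later; for now
    they are deliberately ignored.
    """
    if orphan_status(dataset) == "RESCUED":
        return RESCUED_NOTICE
    return None
```

Matching on the exact value means that datasets tagged with any later status show no notice until the page is explicitly taught about that status.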

I can change the hostname of https://orphans.gbif.org/ if that doesn't fit with the comms around this.

gbif / portal16

Mark orphaned datasets on dataset pages #695