gbif / watchdog

Project functioning as a watchful guardian of content in the GBIF network, especially against datasets going offline.

Orphan datasets from Norway #5

Closed by kbraak 6 years ago

kbraak commented 7 years ago

Results from my own early analysis reveal the following:

  1. The NINA IPT at http://digir.nina.no:8080/ipt is offline. It serves 1 orphaned dataset: NINA insect database.

  2. The GBIF.no IPT serves 142 datasets, but some still have an old endpoint URL (http://dev.gbif.no/ipt/). To ensure that all datasets use the correct endpoint URL, the IPT administrator just needs to press the update registration button on the GBIF Registration page. More information on how to do this can be found in the IPT user manual.

  3. There are some false positives in the list, such as http://www.gbif.org/dataset/39a80c99-56c2-482c-beef-fa960a8eec4d. Additionally, http://www.gbif.org/dataset/04930329-93ce-4f80-8d67-423a526a736b is a false positive because GBIF hasn't crawled it in a long time, which is due to a bug in our crawling service.
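For anyone who wants to check point 2 against their own list, here is a rough sketch of how datasets still pointing at an old IPT base URL could be flagged. It assumes each dataset record is a dict with a `key` and a list of `endpoints`, each holding a `url`; that shape, the helper names, the sample keys, and the `example.org` URL are all hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def normalize(url):
    """Lower-case the host and drop query string and trailing slash for comparison."""
    parts = urlparse(url.strip())
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"


def datasets_with_old_endpoint(datasets, old_base):
    """Return the keys of datasets that have an endpoint under old_base."""
    old = normalize(old_base)
    flagged = []
    for ds in datasets:
        urls = [normalize(ep["url"]) for ep in ds.get("endpoints", [])]
        # Match the base itself or anything below it, but not look-alike paths.
        if any(u == old or u.startswith(old + "/") for u in urls):
            flagged.append(ds["key"])
    return flagged


# Hypothetical sample records; only the old base URL comes from this thread.
sample = [
    {"key": "dataset-a",
     "endpoints": [{"url": "http://dev.gbif.no/ipt/archive.do?r=birds"}]},
    {"key": "dataset-b",
     "endpoints": [{"url": "http://example.org/ipt/archive.do?r=fish"}]},
]

print(datasets_with_old_endpoint(sample, "http://dev.gbif.no/ipt/"))
```

Comparing normalized URLs rather than raw strings avoids missing a match just because of a trailing slash or a differently cased host name.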

umeldt commented 7 years ago

  1. Looks like this is a similar issue to the second one - the NINA IPT has been moved to http://data.nina.no:8080/ipt/. I'll get in touch with NINA and ask them to update the registration.

  2. "IPT registration update succeeded!" Do the datasets in question have to be republished as well, or should this take care of it?

umeldt commented 7 years ago

Oops, on a second look the NINA issue turns out to be something else: there are two "NINA insect database" datasets, one tied to the IPT at http://data.nina.no:8080/ipt/ and the other tied to the IPT at http://digir.nina.no:8080/ipt.
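A duplicate like this could probably be spotted by grouping datasets by title and looking for titles that appear under more than one installation. The sketch below assumes each record is a dict with `title` and `installation` fields; those field names and the third sample record are made up for illustration, and only the two NINA IPT URLs come from this thread.

```python
from collections import defaultdict


def duplicate_titles(datasets):
    """Map each title hosted on more than one installation to those installations."""
    by_title = defaultdict(set)
    for ds in datasets:
        # Compare titles case-insensitively and ignore stray whitespace.
        by_title[ds["title"].strip().lower()].add(ds["installation"])
    return {title: sorted(hosts) for title, hosts in by_title.items()
            if len(hosts) > 1}


# Hypothetical records; the third entry is invented for contrast.
sample = [
    {"title": "NINA insect database",
     "installation": "http://data.nina.no:8080/ipt"},
    {"title": "NINA insect database",
     "installation": "http://digir.nina.no:8080/ipt"},
    {"title": "Some other dataset",
     "installation": "http://data.nina.no:8080/ipt"},
]

print(duplicate_titles(sample))
```

A matching title is only a hint, of course; whether two records really are the same dataset still needs a manual check.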

VangNINA commented 7 years ago

On June 25, 2015, I sent an email to helpdesk@gbif.org, informing them that all the datasets under http://digir.nina.no:8080/ipt had been migrated to our newest IPT instance, http://data.nina.no:8080/ipt. We assumed that the datasets would be migrated too, but it seems that the old insect database somehow got stuck.

The dataset under http://data.nina.no:8080/ipt/ is the most recent, so it would be nice if the other one were removed.

Kind regards, Roald Vang, NINA

kbraak commented 7 years ago

Thanks @VangNINA and @gbifnorway

Indeed, the old NINA IPT was marked as deleted in the GBIF Registry two years ago, but the old NINA insect database remained. I have now deleted it.

As you are likely aware, a dataset that is moved between IPTs should be migrated rather than recreated. Migration preserves its GBIF key and DOI, and instructions for future reference can be found in the IPT User Manual.

@gbifnorway the update worked. The datasets should now all have the correct endpoint URL, and GBIF will recrawl them. @jlegind and I will also force a recrawl of all datasets that haven't been updated in a long time due to a bug in our crawling service. When that's done, I'll update Norway's list of potential orphaned datasets to confirm the list is empty - thanks a lot!

Screenshots of GBIF Registry Console, showing deleted installation with 1 remaining dataset:

[Image: screen shot 2017-06-19 at 15.37.59] [Image: screen shot 2017-06-19 at 15.39.33]

MattBlissett commented 6 years ago

I can confirm that there are no potential orphan datasets from Norway. Thanks!