gbif / watchdog

Project functioning as a watchful guardian of content in the GBIF network, especially against datasets going offline.

Orphan datasets from Germany #10

Open kbraak opened 7 years ago

kbraak commented 7 years ago

In Germany's list of orphans, all 15 datasets that are owned by Staatliche Naturwissenschaftliche Sammlungen Bayerns are false positives and should not be rescued. These are due to this bug in GBIF's crawling service.

Below is the latest analysis of the remaining orphan datasets, conducted by @jholetschek. He is still awaiting replies from several hosts/curators/publishers to understand whether their data can come back online. Based on the results, GBIF will need to perform at least one dataset deletion and change dataset endpoint URLs in the GBIF Registry.

Germany orphaned.xlsx

kbraak commented 7 years ago

Thanks @jholetschek for identifying that 8ea44a78-c6af-11e2-9b88-00145eb45e9a is back online.

jholetschek commented 6 years ago

Dataset https://www.gbif.org/dataset/85c8e444-f762-11e1-a439-00145eb45e9a is back online on a new BioCASe installation with 38.154 occurrences.

kbraak commented 6 years ago

That's great news @jholetschek, thanks. That still leaves 43 candidate orphan datasets in Germany that GBIFS hasn't been able to re-index in the last 6 months, as you can see here: https://github.com/gbif/watchdog/wiki/AdoptionPlan

jholetschek commented 6 years ago

This list still contains 5 datasets that are online.

kbraak commented 6 years ago

Thanks @jholetschek

I updated the URLs for the 2 datasets from Friedrich-Alexander University of Erlangen-Nürnberg and triggered a re-crawl for them. I also triggered a re-crawl for the 3 datasets from Georg-August-Universität Göttingen. Hopefully they all finish crawling successfully this time.

jholetschek commented 6 years ago

Thanks a lot, Kyle! Seems all five datasets have been crawled successfully now.

Concerning https://www.gbif.org/dataset/ad0d1a24-e952-11e2-961f-00145eb45e9a: I'll meet with the curator next week and will try to convince him to bring the dataset back online.

jholetschek commented 6 years ago

Dataset https://www.gbif.org/dataset/ad0d1a24-e952-11e2-961f-00145eb45e9a is back online and can be crawled again.