gbif / watchdog

Project functioning as a watchful guardian of content in the GBIF network, especially against datasets going offline.

Orphan datasets from Belgium #4

Status: Open. DimEvil opened this issue 7 years ago.

DimEvil commented 7 years ago

@kbraak

InboVeg - NICHE-Vlaanderen groundwater related vegetation relevés for Flanders, Belgium (3d1231e8-2554-45e6-b354-e590c56ce9a8)
Zomerganzen - Summering geese management and population counts in Flanders, Belgium (2b2bf993-fc91-4d29-ae0b-9940b97e3232)

Why are these two datasets identified as orphan datasets? See:
http://data.inbo.be/ipt/resource?r=zomerganzen-events
http://data.inbo.be/ipt/resource?r=inboveg-niche-vlaanderen-events

kbraak commented 7 years ago

Thank you @DimEvil. These are two false positives caused by an unexplained error in GBIF's crawling service: GBIF hasn't tried to recrawl these two datasets since October 2016. While we investigate this error, you can safely ignore these datasets and continue reviewing the remaining potential orphans in Belgium's list. Thanks!
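The false-positive reasoning above (a dataset looks orphaned because GBIF stopped recrawling it, while its endpoint is actually still online) can be sketched as a small classification rule. This is a minimal sketch, not the watchdog's actual detection code. The `DatasetCheck` record, the `classify` helper, the 180-day staleness threshold and the separate endpoint probe are all hypothetical, and the dates in the example are illustrative only (the exact day in October 2016 is assumed).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical threshold: a dataset not crawled for longer than this is a candidate orphan.
STALE_AFTER = timedelta(days=180)


@dataclass
class DatasetCheck:
    """Hypothetical record combining registry and probe information for one dataset."""
    key: str
    last_crawled: Optional[date]  # None means GBIF never crawled the dataset
    endpoint_reachable: bool      # result of a separate (manual or scripted) probe


def classify(check: DatasetCheck, today: date) -> str:
    """Return a rough status for a dataset on the potential-orphan list."""
    stale = check.last_crawled is None or today - check.last_crawled > STALE_AFTER
    if not stale:
        return "ok"
    if check.endpoint_reachable:
        # The data is online but GBIF has not (re)crawled it: a crawler-side problem.
        return "false positive: recrawl needed"
    # Not crawled recently and not reachable: the publisher needs to be contacted.
    return "candidate orphan: contact publisher"


# Illustrative inputs, not real crawl records.
check = DatasetCheck("3d1231e8-2554-45e6-b354-e590c56ce9a8", date(2016, 10, 1), True)
print(classify(check, date(2017, 7, 14)))  # prints: false positive: recrawl needed
```

Treating "never crawled" (`last_crawled=None`) as stale means such a dataset can also come out as a false positive when its endpoint still answers.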

DimEvil commented 7 years ago

@kbraak This dataset, Waterbirds of the Botanic Garden Meise (c0cc29de-f49f-4b66-b4ec-c83afbb7101d), can also be removed from the orphan list. Until Meise succeeds in installing the newest version of the IPT, they are turning their IPT on and off whenever they want to publish (because of the security breach some months ago).

kbraak commented 7 years ago

Thanks @DimEvil. I can see the Botanic Garden Meise IPT hosts 3 datasets. They need to remain permanently accessible online. Perhaps Meise would be interested in moving their datasets to a trusted data hosting centre in Belgium?

DimEvil commented 7 years ago

Hi @kbraak, I will check this with Meise. They should normally have an IPT permanently online. They turn the IPT on and off because of the earlier security issue and a lack of time to update the IPT.

kbraak commented 7 years ago

Thank you @niconoe for the following analysis:

Dataset: Status
3d1231e8-2554-45e6-b354-e590c56ce9a8: Appears to be back online (endpoint works as of July 14th).
2b2bf993-fc91-4d29-ae0b-9940b97e3232: Appears to be back online (endpoint works as of July 14th).
c0cc29de-f49f-4b66-b4ec-c83afbb7101d: Temporary technical issue. The provider needs to update the IPT and Java and then update the URL. After that, the dataset will be online again.
f58465c4-27ff-11e2-85e3-00145eb45e9a: False positive. It was republished on an IPT in 2012: http://www.gbif.org/dataset/b76c1a65-b912-4ca6-be7e-50eb365f4a32
f5499142-27ff-11e2-85e3-00145eb45e9a: False positive. It was republished on an IPT in 2012: http://www.gbif.org/dataset/5bba3c0c-4cfe-4e9c-a744-25eeb5adf2fe
860fc602-f762-11e1-a439-00145eb45e9a: According to the dataset page, there never seems to have been any data or (meaningful) metadata in this dataset. It looks like it was published by mistake. I contacted the person anyway; he's on holiday for now.
85e8c69c-f762-11e1-a439-00145eb45e9a: Weird: the Belgian orphan list shows no URL endpoint for this dataset, yet by browsing GBIF pages I can find the following BioCASE installation, which at first glance seems to accept requests. A bug in the orphan search code?
82f258ae-f762-11e1-a439-00145eb45e9a: Metafro-infosys-prelude. It seems to have disappeared, but it was on our DIGIR provider, so we should have moved it to the IPT. To be investigated; in any case, we can adopt it (again)!
82f603dc-f762-11e1-a439-00145eb45e9a: We were previously hosting datasets on behalf of this institution (BCCM), which initially planned to take over responsibility and host the datasets itself. As far as I know this never happened, and our primary contact no longer works there. We therefore need to contact them and ask whether they still want to provide data and host it themselves. If so, it may also be good to suggest updating the (very old) data. @andrejjh, do you agree with this approach?
82f73af4-f762-11e1-a439-00145eb45e9a: We were previously hosting datasets on behalf of this institution (BCCM), which initially planned to take over responsibility and host the datasets itself. As far as I know this never happened, and our primary contact no longer works there. We therefore need to contact them and ask whether they still want to provide data and host it themselves. If so, it may also be good to suggest updating the (very old) data. @andrejjh, do you agree with this approach?
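Several statuses in this analysis come down to two questions: which endpoints the GBIF Registry records for a dataset (for example, the missing endpoint for 85e8c69c-f762-11e1-a439-00145eb45e9a), and whether those endpoints answer. What follows is a minimal sketch. It assumes the public GBIF API at `https://api.gbif.org/v1/dataset/{key}` returns a JSON record with an `endpoints` list whose entries carry `type` and `url` fields. The helper names are hypothetical, and the sample record is hand-written rather than real registry output.

```python
import json
import urllib.request
from typing import Any

# Assumed public GBIF Registry API location for a single dataset record.
GBIF_DATASET_API = "https://api.gbif.org/v1/dataset/{key}"


def fetch_dataset(key: str, timeout: float = 10.0) -> dict[str, Any]:
    """Download the registry record for one dataset (needs network access)."""
    with urllib.request.urlopen(GBIF_DATASET_API.format(key=key), timeout=timeout) as resp:
        return json.load(resp)


def endpoint_urls(record: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (type, url) pairs for every endpoint listed in a registry record."""
    return [(e.get("type", "UNKNOWN"), e.get("url", "")) for e in record.get("endpoints", [])]


def endpoint_responds(url: str, timeout: float = 10.0) -> bool:
    """Rough reachability probe: any successful HTTP response counts as 'up'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError/timeouts are OSError subclasses; ValueError covers malformed URLs.
        return False


# Offline example on a hand-written record (not real registry output):
sample = {"endpoints": [{"type": "BIOCASE", "url": "http://example.org/biocase/pywrapper.cgi"}]}
print(endpoint_urls(sample))  # [('BIOCASE', 'http://example.org/biocase/pywrapper.cgi')]
```

An empty result from `endpoint_urls(fetch_dataset(key))` would correspond to the "no URL endpoint" case reported above. A reachable URL found elsewhere, such as on a dataset web page, would instead point to a registry or detection problem.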

Please confirm:

By the way, it turns out GBIF has never crawled 85e8c69c-f762-11e1-a439-00145eb45e9a. I have just triggered a crawl manually. Let's see what happens ;)

niconoe commented 7 years ago

Also:

niconoe commented 7 years ago

We just had confirmation that 82f258ae-f762-11e1-a439-00145eb45e9a should also be deleted: it has been replaced by 49c5b4ac-e3bf-401b-94b1-c94a2ad5c8d6 (as a checklist, which was a better match for the data).

So IMHO, the remaining tasks are:

Does that sound good to everyone?

kbraak commented 7 years ago

Thanks @andrejjh and @niconoe for your follow ups.

I confirm that the following datasets have been flagged as deleted in the GBIF Registry: