gbif / watchdog

Project functioning as a watchful guardian of content in the GBIF network, especially against datasets going offline.

Dataset from Nordgen (& metadata to be updated) #33

Open dagendresen opened 3 years ago

dagendresen commented 3 years ago

The dataset from NordGen (https://doi.org/10.15468/3nyx9k) is flagged as "orphaned", but it remains regularly updated through a BioCASe data provider.

Participant node

https://registry.gbif.org/node/fa6bdac4-51e2-4334-940c-ba6cdf6e1257
https://www.gbif.org/participant/303
Contact points are correct.

Organization/publisher

https://registry.gbif.org/organization/b9c5f740-34d9-11de-baf5-e00d96b185ef
https://www.gbif.org/publisher/b9c5f740-34d9-11de-baf5-e00d96b185ef
Update contact points to: Kjell-Åke Lundblad kjellake.lundblad@nordgen.org
Add generic contact point: info@nordgen.org
Remove contact points: Dag Endresen and Martin Forsen
(Add more metadata: homepage, logo, etc.)

BioCASe installation (looks fine):
https://registry.gbif.org/installation/604f70d4-f762-11e1-a439-00145eb45e9a
https://www.gbif.org/installation/604f70d4-f762-11e1-a439-00145eb45e9a
https://www.nordgen.org/biocase/
(Add contact point? Kjell-Åke Lundblad)

Dataset

https://registry.gbif.org/dataset/85a347c0-f762-11e1-a439-00145eb45e9a
https://www.gbif.org/dataset/85a347c0-f762-11e1-a439-00145eb45e9a
Dataset DOI: https://doi.org/10.15468/3nyx9k

Update contact points to: Kjell-Åke Lundblad kjellake.lundblad@nordgen.org
Remove contact points: Dag Endresen and Lars Falk
(Add more metadata: homepage, logo, etc.)

BioCASe dataset endpoint:
https://www.nordgen.org/biocase/dsa_info.cgi?dsa=NGB
https://www.nordgen.org/biocase/pywrapper.cgi?dsa=NGB
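Both endpoint URLs above share a base path and the same `dsa` (datasource) query parameter. A minimal sketch of how they could be assembled and parsed is shown here; the helper names `biocase_urls` and `datasource_name` are hypothetical and not part of BioCASe or any GBIF tooling.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def biocase_urls(base: str, dsa: str) -> Dict[str, str]:
    """Build the info-page and pywrapper URLs for one datasource.

    Hypothetical helper: it only mirrors the URL pattern seen in this issue.
    """
    base = base.rstrip("/")
    query = urlencode({"dsa": dsa})
    return {
        "info": f"{base}/dsa_info.cgi?{query}",
        "pywrapper": f"{base}/pywrapper.cgi?{query}",
    }


def datasource_name(url: str) -> Optional[str]:
    """Return the value of the dsa query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("dsa")
    return values[0] if values else None


urls = biocase_urls("https://www.nordgen.org/biocase/", "NGB")
print(urls["info"])
print(urls["pywrapper"])
print(datasource_name(urls["pywrapper"]))
```

Keeping the datasource name separate from the base path makes it easier to see what changes if the installation moves to another host or the datasource is renamed.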

How to update metadata?

Is it possible to update registry metadata for participant node, publisher organisation, dataset, etc. directly from the BioCASe endpoint?

Alternatively, is it possible for NordGen (Kjell-Åke), GBIF Norway, or GBIF Sweden to get edit access to update the respective metadata for NordGen in the registry?

Homepage: https://www.nordgen.org/en/
Logo URL: https://www.nordgen.org/wp-content/uploads/2020/03/NordGen-Logotype-RGB.svg
See also: https://www.nordgen.org/en/about/press-and-media/logo-and-graphic-design/
Language: English
Address: P.O. Box 41
City: Alnarp
Province: Scania/Skåne
Country: Sweden
Postal code: SE-230 53
Email: info@nordgen.org
Phone: +46 40 536 640
Latitude: 55.65905367460462°N
Longitude: 13.084225828623053°E

MattBlissett commented 3 years ago

Hi Dag,

Is it possible to update registry metadata for participant node, publisher organisation, dataset etc. directly from the BioCASE endpoint?

Yes, it is. This dataset fails, however, since the mapping schema offered by BioCASe is not supported by us. The schema http://digir.net/schema/conceptual/darwin/2003/1.0 is offered, but our crawler doesn't support retrieving DiGIR/DwC occurrences using the BioCASe protocol.

The two supported schemas are http://www.tdwg.org/schemas/abcd/1.2 and http://www.tdwg.org/schemas/abcd/2.06. Is it possible to map the dataset to either of these?
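The check described here can be sketched as a simple set lookup. The two supported namespaces are taken from this comment; the function name `crawlable_schemas` is hypothetical and does not reflect how GBIF's crawler is actually implemented.

```python
# Schemas the crawler supports over the BioCASe protocol, per the comment above.
SUPPORTED_SCHEMAS = {
    "http://www.tdwg.org/schemas/abcd/1.2",
    "http://www.tdwg.org/schemas/abcd/2.06",
}


def crawlable_schemas(offered):
    """Return the offered schema namespaces that are supported, in offer order."""
    cleaned = (schema.strip() for schema in offered)
    return [schema for schema in cleaned if schema in SUPPORTED_SCHEMAS]


# The NordGen datasource only offered the DiGIR-style Darwin Core schema.
offered = ["http://digir.net/schema/conceptual/darwin/2003/1.0"]
print(crawlable_schemas(offered))
```

An empty result corresponds to the situation in this issue: the endpoint responds, but nothing it offers can be crawled.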

Or, if BioCASe supports it, can a Darwin Core Archive be produced from the Darwin Core mapping? (I know it supports this with ABCD mappings.)

CC @ManonGros.

jholetschek commented 3 years ago

Hi Dag, Matt,

Creating a DwC-Archive is only possible from an ABCD archive, which requires an ABCD mapping. I would be able to assist in creating this from the old DwC mapping, if necessary.

I wonder how the dataset got harvested in the first place until three years ago. Apparently the occurrences were last synced 3 years ago, before the dataset was moved to an orphaned state...

Cheers, Jörg

MattBlissett commented 3 years ago

I think it must have been crawled with an earlier version of GBIF's systems, perhaps pre-2013, and has been broken (not crawling) since around that time.

dagendresen commented 3 years ago

The crawling and ingestion history in the Registry seems to describe weekly (failed) attempts at indexing the data source, with the last successful indexing made on 2018-03-09?
https://registry.gbif.org/dataset/85a347c0-f762-11e1-a439-00145eb45e9a/ingestion-history
https://registry.gbif.org/dataset/85a347c0-f762-11e1-a439-00145eb45e9a/crawling-history?offset=125

MattBlissett commented 3 years ago

https://api.gbif.org/v1/dataset/85a347c0-f762-11e1-a439-00145eb45e9a/process?limit=25&offset=125

That crawl (attempt 1) is from the orphan dataset server, so the conversion to an orphan dataset was the first time the post-2013 system saw something valid for it to crawl. All subsequent crawls have been "Not modified", as the orphan dataset archive doesn't change.
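For paging through the crawl history, the API URL above can be built from the dataset key plus `limit` and `offset`. The sketch below also tallies crawl outcomes from one decoded JSON page; the `results` and `finishReason` field names, and the sample values, are assumptions made for illustration and should be checked against a real response.

```python
from collections import Counter
from urllib.parse import urlencode

API = "https://api.gbif.org/v1"


def process_url(dataset_key: str, limit: int = 25, offset: int = 0) -> str:
    """Build the crawl-history URL for one page of results."""
    query = urlencode({"limit": limit, "offset": offset})
    return f"{API}/dataset/{dataset_key}/process?{query}"


def finish_reasons(page: dict) -> Counter:
    """Count crawl outcomes in one decoded page.

    Assumes each entry in page["results"] may carry a "finishReason" field.
    """
    results = page.get("results", [])
    return Counter(entry.get("finishReason", "UNKNOWN") for entry in results)


url = process_url("85a347c0-f762-11e1-a439-00145eb45e9a", offset=125)
print(url)

# Made-up sample page, not real API output.
sample_page = {
    "results": [
        {"finishReason": "NOT_MODIFIED"},
        {"finishReason": "NOT_MODIFIED"},
        {"finishReason": "NORMAL"},
    ]
}
print(finish_reasons(sample_page))
```

A history dominated by one outcome, with a single different entry at the oldest attempt, would match the pattern described in this comment.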

jholetschek commented 3 years ago

Kjell-Åke has set up a new BioCASe installation and re-created the mappings and archives. Details in his mail below:

From: Kjell-Åke Lundblad kjellake.lundblad@nordgen.org
Sent: Tuesday, 11 May 2021 11:10
To: Holetschek, Jörg J.Holetschek@bgbm.org
Cc: Anders Telenius anders.telenius@nrm.se; mblissett@gbif.org; Marie Grosjean mgrosjean@gbif.org; Dag Endresen dag.endresen@gmail.com
Subject: Re: [gbif/watchdog] Dataset from Nordgen (& metadata to be updated) (#33)

Thanks Jörg,

I had already tried clicking the link to cancel, but deleting the *.proc file solved my issue. The archiving is working now. We now have only one step left: moving it all to a new server. I will inform you when we have moved.

The current "GBIF node address" is https://www.nordgen.org/biocase/pywrapper.cgi?dsa=genbisNGB. The new one will probably be https://biocase.nordgen.org/pywrapper.cgi?dsa=genbisSWE054.

This is because we decided that www.nordgen.org should only host the WordPress CMS, and because we will also host GBIF client nodes for other gene banks such as the Estonian Crop Research Institute (ECRI). Their address will probably be something like: https://biocase.nordgen.org/pywrapper.cgi?dsa=genbisEST019

I have already worked out a couple of data views to collect the necessary data from our Grin-Global system, so I can use those data views as templates to set up nodes for the other gene banks we host in Genbis (the Nordic and Baltic gene banks' instance of an adapted Grin-Global system).

It looks like the archiving process is finished.

For now, you should be able to reach us on the address: https://www.nordgen.org/biocase/pywrapper.cgi?dsa=genbisNGB

Best regards

Kjell-Åke Lundblad

ManonGros commented 3 years ago

We updated the endpoint; the dataset is being crawled right now.

dagendresen commented 3 years ago

Regarding Kjell-Åke's questions about the ABCD 2.06 mapping: before ABCD 2.06 was released, Walter and Helmut made the mapping between MCPD and ABCD in this document. Maybe it is useful for understanding the intentions?