cioos-siooc / ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
http://ckan.org/

Harvesting metadata from DataCite #178

Open timvdstap opened 1 year ago

timvdstap commented 1 year ago

Hoping to work on harvesting metadata records from DataCite into the Hakai Catalogue. We'll have to identify 'Hakai' datasets (e.g. through the ROR ID) and see how the metadata fields map onto the Hakai Catalogue. As an example, we might be able to use this record.
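A minimal sketch of how Hakai-affiliated records could be found. It assumes that DataCite's REST API (`https://api.datacite.org/dois`) accepts a field query on creator affiliation identifiers, and that setting `affiliation=true` returns affiliations as objects with an `affiliationIdentifier`. The query field path is an assumption worth checking against DataCite's documentation, and the ROR ID used below is a placeholder, not Hakai's actual identifier. The second function re-checks the results offline so that only records with an explicit ROR match are kept.

```python
from urllib.parse import urlencode

DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def build_ror_query_url(ror_id: str, page_size: int = 100) -> str:
    """Build a DataCite REST API URL listing DOIs whose creators are affiliated with ror_id."""
    # Assumption: DataCite's `query` parameter accepts this field path for ROR affiliations.
    query = f'creators.affiliation.affiliationIdentifier:"{ror_id}"'
    params = {
        "query": query,
        "affiliation": "true",  # ask DataCite to return affiliation identifiers
        "page[size]": page_size,
    }
    return f"{DATACITE_DOIS_URL}?{urlencode(params)}"


def records_affiliated_with(records: list[dict], ror_id: str) -> list[dict]:
    """Keep only DataCite `data` items with at least one creator affiliated with ror_id."""
    matched = []
    for record in records:
        creators = record.get("attributes", {}).get("creators", [])
        for creator in creators:
            affiliations = creator.get("affiliation", [])
            # With affiliation=true each affiliation is an object; plain strings carry no ROR ID.
            if any(
                isinstance(a, dict) and a.get("affiliationIdentifier") == ror_id
                for a in affiliations
            ):
                matched.append(record)
                break  # one matching creator is enough for this record
    return matched
```

The URL would be fetched page by page (the response's `links.next` can be followed), and the `data` list from each page passed through `records_affiliated_with` before any mapping into the catalogue.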

timvdstap commented 1 year ago

@fostermh

fostermh commented 1 year ago

Related DataCite links

Similar Zenodo-related DataCite API link

fostermh commented 3 weeks ago

How often do we expect to use this? How often could it have been used to date?

Options:

timvdstap commented 3 weeks ago

Just to add to this, our current recommended approach (though not yet a documented best practice) to preserving data long-term in a general repository is:

If a scientific journal requires the data to be deposited in e.g. DRYAD, host the data there and have DRYAD mint a DOI. Make the data more visible by creating a metadata record in the Hakai Catalogue that links to the data in DRYAD, and enter the DRYAD DOI in the DOI field; otherwise there would be multiple DOIs for the same resource, which we try to avoid where we can. For time-series data, e.g. where more versions are expected and version-specific DOIs are needed, the DRYAD DOI is linked either as a resource (if it is the primary resource) or as a related work. In this case we can mint a DOI through the form to represent the overall collection (see e.g. JSP record). Advantages: increased visibility through the catalogue, and no minting of multiple DOIs for the exact same resource. Drawbacks: more effort for data providers, and Hakai does not 'control' the DOI.

Where e.g. Zenodo is required, we recommend using the GitHub-Zenodo integration. Zenodo provides a version-specific DOI each time a release of the repository is published. With this approach too, data providers still have to create a metadata record for the Hakai Catalogue using the form.

timvdstap commented 3 weeks ago

Following our conversation, the most suitable option appears to be for data providers to enter a DOI from a different repository (e.g. DRYAD, Zenodo) into the metadata form. The form would then extract elements of that DOI's metadata and use them to populate specific fields. Exactly how this would work is not yet known. Data providers would still have to go through the form manually to modify or update fields where needed, as well as mint a DOI for the record in the Hakai Catalogue.
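A minimal sketch of the DOI-to-form prefill step, assuming the external DOI is registered with DataCite (as Dryad and Zenodo DOIs are) and using DataCite's public REST endpoint `GET https://api.datacite.org/dois/{doi}`. The output keys (`title`, `abstract`, `creators`, `keywords`, `publication_year`, `source_doi`) are hypothetical stand-ins for fields of the Hakai metadata form; the real form schema would define the actual mapping.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def fetch_datacite_record(doi: str) -> dict:
    """Fetch the DataCite JSON:API document for a DOI (makes a network call)."""
    with urlopen(f"{DATACITE_DOIS_URL}/{quote(doi, safe='/')}") as response:
        return json.load(response)


def prefill_from_datacite(payload: dict) -> dict:
    """Map DataCite attributes onto hypothetical Hakai form fields for the provider to review."""
    attrs = payload["data"]["attributes"]
    titles = attrs.get("titles") or []
    descriptions = attrs.get("descriptions") or []
    # Prefer the description explicitly typed as an abstract, if there is one.
    abstract = next(
        (d.get("description") for d in descriptions if d.get("descriptionType") == "Abstract"),
        None,
    )
    return {
        "title": titles[0].get("title") if titles else None,
        "abstract": abstract,
        "creators": [c.get("name") for c in attrs.get("creators") or []],
        "keywords": [s["subject"] for s in attrs.get("subjects") or [] if "subject" in s],
        "publication_year": attrs.get("publicationYear"),
        # Kept so it can be linked as a related work; the Hakai record still mints its own DOI.
        "source_doi": attrs.get("doi"),
    }
```

Calling `prefill_from_datacite(fetch_datacite_record(doi))` would return suggested values only; every field would still be reviewed and edited by the data provider in the form.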

Ideally, the record in the Hakai Catalogue points to the data package hosted in the Hakai GitHub organizational account as the primary resource. This allows the Hakai Institute to maintain ownership of the data. This data package can be in a private repository, although ideally it is (or eventually becomes) publicly accessible. The data published in the other repository would then be included as a Related Work, referenced through its DOI, even if it is an exact copy of the GitHub data package. This relationship can be expressed with the 'Is Identical To' relation type.
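A sketch of how such a record's links could be assembled. The `relatedIdentifier`, `relatedIdentifierType` and `relationType` keys follow DataCite Metadata Schema naming, and `IsIdenticalTo` is one of its relation types; the surrounding `primary_resource` / `related_works` structure and the GitHub URL are hypothetical, not the Hakai Catalogue's actual schema.

```python
DOI_RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(doi: str) -> str:
    """Strip common resolver prefixes so the DOI is stored as a bare '10.xxxx/...' string."""
    doi = doi.strip()
    for prefix in DOI_RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def build_record_links(github_package_url: str, external_dois: list[str]) -> dict:
    """Primary resource is the GitHub data package; external deposits become related works."""
    return {
        # Hypothetical field: the Hakai-owned data package on GitHub.
        "primary_resource": github_package_url,
        "related_works": [
            {
                "relatedIdentifier": normalize_doi(doi),
                "relatedIdentifierType": "DOI",
                # Only appropriate when the external deposit is an exact copy of the package.
                "relationType": "IsIdenticalTo",
            }
            for doi in external_dois
        ],
    }
```

If the external deposit is only a subset or a different version of the package, a different relation type (e.g. `IsPartOf` or `IsVersionOf`) would be more accurate than `IsIdenticalTo`.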