geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License
203 stars 155 forks

DataCite output schema #623

Closed: cfenell closed this issue 12 months ago

cfenell commented 4 years ago

Hi all, we (https://www.eiscat.se) plan to provide metadata to B2Find, http://b2find.eudat.eu, so we are looking to set up an OAI-PMH server, and I found pycsw.

It would be good to be able to serve both Dublin Core and DataCite, and I found an email thread from 2018 about adding DataCite to the output schemas. Was there any progress on this? If not, I would be willing to contribute and would be grateful for directions on where to start. Best regards, Carl-Fredrik
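Serving the same records as both Dublin Core and DataCite implies some element-to-property crosswalk. The sketch below is purely illustrative and is not pycsw's internal mapping: the table `DC_TO_DATACITE` and the function `crosswalk` are hypothetical names, it covers only a subset of elements with a simple one-to-one equivalent in DataCite kernel 4, and reducing `dc:date` to `publicationYear` is a deliberate simplification.

```python
# Hypothetical crosswalk: Dublin Core element -> DataCite kernel-4 property path.
# Illustrative subset only; NOT pycsw's own mapping.
DC_TO_DATACITE = {
    "identifier": "identifier",
    "creator": "creators/creator/creatorName",
    "title": "titles/title",
    "publisher": "publisher",
    "date": "publicationYear",
    "type": "resourceType",
    "subject": "subjects/subject",
    "description": "descriptions/description",
    "format": "formats/format",
    "language": "language",
    "rights": "rightsList/rights",
}


def crosswalk(dc_record: dict) -> dict:
    """Re-key a flat Dublin Core record into DataCite property paths.

    Elements without a simple one-to-one DataCite equivalent (for example
    coverage or relation) are skipped. dc:date is cut to its four-digit
    year, since DataCite's publicationYear holds a year only.
    """
    out = {}
    for element, value in dc_record.items():
        target = DC_TO_DATACITE.get(element)
        if target is None:
            continue  # no direct equivalent in this simplified table
        if target == "publicationYear":
            value = str(value)[:4]
        out[target] = value
    return out


if __name__ == "__main__":
    print(crosswalk({"title": "EISCAT data", "date": "2021-05-03"}))
```

A real crosswalk would also need to handle repeated elements and DataCite attributes such as `resourceTypeGeneral`, which have no Dublin Core counterpart.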

tomkralidis commented 4 years ago

Looks like there hasn't been much discussion since that thread. Having said this, are you interested in ingest/loading or output of DataCite (or both)?

Contributions would be most welcome, and we're happy to help you along your development in the pycsw Gitter channel as well.

Thanks!

cfenell commented 4 years ago

Thanks, I have joined the Gitter channel!

are you interested in ingest/loading or output of DataCite (or both)?

We are designing a new metadata schema, in compliance with the FAIR principles, for a new research infrastructure (https://www.eiscat.se/eiscat3d-information/) which is planned to start in 2022 at the earliest. This means that there are many open issues, for example the granularity of data. Harvestable metadata, however, is likely to be defined only for larger data collections rather than for individual observations. In this case, metadata will be gathered from HDF5 files and databases and inserted into a PostgreSQL database when a new collection is defined (by us or by a data user). We plan to use the DataCite schema to the extent possible, both internally and for harvesting by discovery services (including B2FIND and others), so the answer is both. We also have to represent the ~100 configurable beams of each experiment in geospatial coordinates, so it would also be useful to learn more about PostGIS. Best regards, Carl-Fredrik

cfenell commented 3 years ago

There is a plugin that produces a bare minimum of DataCite records at https://github.com/cfenell/pycsw/blob/datacite/pycsw/plugins/outputschemas/datacite.py. I plan to extend it and create a proper fork and pull request once I have tested it. Ultimately I want it to produce full DataCite XML as in https://schema.datacite.org/meta/kernel-4.3/example/datacite-example-full-v4.xml (but I am not sure whether it is possible to retrieve all the required information).
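For orientation, a "bare minimum" DataCite kernel-4 record carries the mandatory properties: identifier, creator, title, publisher, publication year and resource type. The following is a standalone sketch using only the standard library; it is not the code of the linked plugin, and the function name `minimal_datacite` and its parameters are hypothetical. The record is not validated against the XSD.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_LOCATION = (
    "http://datacite.org/schema/kernel-4 "
    "http://schema.datacite.org/meta/kernel-4.3/metadata.xsd"
)

# Serialize DataCite elements with a default namespace and xsi: for the schema hint.
ET.register_namespace("", DATACITE_NS)
ET.register_namespace("xsi", XSI_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the DataCite kernel-4 namespace."""
    return f"{{{DATACITE_NS}}}{tag}"


def minimal_datacite(doi, creators, title, publisher, year,
                     resource_type_general="Dataset"):
    """Build a DataCite record containing only the mandatory properties (hypothetical helper)."""
    resource = ET.Element(q("resource"))
    resource.set(f"{{{XSI_NS}}}schemaLocation", SCHEMA_LOCATION)

    ET.SubElement(resource, q("identifier"), identifierType="DOI").text = doi

    creators_el = ET.SubElement(resource, q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name

    titles = ET.SubElement(resource, q("titles"))
    ET.SubElement(titles, q("title")).text = title

    ET.SubElement(resource, q("publisher")).text = publisher
    ET.SubElement(resource, q("publicationYear")).text = str(year)
    # resourceTypeGeneral is the controlled-vocabulary part; the element text is free-form.
    ET.SubElement(resource, q("resourceType"),
                  resourceTypeGeneral=resource_type_general)
    return resource


if __name__ == "__main__":
    record = minimal_datacite("10.1234/example", ["Doe, Jane"],
                              "Example dataset", "Example Publisher", 2021)
    print(ET.tostring(record, encoding="unicode"))
```

The DOI, names and publisher above are placeholders. Moving toward the full example record would mean adding optional properties (dates, geoLocations, relatedIdentifiers, and so on) in the same way.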

pvgenuchten commented 1 year ago

Thanks for the initial work @cfenell, we're interested in this development also.

Some extra resources are:

The following calls can be made with the above work:
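OAI-PMH requests are plain HTTP GETs carrying a `verb` parameter plus verb-specific arguments such as `metadataPrefix` or `identifier`. The helper below is a hedged sketch for composing such URLs. `BASE_URL` is a hypothetical endpoint whose real path depends on the pycsw version and deployment, and `datacite` as a `metadataPrefix` value is an assumption that should be confirmed with a `ListMetadataFormats` request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real OAI-PMH path depends on the pycsw deployment.
BASE_URL = "http://localhost:8000/oaipmh"

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request(verb: str, **params: str) -> str:
    """Build an OAI-PMH GET request URL, rejecting unknown verbs early."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb, **params}  # dict order is preserved by urlencode
    return f"{BASE_URL}?{urlencode(query)}"


if __name__ == "__main__":
    print(oai_request("ListMetadataFormats"))
    # "datacite" is an assumed prefix; check the ListMetadataFormats output first.
    print(oai_request("ListRecords", metadataPrefix="datacite"))
```

Validating the verb locally only catches typos early; the server still answers unknown or malformed requests with a `badVerb` or `badArgument` error.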

cfenell commented 1 year ago

Hi, thanks a lot for taking up this project! I am afraid I have not done anything here recently, but I had planned to return to it before EISCAT 3D starts producing data next year. I'll be happy to look at your work and test it, and not least to have a good Python library for writing DataCite metadata.

What we have done so far regarding DataCite: data from our old systems can be browsed through the Schedule:

pvgenuchten commented 12 months ago

@cfenell, this is now ready to be reviewed, and your thoughts are welcome (for example about the use of contentUrl). While testing I bumped into #898. Interestingly, this has never been flagged before; maybe people don't use the listidentifiers method often.
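One way to exercise the listidentifiers method from the client side is to parse its response and check the returned identifiers and any resumption token. The sketch below relies only on the OAI-PMH 2.0 response structure (`ListIdentifiers/header/identifier`, an optional `resumptionToken`, or a top-level `error`). The function name `parse_list_identifiers` is hypothetical, and nothing here reflects what #898 is actually about.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_identifiers(xml_text: str):
    """Return (identifiers, resumption_token) from an OAI-PMH ListIdentifiers response.

    Raises RuntimeError if the server answered with an OAI-PMH <error>.
    The token is None when the list is complete (absent or empty token).
    """
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise RuntimeError(f"{error.get('code')}: {error.text}")

    identifiers = [
        el.text
        for el in root.findall("oai:ListIdentifiers/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find("oai:ListIdentifiers/oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would repeat the request with `resumptionToken=<token>` until the returned token is `None`. An empty final `<resumptionToken/>` element is the protocol's signal that the list is complete, which is why empty text is treated the same as a missing element.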

kalxas commented 12 months ago

Looks good, thanks for the contribution!