Open paulmillar opened 7 months ago
To provide more precise pointers:
- `scicat-backend-next`, towards the end of `published-data.controller.ts`.
- `oai-provider-service` repo, in `openaire-mapper.ts`.

As an example of inconsistencies:

| concept | published-data.controller.ts | openaire-mapper.ts |
|---|---|---|
| description[@type=Abstract] | abstract | dataDescription |
| creator/{givenName, familyName} | split on first space in name; first term is givenName, rest is familyName | These fields are not provided. |
| creator/creatorName | familyName, givenName (e.g., Millar, Paul) | creator (e.g., Paul Millar) |
| resourceType[@Dataset] | resourceType | This field is not provided. |
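
To make the creator rows of the table concrete, here is a minimal sketch of the two behaviours as described above. The function and interface names are hypothetical and the code is not copied from either repository; the handling of a single-term name is an assumption, since the table does not cover that case.

```typescript
// Hypothetical shape for a DataCite creator; only the fields from the table.
interface DataCiteCreator {
  creatorName: string;
  givenName?: string;
  familyName?: string;
}

// Behaviour described for published-data.controller.ts:
// split on the first space; first term is givenName, the rest is familyName.
function controllerStyleCreator(name: string): DataCiteCreator {
  const trimmed = name.trim();
  const firstSpace = trimmed.indexOf(" ");
  if (firstSpace === -1) {
    // Assumption: a name without a space is passed through unchanged.
    return { creatorName: trimmed };
  }
  const givenName = trimmed.slice(0, firstSpace);
  const familyName = trimmed.slice(firstSpace + 1).trim();
  return { creatorName: `${familyName}, ${givenName}`, givenName, familyName };
}

// Behaviour described for openaire-mapper.ts:
// the creator string is used as-is; givenName and familyName are not provided.
function mapperStyleCreator(name: string): DataCiteCreator {
  return { creatorName: name };
}

const fromController = controllerStyleCreator("Paul Millar");
const fromMapper = mapperStyleCreator("Paul Millar");
console.log(fromController.creatorName); // "Millar, Paul"
console.log(fromMapper.creatorName);     // "Paul Millar"
```

The same stored creator therefore yields two different `creatorName` values depending on which service produced the DataCite XML.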
Add ability to describe published data according to standard schema
Summary
SciCat has the concept of published data; that is, a set of one or more datasets that, collectively, are described by certain metadata fields. This metadata description is stored as a MongoDB document with the class `PublishedData`.

The backend has the ability to map this information to DataCite's XML schema, but only does this when making DataCite API requests for DOI activity. This ability to map `PublishedData` to a corresponding DataCite XML description isn't exposed by a SciCat API.
Perhaps because the DataCite description is not exposed, the `oai-provider-service` reimplements the same mapping functionality (albeit not completely consistently). OAI-PMH also provides a Dublin Core description (as required by the OAI-PMH specification), which might also be useful under different circumstances.

Steps to Reproduce
When minting a DOI, the SciCat backend generates XML that conforms to the DataCite XML schema. The OAI-PMH service does the same when a client queries its OpenAIRE (`/openaire/oai`) endpoint.

Current Behaviour
Any client that wishes to generate a standards-compliant description of a `PublishedData` document needs to implement the mapping itself. This implies duplication of effort.

Should the `PublishedData` class be extended so that additional metadata is recorded (e.g., ORCIDs), and should that additional metadata be included in some standard metadata description (e.g., DataCite), then all services that generate that metadata description would need to be updated (e.g., DOI minting, OAI-PMH).

Expected Behaviour
The PublishedData API endpoint is extended to support querying for a description of a specific PublishedData document. This API extension would likely take two arguments: the metadata standard (e.g., Dublin Core, DataCite, Schema.org, ...) and the serialisation. In some cases, only one serialisation makes sense (e.g., DataCite and XML), but in other cases there may be multiple possible serialisations (e.g., Schema.org as JSON-LD, Turtle, RDF/XML, N3, ...).
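
As a rough illustration of the two-argument idea, the sketch below validates a (standard, serialisation) pair and falls back to a default when only one serialisation makes sense. All names here are hypothetical and do not exist in the SciCat backend. Listing XML as the only Dublin Core serialisation is an assumption made for the example.

```typescript
// Hypothetical identifiers for the two arguments; not an existing SciCat API.
type MetadataStandard = "datacite" | "dublin-core" | "schema.org";
type Serialisation = "xml" | "json-ld" | "turtle" | "rdf-xml" | "n3";

// Which serialisations each standard supports; the first entry is the default.
// Assumption: Dublin Core is offered as XML only.
const SUPPORTED: Record<MetadataStandard, readonly Serialisation[]> = {
  "datacite": ["xml"],
  "dublin-core": ["xml"],
  "schema.org": ["json-ld", "turtle", "rdf-xml", "n3"],
};

// Pick the serialisation to use, rejecting combinations that make no sense.
function resolveSerialisation(
  standard: MetadataStandard,
  requested?: Serialisation,
): Serialisation {
  const allowed = SUPPORTED[standard];
  if (requested === undefined) {
    const fallback = allowed[0];
    if (fallback === undefined) {
      throw new Error(`No serialisation configured for ${standard}`);
    }
    return fallback;
  }
  if (allowed.indexOf(requested) === -1) {
    throw new Error(`${standard} cannot be serialised as ${requested}`);
  }
  return requested;
}

console.log(resolveSerialisation("datacite"));             // "xml"
console.log(resolveSerialisation("schema.org", "turtle")); // "turtle"
```

A request for an unsupported pair, such as DataCite as JSON-LD, throws an error here; a real endpoint would more likely translate that into an HTTP 4xx response.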
The backend DOI minting activity would take advantage of this ability (although it might not issue HTTP requests) when generating the XML metadata for DataCite. The OAI-PMH interface could talk with the backend, rather than querying the MongoDB directly. The landing page could take advantage of this when including a Schema.org/JSON-LD description of the published data.
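
For the landing-page case, a Schema.org/JSON-LD description could be produced by a small mapping function such as the one sketched below. The `PublishedDataLike` interface is a hypothetical subset of fields chosen for illustration, not the actual `PublishedData` class; the DOI value is a placeholder, and treating every creator as a `Person` is an assumption.

```typescript
// Hypothetical subset of a PublishedData document, for illustration only.
interface PublishedDataLike {
  doi: string;
  title: string;
  abstract: string;
  creator: string[];
}

// Minimal Schema.org Dataset description, ready for JSON-LD serialisation.
interface SchemaOrgDataset {
  "@context": "https://schema.org";
  "@type": "Dataset";
  identifier: string;
  name: string;
  description: string;
  creator: { "@type": "Person"; name: string }[];
}

function toSchemaOrgJsonLd(doc: PublishedDataLike): SchemaOrgDataset {
  return {
    "@context": "https://schema.org",
    "@type": "Dataset",
    identifier: `https://doi.org/${doc.doi}`,
    name: doc.title,
    description: doc.abstract,
    // Assumption: every creator is a person rather than an organisation.
    creator: doc.creator.map((name) => ({ "@type": "Person" as const, name })),
  };
}

// Placeholder document; the values are made up for the example.
const example: PublishedDataLike = {
  doi: "10.1234/example",
  title: "Example published data",
  abstract: "A short description of the datasets.",
  creator: ["Paul Millar"],
};

// The landing page could embed this string in a
// <script type="application/ld+json"> element.
console.log(JSON.stringify(toSchemaOrgJsonLd(example), null, 2));
```

If the mapping lived in the backend, DOI minting could call such a function directly while other components reach it over HTTP, so the mapping is written once.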
These would be natural places where this new API could be used (there may be others). I suggest this issue is closed when the extended PublishedData API is available; ancillary issues should be opened against other SciCat components to track progress in adopting the new API (as appropriate).
Extra Details
This issue is the result of discussion on issue #1175. Some of the comments there are useful for this issue.