datacite / lupo

DataCite REST API
https://api.datacite.org
MIT License

Schema 4.5 - Support for indexing and querying new values #1047

Closed. codycooperross closed this 11 months ago

codycooperross commented 1 year ago

Purpose

This PR adds support for indexing and querying publisher object values in Elasticsearch, and for the Instrument and StudyRegistration resourceTypeGeneral values in Elasticsearch queries. It also adds support for querying organizations with matching ROR publisherIdentifiers via the GraphQL API, and adds a test for Schema 4.5 XML at the /dois/validate endpoint.

closes: #1012 #1020 datacite/akita#311 datacite/akita#298

Approach

Adds a new publisher_obj attribute to the ES mapping in the Doi model. The existing publisher attribute in the mapping remains unchanged. If query strings for publisher subproperties are detected (e.g. publisher.publisherIdentifier), a regex gsub replaces publisher with publisher_obj so that the correct attribute is queried. No changes to the query_fields are necessary because the publisher property will continue to exist and be queryable.
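
The rewrite described above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the method name `rewrite_publisher_query` and the exact regex are assumptions. The lookbehind is there so that only a leading `publisher.` is rewritten, while a plain `publisher:` query and an already rewritten `publisher_obj.` are left alone.

```ruby
# Hypothetical helper illustrating the gsub approach; the name and the
# regex are assumptions, not taken from the lupo source.
def rewrite_publisher_query(query)
  return query if query.nil?

  # Match "publisher." only when it starts a field path, i.e. it is not
  # preceded by a word character or a dot. "publisher:" (no subproperty)
  # and "publisher_obj." do not match.
  query.gsub(/(?<![\w.])publisher\./, "publisher_obj.")
end

puts rewrite_publisher_query('publisher.publisherIdentifierScheme:ROR')
puts rewrite_publisher_query('publisher:DataCite')
```

Queries against the plain `publisher` field pass through unchanged, which matches the statement that `query_fields` needs no modification.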

The serializer is modified to check for the existence of publisher_obj, which will be present when querying ES but not when retrieving single Doi objects from the database.
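
A minimal sketch of that check, assuming the serializer prefers `publisher_obj` when the record exposes it and falls back to `publisher` otherwise. The `EsHit` and `DbDoi` structs and the `publisher_for` helper are hypothetical stand-ins for an Elasticsearch result and a database-backed Doi; the actual serializer may be structured differently.

```ruby
# Hypothetical stand-ins: an ES hit carries publisher_obj, a record
# loaded from the database does not.
EsHit = Struct.new(:publisher, :publisher_obj, keyword_init: true)
DbDoi = Struct.new(:publisher, keyword_init: true)

# Assumed fallback logic: use publisher_obj if the record has one,
# otherwise use the existing publisher attribute.
def publisher_for(record)
  if record.respond_to?(:publisher_obj) && !record.publisher_obj.nil?
    record.publisher_obj
  else
    record.publisher
  end
end

es_hit = EsHit.new(publisher: "DataCite", publisher_obj: { "name" => "DataCite" })
db_doi = DbDoi.new(publisher: { "name" => "DataCite" })

p publisher_for(es_hit)
p publisher_for(db_doi)
```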

Modifies the organization_id attribute to include ROR publisherIdentifiers if available. Organization queries in GraphQL already search this field, so no other modifications are necessary to include DOIs with matching publisherIdentifiers in an organization's list of works. As a consequence, DOIs that are related to DMPs with a matching publisherIdentifier will also appear in an organization's list of works.
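
As a rough illustration of the idea, the sketch below appends a publisher's identifier to a list of organization identifiers when its scheme is ROR. The helper name `organization_ids`, the hash keys, and the example identifiers are assumptions for illustration; how lupo actually builds and normalizes `organization_id` is not shown in this description.

```ruby
# Hypothetical helper: existing_ids stands for whatever organization
# identifiers are already collected; publisher is a hash in the shape of
# the Schema 4.5 publisher attributes (key names assumed here).
def organization_ids(existing_ids, publisher)
  ids = existing_ids.dup

  if publisher.is_a?(Hash) &&
     publisher["publisherIdentifierScheme"] == "ROR" &&
     !publisher["publisherIdentifier"].to_s.empty?
    ids << publisher["publisherIdentifier"]
  end

  # Avoid duplicates when the publisher is already listed.
  ids.uniq
end

# Placeholder ROR-style identifiers, for illustration only.
publisher = {
  "name" => "Example Publisher",
  "publisherIdentifier" => "https://ror.org/00example1",
  "publisherIdentifierScheme" => "ROR",
}

p organization_ids(["https://ror.org/00example2"], publisher)
```

Because organization queries already search this field, adding the identifier at indexing time is enough for matching DOIs to show up without changing the GraphQL query itself.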

Adds Instrument and StudyRegistration to resourceTypeGeneral lists for query faceting and the /resource-types endpoint.

Adds tests for the resource-type-id filter covering the new Instrument and StudyRegistration resource types.
