datacite / bolognese

Ruby gem and command-line utility for conversion of DOI metadata
MIT License

Pulls content values from certain DataCite XML relatedItem properties with attributes instead of hashes that include content and attributes #150

Closed codycooperross closed 1 year ago

codycooperross commented 1 year ago

Purpose

The DataCite Metadata Schema XSD allows certain relatedItem properties to contain attributes not specified in the schema. When reading these properties, bolognese does not check for attributes, so if any are present it generates a hash containing both the attributes and the content instead of just the content value. For example, <volume xml:lang="en">RR-175</volume> is read as "volume": { "lang": "en", "__content__": "RR-175" }.

Downstream, this causes indexing errors in lupo because Elasticsearch does not expect hash values for these properties.

This PR corrects this behavior, rendering only content values for these properties in JSON:

"volume" => "RR-175",

closes: #149

Approach

Added a parse_attributes call to each of the problematic properties. This returns the property's content rather than a hash that includes the attributes.
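
The idea behind this approach can be illustrated with a minimal sketch. The helper name content_only and the sample related_item hash are hypothetical and exist only for illustration; this is not the bolognese implementation of parse_attributes, just an assumption about the general shape of the fix: keep plain strings as they are, and reduce a hash to its "__content__" value.

```ruby
# Hypothetical helper for illustration only; not bolognese's parse_attributes.
# Returns the text content of a parsed XML element, whether the parser
# produced a plain String (no attributes) or a Hash (attributes present).
def content_only(element, content_key: "__content__")
  case element
  when String then element               # no attributes: already the content
  when Hash   then element[content_key]  # attributes present: keep content only
  end                                    # anything else (e.g. nil) yields nil
end

# Assumed shape of a parsed relatedItem, based on the example in this PR.
related_item = {
  "volume" => { "lang" => "en", "__content__" => "RR-175" },
  "issue"  => "2"
}

# Normalize every property so that only content values remain.
normalized = related_item.transform_values { |value| content_only(value) }
```

With this sketch, "volume" is reduced to "RR-175" and "issue" is left unchanged, so every property carries a plain string value of the kind the downstream index expects.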

Types of changes

Reviewer, please remember our guidelines: