MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

xml2kvp: convert child nodes to text from target node #392

Open ghukill opened 5 years ago

ghukill commented 5 years ago

While using Combine to analyze ~26k XML files of a relatively unknown structure, I got the following from a naive field mapping:

[Screenshot: Screen Shot 2019-04-12 at 8.36.43 AM]

Unfortunately, this XML contains purely presentational elements, e.g. <italic>, that carry no semantic meaning.

It would be nice if the field mapping configuration for xml2kvp accepted an option to ignore child elements of a targeted element. Or, better yet, an option to take all text and child elements of a target node and convert them to a single string.

In this example, it would be beneficial to stop at:

book_body_book-part_body_book-part_book-part-meta_abstract

and produce only raw text for all child text and elements.
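
A minimal sketch of the requested behavior, assuming Python's standard-library xml.etree.ElementTree rather than xml2kvp's actual internals. The flatten helper, its stop_at parameter, and the sample XML are all hypothetical and exist only to illustrate the idea: when the walk reaches a configured path, it joins all descendant text into one string instead of descending further.

```python
import xml.etree.ElementTree as ET


def flatten(elem, stop_at=frozenset(), prefix="", out=None):
    """Map underscore-joined element paths to lists of text values.

    Hypothetical helper, not part of xml2kvp. Paths listed in `stop_at`
    are collapsed to the concatenated text of their whole subtree.
    """
    if out is None:
        out = {}
    path = f"{prefix}_{elem.tag}" if prefix else elem.tag

    if path in stop_at:
        # itertext() yields this element's text plus all descendant text
        # and tails in document order; split/join normalizes whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if text:
            out.setdefault(path, []).append(text)
        return out

    # Naive branch: record only the element's leading text, then recurse.
    # Text that follows a child element (its "tail") is ignored here.
    if elem.text and elem.text.strip():
        out.setdefault(path, []).append(elem.text.strip())
    for child in elem:
        flatten(child, stop_at, path, out)
    return out


# Invented sample; the real documents are far more deeply nested.
SAMPLE = "<book><abstract>A study of <italic>Quercus</italic> species.</abstract></book>"
root = ET.fromstring(SAMPLE)

naive = flatten(root)
collapsed = flatten(root, stop_at={"book_abstract"})

print(naive)
print(collapsed)
```

In this sketch the naive call produces a separate book_abstract_italic field and fragments the abstract, while the call with stop_at yields a single book_abstract value containing the full sentence.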