MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

generic mapper: catch ALL attributes #167

Closed: ghukill closed this issue 6 years ago

ghukill commented 6 years ago

After improving the static XML harvester and running different kinds of XML records through it, a flaw in the generic mapper became apparent. The mapper should include element attributes from every node hop along the path, not just the last one.

For example, the following MARC XML:

<marc:datafield tag="040" ind1=" " ind2=" ">
    <marc:subfield code="a">DLC</marc:subfield>
    <marc:subfield code="c">DLC</marc:subfield>
</marc:datafield>

Results in the following fields:

record_datafield_subfield_@code_a
record_datafield_subfield_@code_c

These lose the defining tag numbers because the tag attributes sit on a node above the terminating node, and attributes are currently gathered only from the terminating node. If we collected them at every hop, the ES field names would be longer but more meaningful. They would have been:

record_datafield_@tag_040_subfield_@code_a
record_datafield_@tag_040_subfield_@code_c

However, it should probably not include the attributes from the root node, which often contains multiple declarations.

ghukill commented 6 years ago

It should also be careful to strip empty attributes, such as the blank ind1 and ind2 indicators above, which these records contain.
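Putting the three requirements together (attributes from every hop, none from the root, empty attributes stripped), here is a rough sketch using Python's standard `xml.etree.ElementTree`. It is not Combine's implementation; the function names are hypothetical, the sample wraps the datafield in an assumed `marc:record` root carrying a schema declaration, and treating whitespace-only values such as ind1=" " as empty is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(name):
    """Drop a '{namespace}' prefix, e.g. '{http://...}datafield' -> 'datafield'."""
    return name.rsplit("}", 1)[-1]


def hop_label(elem):
    """Label one non-root hop: local tag plus '@attr_value' per non-empty attribute."""
    parts = [local_name(elem.tag)]
    for attr, value in elem.attrib.items():
        # Skip empty or whitespace-only values (e.g. MARC ind1=" ").
        if value.strip():
            parts.append(f"@{local_name(attr)}_{value.strip()}")
    return "_".join(parts)


def flatten(xml_string):
    """Map each text-bearing leaf to a field name built from every hop below the root."""
    root = ET.fromstring(xml_string)
    fields = defaultdict(list)

    def walk(elem, prefix):
        children = list(elem)
        if not children:
            text = (elem.text or "").strip()
            if text:
                fields[prefix].append(text)
            return
        for child in children:
            walk(child, f"{prefix}_{hop_label(child)}")

    # The root contributes only its tag name; its attributes (here a
    # schema declaration) are deliberately ignored.
    walk(root, local_name(root.tag))
    return dict(fields)


MARC = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <marc:datafield tag="040" ind1=" " ind2=" ">
    <marc:subfield code="a">DLC</marc:subfield>
    <marc:subfield code="c">DLC</marc:subfield>
  </marc:datafield>
</marc:record>"""

for name, values in flatten(MARC).items():
    print(name, values)
```

On the sample this yields record_datafield_@tag_040_subfield_@code_a and record_datafield_@tag_040_subfield_@code_c, each mapped to ['DLC']. Values are collected in lists because repeated paths (for example two subfields with the same code) would otherwise overwrite each other.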

ghukill commented 6 years ago

Done.