MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

xml2kvp: capture attributes as values? #209

Closed ghukill closed 6 years ago

ghukill commented 6 years ago

Currently, nodes or "hops" aren't written to kvp_dict until an XML node value is encountered.

But imagine the following XML:

<root>
    <foo age="42"/>
    <bar>Willy Wonka</bar>
    <baz age="44">
        <nested_bar>Sally</nested_bar>
    </baz>
</root>

How might the age 42 get captured? What if a list of attributes could be captured as terminating values? How would this affect downstream values, like the value for <nested_bar>?
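
To make the problem concrete, here is a minimal sketch of the behavior described above, written against Python's standard xml.etree.ElementTree rather than Combine's actual XML2kvp code. The function name flatten_text_only and the underscore delimiter are assumptions for illustration only. A key is written only when an element has a text value, so neither age attribute reaches the output.

```python
import xml.etree.ElementTree as ET


def flatten_text_only(xml_string, delim="_"):
    """Hypothetical helper (not part of XML2kvp): flatten XML to {path: text}.

    A key is written only when an element carries a non-empty text value.
    Repeated paths overwrite each other in this simplified version.
    """
    kvp = {}

    def walk(elem, hops):
        # Extend the chain of "hops" with the current tag
        hops = hops + [elem.tag]
        text = (elem.text or "").strip()
        if text:
            kvp[delim.join(hops)] = text
        for child in elem:
            walk(child, hops)

    walk(ET.fromstring(xml_string), [])
    return kvp


xml = """<root>
    <foo age="42"/>
    <bar>Willy Wonka</bar>
    <baz age="44">
        <nested_bar>Sally</nested_bar>
    </baz>
</root>"""

# <foo age="42"/> has no text value, so nothing is written for it
print(flatten_text_only(xml))
```

In this sketch, <foo> produces no key at all, which is the gap the questions above are about.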

ghukill commented 6 years ago

Even when the attribute age is included so that it shows up in field names, it is still not captured for elements that have no text value, such as <foo>:

In [7]: xml = '''<root>
   ...:     <foo age="42"/>
   ...:     <bar>Willy Wonka</bar>
   ...:     <baz age="44">
   ...:         <nested_bar>Sally</nested_bar>
   ...:     </baz>
   ...: </root>'''

In [9]: XML2kvp.xml_to_kvp(xml, include_attributes=['age'])
Out[9]: {'root_bar': 'Willy Wonka', 'root_baz_@age=44_nested_bar': 'Sally'}

It is likely worth exploring how to convert an attribute into a field of its own and capture its value.
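
One possible shape for this, sketched with the standard library's xml.etree.ElementTree and not taken from the XML2kvp implementation: treat each listed attribute as a field of its own, with the attribute value as the field value. The function name, the capture_attributes parameter, and the @name key format are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def flatten_with_attribute_values(xml_string, capture_attributes=(), delim="_"):
    """Hypothetical helper (not part of XML2kvp): flatten XML to {path: value}.

    Attributes named in capture_attributes are written as terminating values
    under their own key, e.g. root_foo_@age, even if the element has no text.
    """
    kvp = {}

    def walk(elem, hops):
        hops = hops + [elem.tag]
        # Each listed attribute becomes its own field with the attribute value
        for name in capture_attributes:
            if name in elem.attrib:
                kvp[delim.join(hops + ["@" + name])] = elem.attrib[name]
        # Text values are written under the plain element path, as before
        text = (elem.text or "").strip()
        if text:
            kvp[delim.join(hops)] = text
        for child in elem:
            walk(child, hops)

    walk(ET.fromstring(xml_string), [])
    return kvp


sample = """<root>
    <foo age="42"/>
    <bar>Willy Wonka</bar>
    <baz age="44">
        <nested_bar>Sally</nested_bar>
    </baz>
</root>"""

result = flatten_with_attribute_values(sample, capture_attributes=["age"])
for key, value in result.items():
    print(key, "=", value)
```

Under this scheme the key for <nested_bar> stays root_baz_nested_bar, so downstream field names are not changed by a parent's attribute, while both age values get fields of their own.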

ghukill commented 6 years ago

Done.