NASA-IMPACT / pyQuARC

The pyQuARC tool reads and evaluates metadata records with a focus on the consistency and robustness of the metadata. pyQuARC flags opportunities to improve or add to contextual metadata information in order to help the user connect to relevant data products. pyQuARC also ensures that information common to both the data product and the file-level metadata is consistent and compatible. pyQuARC frees up human evaluators to make more sophisticated assessments, such as whether an abstract accurately describes the data and provides the correct contextual information. The base pyQuARC package assesses descriptive metadata used to catalog Earth observation data products and files. As open source software, pyQuARC can be adapted and customized by data providers to allow for quality checks that evolve with their needs, including checking metadata not covered in the base package.
Apache License 2.0

Add an XML postprocessor to resolve values with attrs #226

Closed slesaad closed 1 year ago

slesaad commented 1 year ago

Sometimes XML elements carry attributes, such as uuid, alongside their text value.

Example:

<ISO_Topic_Category uuid="26ebb539-cae2-4961-9252-7f367642fa57">IMAGERY/BASE MAPS/EARTH COVER</ISO_Topic_Category>

In such a case, the returned value for a field looks something like:

>> doc["DIF"]["ISO_Topic_Category"]
    OrderedDict([('@uuid', '26ebb539-cae2-4961-9252-7f367642fa57'), ('#text', 'IMAGERY/BASE MAPS/EARTH COVER')]) 

instead of the regular:

>> doc["DIF"]["ISO_Topic_Category"]
    IMAGERY/BASE MAPS/EARTH COVER

xmltodict.parse() accepts a postprocessor argument: a callable that is invoked for each parsed item and can modify the returned key and/or value. This PR uses it to return the plain text value even when the element has attributes.
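
For illustration, here is a minimal sketch of what such a postprocessor could look like. It is not the code from this PR: the names resolve_text_value and parse_metadata are hypothetical, and it assumes xmltodict's defaults, where attributes are keyed with an "@" prefix and element text is stored under "#text".

```python
def resolve_text_value(path, key, value):
    """Hypothetical postprocessor for xmltodict.parse().

    xmltodict calls the postprocessor with (path, key, value) and expects
    a (key, value) tuple back. When an element has attributes, its value
    arrives as a dict holding the attributes plus a "#text" entry; this
    sketch collapses that dict to the text alone.
    """
    if isinstance(value, dict) and "#text" in value:
        # Drop attributes such as "@uuid" and keep only the element text.
        return key, value["#text"]
    # Plain values, and dicts without text, pass through unchanged.
    return key, value


def parse_metadata(xml_string):
    """Hypothetical wrapper: parse XML with the postprocessor applied."""
    import xmltodict  # third-party dependency, imported lazily

    return xmltodict.parse(xml_string, postprocessor=resolve_text_value)
```

With this in place, parse_metadata(xml)["DIF"]["ISO_Topic_Category"] should return the plain string for the example above instead of an OrderedDict. Note that the attributes are discarded, so this approach only fits cases where they are not needed downstream.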

xhagrg commented 1 year ago

LGTM.