The pyQuARC tool reads and evaluates metadata records with a focus on the consistency and robustness of the metadata. pyQuARC flags opportunities to improve or add to contextual metadata in order to help users connect to relevant data products. It also ensures that information common to both the data product and the file-level metadata is consistent and compatible. pyQuARC frees up human evaluators to make more sophisticated assessments, such as whether an abstract accurately describes the data and provides the correct contextual information. The base pyQuARC package assesses descriptive metadata used to catalog Earth observation data products and files. As open-source software, pyQuARC can be adapted and customized by data providers to allow for quality checks that evolve with their needs, including checks on metadata not covered in the base package.
Original Bug
The master branch was raising an error when validating that the GCMD short and long name pair given for an item matches a valid GCMD short and long name pair.
You can see the error by running:
python main.py --format dif10 --fake FAKE
>> DIF/Platform/Instrument/Short_Name:
Error: The provided instrument short name `MODIS` and long name `Moderate-Resolution Imaging Spectroradiometer` aren't consistent.
Please supply the corresponding long name for the short name.
This appears to be a spurious error, as MODIS is in fact the Moderate-Resolution Imaging Spectroradiometer.
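For orientation, the behaviour this check is meant to have can be sketched as a lookup from each short name to the long names it may be paired with. This is a hypothetical illustration, not pyQuARC's implementation, which appears to build a nested hierarchy from a keyword CSV (see `_create_hierarchy_dict` and `merge_dicts` in `gcmd_validator.py`); `build_pair_index`, `is_consistent`, and the toy keyword list are made up for this example.

```python
from collections import defaultdict


def build_pair_index(pairs):
    """Map each short name to the set of long names listed with it (hypothetical helper)."""
    index = defaultdict(set)
    for short_name, long_name in pairs:
        index[short_name.strip()].add(long_name.strip())
    return index


def is_consistent(short_name, long_name, index):
    """True if this exact short/long name pair appears in the index."""
    return long_name.strip() in index.get(short_name.strip(), set())


# Toy keyword list (made up); pyQuARC reads its keywords from a CSV instead.
index = build_pair_index([("MODIS", "Moderate-Resolution Imaging Spectroradiometer")])
print(is_consistent("MODIS", "Moderate-Resolution Imaging Spectroradiometer", index))  # True
print(is_consistent("MODIS", "Some Other Instrument", index))  # False
```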
Bugfix
Although this bug appears on master, it does not appear on a recent feature branch, subbranch_of_feature/ummc_support.
According to Jenny and Shelby, this branch has some problems and can't be used in its entirety. I don't know anything about pyQuARC, so I tried to trace all the relevant code back from the working branch and port it over.
However, after moving over all the relevant code, a new error appeared:
File "/home/carson/github/pyQuARC/pyQuARC/code/gcmd_validator.py", line 125, in _create_hierarchy_dict
GcmdValidator.merge_dicts(hierarchy_dict, row_dict)
File "/home/carson/github/pyQuARC/pyQuARC/code/gcmd_validator.py", line 200, in merge_dicts
parent[key], _ = GcmdValidator.merge_dicts(parent[key], child[key])
File "/home/carson/github/pyQuARC/pyQuARC/code/gcmd_validator.py", line 199, in merge_dicts
if parent.get(key):
AttributeError: 'str' object has no attribute 'get'
We can trace back this error to the following bit of code: https://github.com/NASA-IMPACT/pyQuARC/blob/d3995b025ffe106169af2e91eb3de7ba8e3e0fda/pyQuARC/code/gcmd_validator.py#L178-L193
What's happening is that some of the parent values are equal to this_is_the_leaf_node. Here is an output from when you print these values before executing parent.get(key).
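To make the failure mode concrete, here is a minimal, hypothetical reproduction; it is not the code in `gcmd_validator.py`. It assumes each CSV row is turned into a nested dict whose deepest key maps to the sentinel string `this_is_the_leaf_node`, and that rows are then merged recursively (`row_to_dict`, `naive_merge`, and `build_hierarchy` are illustrative names). If one row stops at a node and a later row continues past it, the recursive merge receives the sentinel string as `parent` and calls `.get()` on it, which raises the same `AttributeError` as the traceback.

```python
LEAF = "this_is_the_leaf_node"  # sentinel value named in this PR


def row_to_dict(cells):
    """Nest a row's non-empty cells: ['A', 'B'] -> {'A': {'B': LEAF}} (assumed behaviour)."""
    node = LEAF
    for cell in reversed([c for c in cells if c]):
        node = {cell: node}
    return node


def naive_merge(parent, child):
    """Merge child into parent, assuming every existing value is a dict."""
    for key, value in child.items():
        if parent.get(key) and isinstance(value, dict):
            # Recurses even when parent[key] is the LEAF string.
            parent[key] = naive_merge(parent[key], value)
        elif key not in parent:
            parent[key] = value
        # Otherwise the child is a leaf and the key already exists: keep it.
    return parent


def build_hierarchy(rows):
    hierarchy = {}
    for cells in rows:
        hierarchy = naive_merge(hierarchy, row_to_dict(cells))
    return hierarchy


# Hypothetical rows: the first has an empty long name, so it ends at the short name.
rows = [["SOUNDER DETECTOR", ""], ["SOUNDER DETECTOR", "Sounder Detector 1"]]
try:
    build_hierarchy(rows)
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'get'
```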
In this pull request, I have circumvented this error with some questionable code: https://github.com/NASA-IMPACT/pyQuARC/blob/d3995b025ffe106169af2e91eb3de7ba8e3e0fda/pyQuARC/code/gcmd_validator.py#L185-L186
Basically, I check whether the parent is a leaf and, if so, return the values directly.
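A sketch of that kind of guard, assuming the hierarchy is a tree of nested dicts whose leaves hold the `this_is_the_leaf_node` sentinel (`guarded_merge` is a hypothetical name, and the linked lines may differ in detail), stops recursing as soon as either side is a leaf:

```python
LEAF = "this_is_the_leaf_node"  # sentinel value named in this PR


def guarded_merge(parent, child):
    """Recursively merge child into parent, tolerating leaves on either side."""
    if not isinstance(parent, dict):
        # Parent is a leaf: hand back the child subtree instead of calling .get() on a str.
        return child
    if not isinstance(child, dict):
        # Child is a leaf: keep whatever the parent already holds.
        return parent
    for key, value in child.items():
        if key in parent:
            parent[key] = guarded_merge(parent[key], value)
        else:
            parent[key] = value
    return parent


merged = guarded_merge({"SOUNDER DETECTOR": LEAF}, {"SOUNDER DETECTOR": {"Sounder Detector 1": LEAF}})
print(merged)  # {'SOUNDER DETECTOR': {'Sounder Detector 1': 'this_is_the_leaf_node'}}
```

The crash goes away, but the record that some row ended at `SOUNDER DETECTOR` is silently dropped, so a guard like this can hide a problem in the data instead of fixing it.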
Concern
The last thing I did feels hacky, because I can't understand why a parent would be a leaf to begin with. Surely something is wrong somewhere, either in the logic or in the input CSV.
GNSS RECEIVER and SOUNDER DETECTOR both throw this error. I went into the CSV file and discovered that I could make the errors disappear by updating the CSV in certain places: https://github.com/NASA-IMPACT/pyQuARC/blob/d3995b025ffe106169af2e91eb3de7ba8e3e0fda/pyQuARC/schemas/instruments.csv#L508-L515
If you replace the empty quotes in line 507 with the long name Sounder Detector 1, this value is no longer flagged as a faulty leaf.
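If the root cause is in the CSV, one quick diagnostic is to list every hierarchy path that is both a leaf (some row ends there) and a branch (another row continues past it). The sketch below assumes rows are nested by skipping empty cells; `find_leaf_branch_conflicts`, the three-column layout, and the sample rows are made up for illustration and are not the contents of `instruments.csv`.

```python
import csv
import io


def find_leaf_branch_conflicts(rows):
    """Return paths that end one row but are a prefix of another row (assumed nesting rule)."""
    leaves, branches = set(), set()
    for row in rows:
        # Skip empty cells, so a row with an empty long name ends at the short name.
        path = tuple(cell.strip() for cell in row if cell.strip())
        if not path:
            continue
        leaves.add(path)
        for i in range(1, len(path)):
            branches.add(path[:i])
    return sorted(leaves & branches)


# Made-up rows with a hypothetical layout: class, short name, long name.
sample = io.StringIO(
    'EXAMPLE CLASS,SOUNDER DETECTOR,""\n'
    "EXAMPLE CLASS,SOUNDER DETECTOR,Sounder Detector 1\n"
)
print(find_leaf_branch_conflicts(csv.reader(sample)))
# [('EXAMPLE CLASS', 'SOUNDER DETECTOR')]
```

Run over the real file (skipping any header rows), a check like this should list the affected entries, and filling in the missing long name should make the corresponding path drop out of the result.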