NASA-IMPACT / pyQuARC

The pyQuARC tool reads and evaluates metadata records with a focus on the consistency and robustness of the metadata. pyQuARC flags opportunities to improve or add to contextual metadata information in order to help the user connect to relevant data products. pyQuARC also ensures that information common to both the data product and the file-level metadata is consistent and compatible. pyQuARC frees up human evaluators to make more sophisticated assessments, such as whether an abstract accurately describes the data and provides the correct contextual information. The base pyQuARC package assesses descriptive metadata used to catalog Earth observation data products and files. As open source software, pyQuARC can be adapted and customized by data providers to allow for quality checks that evolve with their needs, including checking metadata not covered in the base package.
Apache License 2.0

QuARC/pyQuARC Inconsistencies #287

Open smk0033 opened 5 months ago

smk0033 commented 5 months ago

After being informed that QuARC now pulls the most recent version of pyQuARC, some testing was done to make sure outputs were the same. Upon testing, it was noticed that there were still some inconsistencies, which have been documented here.

Fields where issues were noticed:

Along with this, I also noticed general QuARC errors at the bottom of the resulting JSON, and that certain checks failed (I remember URLs and the Beginning/Ending Date Times being among them).

Please double-check behind me to ensure that my results weren't potentially an error on my end.
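
For anyone repeating the comparison, here is a minimal sketch of how the two outputs could be diffed field by field. It assumes each tool's result has already been reduced to a plain dictionary mapping a field path to a list of messages; that shape, and the name `diff_flags`, are hypothetical and are not the actual QuARC or pyQuARC output format.

```python
def diff_flags(pyquarc_flags, quarc_flags):
    """Return, per field, the messages reported by only one of the two tools.

    Both arguments are assumed (hypothetically) to be dicts of the form
    {field_path: [message, ...]}. Fields with identical messages are omitted.
    """
    differences = {}
    for field in sorted(set(pyquarc_flags) | set(quarc_flags)):
        py_msgs = set(pyquarc_flags.get(field, []))
        q_msgs = set(quarc_flags.get(field, []))
        if py_msgs != q_msgs:
            differences[field] = {
                "only_in_pyquarc": sorted(py_msgs - q_msgs),
                "only_in_quarc": sorted(q_msgs - py_msgs),
            }
    return differences


if __name__ == "__main__":
    # Made-up illustration data, not taken from a real record.
    pyquarc_example = {"Field/A": ["message 1"], "Field/B": ["message 2"]}
    quarc_example = {"Field/B": ["message 2"]}
    print(diff_flags(pyquarc_example, quarc_example))
```

An empty result would mean the two tools agree on every field, which is the expected outcome once QuARC tracks the same pyQuARC version.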

xhagrg commented 5 months ago

@smk0033 would you be able to update the spreadsheet with proper responses to these field checks?

smk0033 commented 5 months ago

Sure! I've updated it. I think ideally QuARC is just supposed to match pyQuARC's outputs. For the first record on the list, C1576365803-LARC_ASDC, I think QuARC is technically correct for not flagging those date times; I only highlighted them because QuARC didn't flag them when pyQuARC did. That may be fine then, especially since that should be fixed in pyQuARC soon!

For the other fields and records, I went ahead and put the expected output. Thanks!

smk0033 commented 5 months ago

Additionally, for record C2098746562-LARC, this was pyQuARC's output:

    >> DIF/Related_URL/URL:
            Error: A URL with a status code other than 200 has been identified: `[{'url': 'https://www.nasa.gov/feature/langley/longstanding-carbon-monoxide-measuring-instrument-mopitt-celebrated/', 'error': 'Status code 404'}]`.    
            Error: A URL with a status code other than 200 has been identified: `[{'url': 'https://www.pnas.org/content/115/20/5099', 'error': 'Status code 403'}]`.
            This often indicates a broken link. If the URL is broken, recommend revising.

It seems that the first link is broken, but the second one isn't. Something similar shows up for C2065183177-LARC_ASDC. pyQuARC output:

    >> RelatedUrls/URL:
            Error: A URL with a status code other than 200 has been identified: `[{'url': 'https://earthobservatory.nasa.gov/images/145494/sampling-the-castle-fire/', 'error': 'Some unknown error occurred.'}]`.
            Error: A URL with a status code other than 200 has been identified: `[{'url': 'https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2022JD037758', 'error': 'Status code 403'}]`.
            This often indicates a broken link. If the URL is broken, recommend revising.

Both links do seem to work. Does this need to go in its own ticket if there isn't one open already?

xhagrg commented 5 months ago

I checked, and there are some changes that have not made their way into the public release of pyQuARC. I'm working through the remaining PRs. Once they are reviewed and verified, we can merge to the master branch and create a new version. This will then be pulled into QuARC, and we can validate the results again at that point.

smk0033 commented 5 months ago

Sounds great, thank you! Would you like me to go ahead and close this issue until the release and new testing?

xhagrg commented 5 months ago

Let's keep this open until we have verified that all the changes are also available in QuARC.