NASA-IMPACT / pyQuARC

The pyQuARC tool reads and evaluates metadata records with a focus on the consistency and robustness of the metadata. pyQuARC flags opportunities to improve or add to contextual metadata information in order to help the user connect to relevant data products. pyQuARC also ensures that information common to both the data product and the file-level metadata is consistent and compatible. pyQuARC frees up human evaluators to make more sophisticated assessments, such as whether an abstract accurately describes the data and provides the correct contextual information. The base pyQuARC package assesses descriptive metadata used to catalog Earth observation data products and files. As open-source software, pyQuARC can be adapted and customized by data providers to allow for quality checks that evolve with their needs, including checks on metadata not covered in the base package.
Apache License 2.0

Use CMR For Validation Type Checks #269

Open tbs1979 opened 11 months ago

tbs1979 commented 11 months ago

The CMR Team recommends that pyQuARC use the CMR API for basic validation checks (metadata schema, controlled vocabularies) rather than maintain its own separate validation. If CMR validation changes, we want to ensure that users are not relying on outdated validation in pyQuARC. We also want the results pyQuARC reports to users to be consistent with the results CMR reports.

Collection metadata can be validated either at ingest or without ingesting it. The validation performed includes schema validation, UMM validation, and inventory-specific validations. Keyword validation can be enabled with the keyword validation header. A successful validation returns status code 200 with a list of any warnings; a failed validation returns status code 400 with a list of validation errors. Warnings are returned when a record passes native XML schema validation but fails UMM-C validation.

The validation API is documented at https://cmr.earthdata.nasa.gov/ingest/site/docs/ingest/api.html#collection.
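As a rough illustration of what delegating to CMR could look like from pyQuARC's side, here is a minimal Python sketch using only the standard library. It assumes the validate endpoint has the form `POST <ingest-root>/providers/<provider-id>/validate/collection/<native-id>`, that keyword validation is switched on with a `Cmr-Validate-Keywords: true` header, and that ECHO10 XML is posted as `application/echo10+xml`; all of these, and the helper names `build_validation_request`, `interpret_response`, and `validate_with_cmr`, are assumptions to check against the API docs linked above, not pyQuARC or CMR code. Response handling follows the behavior described in this issue: 200 means valid (body may carry warnings), 400 means invalid (body carries errors).

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from urllib.parse import quote

# Assumed ingest root; verify against the CMR ingest API documentation.
CMR_INGEST_URL = "https://cmr.earthdata.nasa.gov/ingest"


@dataclass
class ValidationResult:
    valid: bool
    status: int
    body: str  # raw response body: warnings on 200, errors on 400


def build_validation_request(provider_id, native_id, metadata,
                             content_type="application/echo10+xml",
                             validate_keywords=False, token=None):
    """Build (but do not send) a POST to the assumed collection validate endpoint."""
    # Native IDs may contain characters that are unsafe in a URL path.
    url = (f"{CMR_INGEST_URL}/providers/{provider_id}"
           f"/validate/collection/{quote(native_id, safe='')}")
    headers = {"Content-Type": content_type}
    if validate_keywords:
        # Assumed name of the keyword validation header.
        headers["Cmr-Validate-Keywords"] = "true"
    if token:
        # Assumed: token passed as-is in the Authorization header.
        headers["Authorization"] = token
    return urllib.request.Request(url, data=metadata.encode("utf-8"),
                                  headers=headers, method="POST")


def interpret_response(status, body):
    """Map CMR status codes to pass/fail, as described in this issue."""
    if status == 200:
        return ValidationResult(valid=True, status=status, body=body)
    if status == 400:
        return ValidationResult(valid=False, status=status, body=body)
    raise RuntimeError(f"unexpected status {status}: {body}")


def validate_with_cmr(provider_id, native_id, metadata, **kwargs):
    """Send the request; urllib raises HTTPError for 4xx, so catch and interpret it."""
    req = build_validation_request(provider_id, native_id, metadata, **kwargs)
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_response(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return interpret_response(err.code, err.read().decode("utf-8"))
```

Keeping request construction and response interpretation separate from the network call means the pass/fail mapping can be tested offline, and only `validate_with_cmr` would need updating if the endpoint or headers differ from these assumptions.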

tbs1979 commented 8 months ago

https://cmr.earthdata.nasa.gov/ingest/site/docs/ingest/api.html#collection documents the CMR validation API. The wiki page UMM-C Validation in CMR lists the collection validations CMR currently performs, documenting for each one its type, whether it is an error or a warning, and its message. The CMR Clojure code documents these validations in more detail.