NCEAS / metadig-checks

MetaDIG suites and checks for data and metadata improvement and guidance.
Apache License 2.0

Arctic theme Quality Reports should incorporate checks that are currently performed manually #99

Open jagoldstein opened 5 years ago

jagoldstein commented 5 years ago

Currently, the arcticdata.io support team performs 29 manual checks of a metadata object, and updates any insufficient fields, before making the object public.

These checks are described in this spreadsheet, which is stored on Google Drive. Its columns are the EML XPath, a description of the check, and the check's current status in metadig ("exists", "new", or "modification").

"exists" = the check is already present in the metadig Quality Report and no change is needed (n = 4)
"new" = this is a novel check that does not yet exist in the metadig Quality Report (n = 6)
"modification" = there is an existing check of this EML element in the metadig Quality Report, but it is in need of a change or update to make the check more specific and/or discriminating (n = 19)

Some of these checks are likely relatively easy to write and implement, while others are more sophisticated and would require reading data objects and comparing them against the EML.

This issue fits in with the "Quality Assessment of Arctic Data Center Metadata and Data" Data Science Fellowship project proposal.

More discussion will be necessary to assign each check to a category (Identification, Discovery, or Interpretation) and to determine how (if at all) fulfillment of each check impacts the overall metadig percentage scores.

tedhabermann commented 5 years ago

A couple of questions: Which metadata dialects support descriptions of metadata quality? I know ISO does. Who adds reports to the metadata record? It should be the repository, but do they have permission? Could this be part of the system metadata in DataONE?

jagoldstein commented 5 years ago

I know that EML supports metadata quality assessment. The reports are not part of the metadata record itself, but are created separately by the metadig engine built by @gothub. It is currently running on arcticdata.io. I do not believe that this could, or should, be part of system metadata in DataONE.

The checks that are needed go beyond the metadata document itself; they require cross-comparisons between metadata and data objects within a package.
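
To illustrate the kind of cross-comparison described above, here is a minimal sketch in Python. It is not an actual metadig check and does not use the metadig engine's API. It assumes a simplified, un-namespaced EML fragment and a small CSV, both made up for this example, and the function names (`documented_attributes`, `csv_header`, `compare_attributes`) are hypothetical. It compares the attribute names documented for a data table in the metadata against the column names actually found in the data object.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified EML fragment (real EML documents are namespaced
# at the root and far more detailed).
EML = """<eml>
  <dataset>
    <dataTable>
      <entityName>sites.csv</entityName>
      <attributeList>
        <attribute><attributeName>site_id</attributeName></attribute>
        <attribute><attributeName>latitude</attributeName></attribute>
        <attribute><attributeName>longitude</attributeName></attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>"""

# Hypothetical data object: its header does not match the metadata exactly.
CSV_TEXT = "site_id,latitude,depth\nA1,70.2,3.5\n"


def documented_attributes(eml_text, entity_name):
    """Return the attribute names the EML documents for the named data table."""
    root = ET.fromstring(eml_text)
    for table in root.iter("dataTable"):
        if table.findtext("entityName") == entity_name:
            names = table.findall("./attributeList/attribute/attributeName")
            return [el.text.strip() for el in names if el.text]
    return []


def csv_header(csv_text):
    """Return the column names from the first row of the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def compare_attributes(eml_text, csv_text, entity_name):
    """Report names present on one side (metadata or data) but not the other."""
    documented = documented_attributes(eml_text, entity_name)
    actual = csv_header(csv_text)
    return {
        "missing_from_data": [n for n in documented if n not in actual],
        "undocumented_in_metadata": [n for n in actual if n not in documented],
    }


result = compare_attributes(EML, CSV_TEXT, "sites.csv")
print(result)
```

In this made-up example, `longitude` is documented but absent from the data, and `depth` appears in the data without being documented. A real check would also need to locate and download the data object within the package, which this sketch leaves out.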