NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine
7 stars 5 forks source link
assessment datamanagement datascience java metadata

metadig-engine

The Java project for MetaDig

The Metadata Quality Engine is a flexible engine for managing and executing metadata quality checks on a wide variety of metadata documents. Quality Checks are organized into Suites that can be stored and managed by the engine before ultimately being run against input metadata documents. QC run results are provided and can be stored by the engine for later retrieval and analysis, or consumed immediately and discarded. We have aimed to support multiple languages for writing checks so that different communities can use the code and libraries with which they are most familiar.

Read on to see how you can get started with the Metadata Quality Engine.

Creating a check

For technical details about the Suite, Check and Run models, Java model and the XSD schema provide documentation. Below is a brief description of the common check elements to get started writing QC suites.

Script environment state

The script environment between successive checks can be maintained (variables that have been initialized by selectors or by the check code itself) Setting the inheritState flag to true allows the previous check's variables to be accessed (and any previous checks before it as long as each check has opted to inherit state). Care should be taken that variable names do not collide from one check to another, otherwise the results can be confusing.

Where to put checks

A separate repository has be created for the suites and checks used by the Quality Engine, at https://github.com/NCEAS/metadig-checks. If you wish to add a check or suite, please visit this repository and create a Git Pull Request containing any additions.

Running suites

It is recommended to develop checks within an editor supporting the check language being used, before placing the code in a check or suite XML document. You can also test the code by setting variables as you would expect them to be initialized by the selectors in your check. This will be a quicker way to identify interpretation and runtime issues before testing with the Quality Engine.

The R package https://github.com/NCEAS/metadig-r can be used to assist in authoring checks.

Once you are satisfied with your check code, you can run a suite (one or more checks in a suite) using the command line:

mvn exec:java -Dexec.mainClass="edu.ucsb.nceas.mdqengine.MDQEngine" -Dexec.args="<path-to-suite.xml> <path-to-metadata.xml> <path-to-DataONE-systemmetadata document>"

Using the library elsewhere

In order to use Metadata Quality Engine features in other projects, the Quality Engine distribution can be included in other Maven projects like any other dependency and all public classes and methods will be readily available to that project.