Add DQ_DataQuality

For data quality, we need a kind of workaround. We will not receive most of the information needed to fully describe the data lineage. Both the data lineage and a record of data quality measures are required to satisfy ISO 19115. In metacatalog, we would have to implement a set of new tables. Both lineage and data quality reports have a 1:m relationship to Entry.
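To make the 1:m relationships concrete, here is a minimal sketch using Python's built-in `sqlite3`. All table and column names (`entries`, `lineage`, `quality_reports`, `statement`, `description`) are placeholders I made up for illustration; they are not the actual metacatalog schema, which would define these as models instead.

```python
import sqlite3

# Hypothetical table and column names; only the 1:m link to Entry comes from the issue text.
SCHEMA = """
CREATE TABLE entries (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE lineage (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entries (id),
    statement TEXT NOT NULL          -- free-text origin / processing statement
);
CREATE TABLE quality_reports (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entries (id),
    description TEXT NOT NULL        -- free-text description of the measure
);
"""

CHILD_TABLES = ("lineage", "quality_reports")


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one entry and a few child records."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO entries (id, title) VALUES (1, 'demo entry')")
    con.executemany(
        "INSERT INTO lineage (entry_id, statement) VALUES (?, ?)",
        [(1, "raw logger data"), (1, "aggregated to daily means")],
    )
    con.execute(
        "INSERT INTO quality_reports (entry_id, description) VALUES (?, ?)",
        (1, "range check on all values"),
    )
    con.commit()
    return con


def count_children(con: sqlite3.Connection, table: str, entry_id: int) -> int:
    """Count the lineage or report records attached to one entry."""
    if table not in CHILD_TABLES:
        raise ValueError(f"unknown table: {table}")
    row = con.execute(
        f"SELECT COUNT(*) FROM {table} WHERE entry_id = ?", (entry_id,)
    ).fetchone()
    return row[0]
```

One entry can then carry any number of lineage statements and reports, which is all the 1:m requirement asks for.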

Lineage

For datasets, we need a free-text statement about the data origin and how it was processed.
It can have a source (basically another metadata entry that was used to process the data source).
It can have all processing steps attached that were undertaken to process a dataset. These steps are normalized, have a code list of possible values, and are attached to a source on their own.

I can't see how we can implement source and processing steps into lineage here.

Report

Reports are basically chronological combinations of a registered, certified process identifier combined with a free-text description and a list of possible outcomes. Since we design metacatalog for the use of quality-checked data, we would implement a granular, highly specialized scheme to store information that we don't want in most cases or won't get from data holders. At first glance, more than 20 tables are necessary to describe the possible outcomes.

The only possibility I see here is to define a few (say, 3) quality measure outcomes that are available in metacatalog and map them to ISO 19115 on export. We still have the issue that each of the implemented quality measure results needs a citable authority that standardized this particular outcome in the first place. ISO requires a citation of this authority to identify data quality measure outcomes. So if we come up with our own outcomes here, we need to publish a controlled CodeList, I guess.
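A rough sketch of what such a small, fixed outcome set with an attached authority could look like. The three outcome names, the `Authority` and `QualityReport` classes, and `to_iso_result` are all hypothetical. The dict keys (`specification`, `explanation`, `pass`) loosely follow the fields of an ISO conformance result as I understand them, so treat that shape as an assumption, not as a validated export.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QualityOutcome(Enum):
    # Hypothetical outcome set; the real list would come from a published CodeList.
    PASSED = "passed"
    FAILED = "failed"
    NOT_EVALUATED = "not_evaluated"


@dataclass(frozen=True)
class Authority:
    """Citable authority that standardized the outcome (placeholder fields)."""
    title: str
    identifier: str


@dataclass(frozen=True)
class QualityReport:
    outcome: QualityOutcome
    description: str
    authority: Authority


def to_iso_result(report: QualityReport) -> dict:
    """Map a report to a plain dict shaped loosely like a conformance result."""
    # NOT_EVALUATED has no obvious boolean counterpart, so it maps to None here.
    passed: Optional[bool] = {
        QualityOutcome.PASSED: True,
        QualityOutcome.FAILED: False,
    }.get(report.outcome)
    return {
        "specification": {
            "title": report.authority.title,
            "identifier": report.authority.identifier,
        },
        "explanation": report.description,
        "pass": passed,
    }
```

How a not-evaluated outcome should be exported is left open in this sketch; it only shows that every outcome carries its authority along.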

I am not sure how to handle this in metacatalog, and ideas are highly appreciated. @sihassl @MarcusStrobl At the end of the day, at least one record has to be in report and lineage. The more complicated question will be how to handle that information on import if we can't map it.
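To illustrate the two constraints, a small sketch: a check that an entry has at least one lineage and one report record, and one possible (not agreed upon) fallback for import that keeps unmappable outcome text as free text instead of dropping it. `KNOWN_OUTCOMES`, `missing_for_export` and `import_outcome` are names I made up for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical set of outcomes that metacatalog would know about.
KNOWN_OUTCOMES = {"passed", "failed", "not_evaluated"}


def missing_for_export(lineage: List[str], reports: List[str]) -> List[str]:
    """Return which of the two required record types are missing."""
    missing = []
    if not lineage:
        missing.append("lineage")
    if not reports:
        missing.append("report")
    return missing


def import_outcome(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (mapped outcome, unmapped free text) for an imported value.

    Exactly one of the two is set: known values are mapped, everything
    else is kept verbatim so that no information is lost on import.
    """
    key = raw.strip().lower().replace(" ", "_")
    if key in KNOWN_OUTCOMES:
        return key, None
    return None, raw
```

For example, `import_outcome("Passed")` maps to a known outcome, while an unknown value comes back unchanged as free text that could be stored alongside the report.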

VForWaTer / metacatalog

Add DQ_DataQuality #79
