Closed by vavison 5 years ago
Merging #92 into develop will increase coverage by 0.88%. The diff coverage is 81.61%.
@@ Coverage Diff @@
## develop #92 +/- ##
===========================================
+ Coverage 87.2% 88.08% +0.88%
===========================================
Files 55 74 +19
Lines 1555 1771 +216
Branches 62 79 +17
===========================================
+ Hits 1356 1560 +204
- Misses 199 211 +12
Impacted Files | Coverage Δ | |
---|---|---|
...ala/com/coxautodata/waimak/dataflow/DataFlow.scala | 97.45% <ø> (+0.84%) | :arrow_up: |
...ata/waimak/dataflow/spark/SparkActionHelpers.scala | 95.83% <ø> (+3.24%) | :arrow_up: |
...ark/dataquality/DataQualityMetadataExtension.scala | 100% <100%> (ø) | |
...taquality/deequ/prefabchecks/GenericSQLCheck.scala | 100% <100%> (ø) | |
...quality/deequ/prefabchecks/CompletenessCheck.scala | 100% <100%> (ø) | |
...ataquality/DataQualityConfigurationExtension.scala | 100% <100%> (ø) | |
...taflow/spark/dataquality/deequ/DeequMetadata.scala | 100% <100%> (ø) | |
...flow/spark/dataquality/ExceptionQualityAlert.scala | 100% <100%> (ø) | |
...a/waimak/configuration/CaseClassConfigParser.scala | 97.29% <100%> (+0.03%) | :arrow_up: |
...imak/dataflow/spark/dataquality/DatasetCheck.scala | 100% <100%> (ø) | |
... and 40 more | | |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee88f76...382d457.
Description
Introduces functionality to monitor and alert on data quality for labels, using Amazon's Deequ library (https://github.com/awslabs/deequ).
There is a programmatic API which exposes the full functionality available in Deequ (https://github.com/CoxAutomotiveDataSolutions/waimak/wiki/data-quality#deequ).
In addition to the programmatic API, there is also a configuration-based API which exposes some common pre-configured checks (https://github.com/CoxAutomotiveDataSolutions/waimak/wiki/Configuration-Extensions#deequ-extension).
There is also the option to not use Deequ and instead use a custom implementation of data quality checking.
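To illustrate what a custom (non-Deequ) check might look like, here is a minimal sketch in plain Scala. It does not use Waimak's or Deequ's API: every name in it (`QualityCheck`, `CompletenessCheck`, `Alert`, `Purchase`) is hypothetical, and it works on an in-memory `Seq` rather than a Spark Dataset. It only shows the general shape of a completeness check that raises an alert when a threshold is missed.

```scala
// Hypothetical sketch: none of these names come from Waimak or Deequ.
object CustomQualityCheckSketch {

  // Outcome of a check: either it passed or it carries an alert message.
  sealed trait CheckResult
  case object Passed extends CheckResult
  final case class Alert(message: String) extends CheckResult

  // A check inspects the rows behind one label and returns a result.
  trait QualityCheck[A] {
    def run(label: String, rows: Seq[A]): CheckResult
  }

  // Completeness: the fraction of rows in which `field` is defined
  // must be at least `threshold`. An empty input counts as complete.
  final class CompletenessCheck[A](
      fieldName: String,
      field: A => Option[Any],
      threshold: Double
  ) extends QualityCheck[A] {
    def run(label: String, rows: Seq[A]): CheckResult = {
      val completeness =
        if (rows.isEmpty) 1.0
        else rows.count(r => field(r).isDefined).toDouble / rows.size
      if (completeness >= threshold) Passed
      else Alert(s"$label.$fieldName completeness $completeness is below $threshold")
    }
  }

  // Example record type, assumed purely for illustration.
  final case class Purchase(id: Int, customer: Option[String])

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Purchase(1, Some("a")),
      Purchase(2, None),
      Purchase(3, Some("c")),
      Purchase(4, Some("d"))
    )
    // 3 of 4 rows have a customer, so completeness is 0.75, below 0.8.
    val check = new CompletenessCheck[Purchase]("customer", _.customer, threshold = 0.8)
    println(check.run("purchases", rows))
  }
}
```

Separating the check from the alert value keeps the decision about what to do with a failure (log, email, throw an exception) outside the check itself, which is one plausible way to make alerting pluggable.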
Type of change
How Has This Been Tested?
Data quality actions are all thoroughly unit tested.