Raiffeisen-DGTL / checkita-data-quality

Fast data quality framework for modern data infrastructure
GNU Lesser General Public License v3.0

Checkita Data Quality

Latest Version: 2.1.0

Checkita is a data quality framework written in Scala 2 that uses Spark 3.2+ as its computation core. It performs parallel, distributed quality checks in big data environments: it calculates various metrics over large datasets and runs multiple checks to ensure the quality of those datasets.

For more information, please see the Documentation.

Project Build

The project is built using SBT. Several JVM variables are available to make builds more flexible:

Thus, depending on environment, it is possible to build:

Releases to Maven Central Repository

Starting from Checkita 2.0, project releases are published to the Maven Central repository. As stated above, Checkita supports multiple Spark versions, currently from 3.2.0 up to 3.5.1. The Checkita code base does not change between supported Spark versions. However, Spark's transitive dependencies DO change from version to version. To keep the number of packages released to Maven Central per Checkita version reasonable, we only publish Checkita packages tied to minor Spark versions, such as 3.2.0 or 3.3.0, and skip Spark patch versions. Thus, Checkita packages have the following versioning scheme: <checkita-version>-<spark-minor-version>. For example, checkita-core_2.12-2.0.0-3.2 is built for Scala 2.12 and Spark 3.2.0.
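
As a rough illustration of this versioning scheme, an sbt dependency on the artifact from the example above might look like the sketch below. The artifact name and version string are taken from that example. The organization ID `org.checkita` and the value names `checkitaVersion` and `sparkMinor` are assumptions made for illustration, so check the exact coordinates on Maven Central before using them.

```scala
// build.sbt (sketch, not an official snippet)
// Assumption: "org.checkita" is the organization (group) ID; verify it on Maven Central.
val checkitaVersion = "2.0.0" // Checkita release
val sparkMinor      = "3.2"   // Spark minor version the package is tied to

// %% appends the Scala binary version (e.g. _2.12) to the artifact name,
// so with a 2.12.x scalaVersion this requests checkita-core_2.12, version 2.0.0-3.2.
libraryDependencies += "org.checkita" %% "checkita-core" % s"$checkitaVersion-$sparkMinor"
```

Because packages are published per Spark minor version only, a cluster running any 3.2.x patch release would use the `-3.2` suffix.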

We also provide FULL lists of dependencies for each supported Spark version in raw text format. You can find these lists via the links in the table below:

Spark \ Scala 2.12 2.13
3.2.0 view view
3.2.1 view view
3.2.2 view view
3.2.3 view view
3.2.4 view view
3.3.0 view view
3.3.1 view view
3.3.2 view view
3.3.3 view view
3.3.4 view view
3.4.0 view view
3.4.1 view view
3.4.2 view view
3.4.3 view view
3.5.0 view view
3.5.1 view view

Contribution

Thank you for considering contributing to our project! We welcome contributions from everyone. By participating in this project, you agree to abide by our Code of Conduct.

Please take a moment to review our Contribution guide in order to make the contribution process as smooth as possible.

License

Checkita Data Quality is GNU LGPL licensed.

This project is a reimagining of the Data Quality Framework developed by Agile Lab, Italy.