awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

add data synchronization test to verification Suite. #526

Closed VenkataKarthikP closed 6 months ago

VenkataKarthikP commented 6 months ago

Issue, if available: #501

Description of changes: Adds a data synchronization check to the verification suite. With this change, users can define an isDataSynchronized check.

Example usage:

import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel}

val verificationResult = VerificationSuite()
  .onData(data)
  .addCheck(
    Check(CheckLevel.Error, "must have data in sync")
      .isDataSynchronized(dfToCompare, Map("id" -> "id"), _ > 0.7))
  .run()
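
To make the role of the `_ > 0.7` assertion concrete, here is a minimal conceptual sketch in plain Scala, with no Spark or Deequ dependency. It assumes that the value passed to the assertion is the fraction of rows in `data` whose mapped column values also appear in `dfToCompare`; that reading is an assumption on my part, not something this PR description spells out. `SyncRatioSketch`, `Row`, and `matchRatio` are hypothetical names used only for illustration and are not part of Deequ.

```scala
// Conceptual sketch only: plain Scala collections stand in for Spark DataFrames.
object SyncRatioSketch {
  // Hypothetical row representation: column name -> value.
  type Row = Map[String, Any]

  // Hypothetical helper, not part of Deequ: the fraction of rows in `primary`
  // whose mapped column values also occur in some row of `reference`.
  def matchRatio(
      primary: Seq[Row],
      reference: Seq[Row],
      columnMappings: Map[String, String]): Double = {
    if (primary.isEmpty) 0.0
    else {
      val mappings = columnMappings.toSeq
      // Key of each reference row, built from the right-hand columns of the mapping.
      val referenceKeys = reference
        .map(row => mappings.map { case (_, refCol) => row.get(refCol) })
        .toSet
      // Count primary rows whose left-hand column values form a known reference key.
      val matched = primary.count { row =>
        referenceKeys.contains(mappings.map { case (col, _) => row.get(col) })
      }
      matched.toDouble / primary.size
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3, 4).map(i => Map[String, Any]("id" -> i))
    val dfToCompare = Seq(1, 2, 3, 5).map(i => Map[String, Any]("id" -> i))

    // Same shape as the assertion in the example above.
    val assertion: Double => Boolean = _ > 0.7

    val ratio = matchRatio(data, dfToCompare, Map("id" -> "id"))
    // Three of the four ids match, so this prints: ratio = 0.75, check passes = true
    println(s"ratio = $ratio, check passes = ${assertion(ratio)}")
  }
}
```

Under that assumption, tightening the assertion to `_ > 0.8` would make the same input fail the check, because only three of the four `id` values are present in both datasets.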

cc: @mentekid @rdsharma26

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

VenkataKarthikP commented 6 months ago

@rdsharma26 thanks for the review. I've updated the PR to address the review comments.