FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0

E-Mail reporter? #123

Closed FRosner closed 7 years ago

FRosner commented 7 years ago

Problem

It would be great to also receive reports via e-mail, without having to set up Grafana or Kibana alerts and pipe the results through them.

Solution

Implement an EmailReporter based on https://github.com/softprops/courier. We can test it using https://nilhcem.github.io/FakeSMTP/.

Documentation

https://github.com/FRosner/drunken-data-quality/wiki/Drunken-Data-Quality-4.1.0#email-reporter

Example

import de.frosner.ddq.core._
import de.frosner.ddq.reporters._

case class Customer(id: Int, name: String)
case class Contract(id: Int, customerId: Int, duration: Int)

val customers = spark.createDataFrame(List(
  Customer(0, "Frank"),
  Customer(1, "Alex"),
  Customer(2, "Slavo")
))

val contracts = spark.createDataFrame(List(
  Contract(0, 0, 5),
  Contract(1, 0, 10),
  Contract(0, 1, 6)
))

val check1 = Check(customers)
  .hasNumRows(_ >= 3)
  .hasUniqueKey("id")

val check2 = Check(contracts)
  .hasNumRows(_ > 0)
  .hasUniqueKey("id", "customerId")
  .satisfies("duration > 0")
  .hasForeignKey(customers, "customerId" -> "id")

val reporter1 = EmailReporter(
  smtpServer = "localhost",
  to = Set("CDO@yourcompany.com"),
  smtpPort = 23456,
  accumulatedReport = true
)

val reporter2 = EmailReporter(
  smtpServer = "localhost",
  to = Set("CDO@yourcompany.com"),
  smtpPort = 23456,
  accumulatedReport = false,
  usernameAndPassword = Some(("user", "password"))
)

Runner.run(Seq(check1, check2), Seq(reporter1, reporter2))
reporter1.sendAccumulatedReport(Some("Data Warehouse"))

FRosner commented 7 years ago

[image]

The report doesn't get rendered as HTML :(

FRosner commented 7 years ago

Now it works. We had to set the content type correctly.
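
For readers wondering why the content type matters: a mail client typically decides how to display a body from the message's Content-Type header, so an HTML report sent as text/plain shows up as raw markup. The following is only a rough sketch of that idea using plain Scala strings. `MimeSketch` and `buildMessage` are hypothetical names for illustration and are not part of drunken-data-quality or courier, and the thread does not say which content type was actually set, so `text/html` is an assumption here.

```scala
// Hypothetical helper for illustration only; not the EmailReporter implementation.
object MimeSketch {

  /** Builds a minimal e-mail message with an explicit Content-Type header.
    *
    * @param html if true, the body is declared as text/html (assumed fix),
    *             otherwise as text/plain (which shows HTML as raw markup)
    */
  def buildMessage(to: String, subject: String, body: String, html: Boolean): String = {
    val contentType =
      if (html) "text/html; charset=UTF-8"
      else "text/plain; charset=UTF-8"

    Seq(
      s"To: $to",
      s"Subject: $subject",
      "MIME-Version: 1.0",
      s"Content-Type: $contentType",
      "", // an empty line separates the headers from the body
      body
    ).mkString("\r\n")
  }
}

// Usage: the same report body, declared once as HTML and once as plain text.
val report = "<h1>Data Warehouse</h1><p>All constraints satisfied</p>"
val asHtml = MimeSketch.buildMessage("CDO@yourcompany.com", "DDQ report", report, html = true)
val asText = MimeSketch.buildMessage("CDO@yourcompany.com", "DDQ report", report, html = false)
```

Only the header differs between `asHtml` and `asText`; the body bytes are identical, which is consistent with the report looking fine once the content type was corrected.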