FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0

HTML Reporter/Databricks reporter #146

Closed eyaltrabelsi closed 3 years ago

eyaltrabelsi commented 6 years ago

Hi, I am using Databricks and your package is very useful.

I managed to make the Zeppelin reporter work on Databricks using something like:

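import java.io._
import de.frosner.ddq.core._
import de.frosner.ddq.reporters._

// Capture the Zeppelin reporter's output in memory instead of printing it
val htmlOutput = new ByteArrayOutputStream()
val zeppelinReporter: Reporter = ZeppelinReporter(new PrintStream(htmlOutput))

Runner.run(Seq(check), Seq(zeppelinReporter))

// Drop the Zeppelin-specific %html prefix and render the rest in Databricks
val htmlContent = htmlOutput.toString.stripPrefix("%html")
displayHTML(htmlContent)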

I think it could be useful to other people, and I am wondering whether you would prefer a pull request that adds this as documentation or as a separate reporter like this one.

FRosner commented 6 years ago

Thanks @eyaltrabelsi!

I see that you basically just stripped %html and then used the HTML output. For Zeppelin there was a trick necessary when printing multiple check results in the same cell. Can you try adding another check to the sequence and see if the output is still good?

Where does the displayHTML function come from? Is it Databricks built-in?

eyaltrabelsi commented 6 years ago

@FRosner

FRosner commented 6 years ago

Thanks @eyaltrabelsi,

Just to clarify: in which namespace does displayHTML live? Is it imported automatically?

Also I am asking about the multiple checks because the HTML looks different for the second check. So you are telling me that when running multiple checks in the same cell, the output still looks good? Just out of curiosity, can you attach a screenshot of the checks from the README run with your Databricks reporter?

I'd love to have it as a separate class, especially as the Zeppelin reporter might change if Zeppelin changes its way of rendering. I wouldn't like to have a dependency there. To reduce code duplication, we can factor out the method that generates the HTML code and have the Zeppelin and Databricks reporters add only the required surrounding code.
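A rough sketch of that split, to make the idea concrete. Every name here (ResultRow, HtmlFragments, ZeppelinStyleReporter, DatabricksStyleReporter) is hypothetical and none of it is the package's actual Reporter API; it only assumes that the Zeppelin side needs a %html prefix and the Databricks side needs a plain HTML string. It also leaves out HTML escaping and the trick for printing multiple check results in one Zeppelin cell.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Hypothetical stand-in for one constraint result; not the package's real types.
final case class ResultRow(description: String, success: Boolean)

// Shared piece: builds an HTML fragment with no notebook-specific markup.
object HtmlFragments {
  def render(header: String, rows: Seq[ResultRow]): String = {
    val items = rows.map { r =>
      val status = if (r.success) "success" else "failure"
      s"""<li class="$status">${r.description}</li>"""
    }
    s"<h4>$header</h4><ul>${items.mkString}</ul>"
  }
}

// Zeppelin-style wrapper: only adds the %html prefix around the shared fragment.
final class ZeppelinStyleReporter(stream: PrintStream) {
  def report(header: String, rows: Seq[ResultRow]): Unit =
    stream.println("%html " + HtmlFragments.render(header, rows))
}

// Databricks-style wrapper: only collects the plain HTML for later display.
final class DatabricksStyleReporter {
  private val buffer = new StringBuilder
  def report(header: String, rows: Seq[ResultRow]): Unit =
    buffer.append(HtmlFragments.render(header, rows))
  def html: String = buffer.toString
}

// Example usage with made-up results
val rows = Seq(ResultRow("id is unique", true), ResultRow("age is not null", false))

val out = new ByteArrayOutputStream()
new ZeppelinStyleReporter(new PrintStream(out, true)).report("Example check", rows)

val databricks = new DatabricksStyleReporter
databricks.report("Example check", rows)
// In a Databricks notebook, databricks.html could then be passed to displayHTML.
```

With this shape, a change in how Zeppelin renders output would only touch the Zeppelin wrapper, and the Databricks reporter would not depend on it.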

Let me know what you think.