awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Retrieving indication of failed metrics #109

Open krexez opened 1 year ago

krexez commented 1 year ago

Is your feature request related to a problem? Please describe.

I have a process that computes data metrics using deequ. I noticed that when a Spark job computing these metrics fails, the affected metrics are stored as failed metrics in the Scala AnalyzerContext object. However, the only APIs I can find for retrieving the resulting metrics are successMetricsAsDataFrame and successMetricsAsJson, both of which call Scala APIs that filter out the failed metrics. As a result, I have no simple way of finding out whether a metric failed because of a Spark job failure, so I cannot rerun the job or investigate the cause.

Describe the solution you'd like

Ideally, a translation of the metric map in AnalyzerContext into a Python object. This would solve the failed-metric problem, and I would no longer have to manually parse the results out of the DataFrame in order to save them in the format I want. Another possibility would be to add an allMetrics getter to the API.
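To make the request more concrete, here is a rough sketch of what the Python side could look like. Nothing in it is existing python-deequ API: `MetricResult` and `split_metrics` are hypothetical names, and the step that reads the entries out of the JVM metric map is left out entirely. The fields mirror what a Scala Deequ metric carries (entity, instance, name, and a value that is either a result or a failure).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MetricResult:
    """Hypothetical Python-side view of one entry in the metric map."""
    entity: str                     # e.g. "Column" or "Dataset"
    instance: str                   # e.g. the column name
    name: str                       # e.g. "Completeness"
    value: Optional[float] = None   # set when the metric was computed
    error: Optional[str] = None     # set when the computation failed

    @property
    def succeeded(self) -> bool:
        return self.error is None


def split_metrics(
    metrics: Iterable[MetricResult],
) -> Tuple[List[MetricResult], List[MetricResult]]:
    """Partition metrics into (succeeded, failed) without dropping any."""
    ok: List[MetricResult] = []
    failed: List[MetricResult] = []
    for m in metrics:
        (ok if m.succeeded else failed).append(m)
    return ok, failed


# Made-up example data standing in for what an allMetrics getter might return.
all_metrics = [
    MetricResult("Column", "id", "Completeness", value=1.0),
    MetricResult("Column", "price", "Mean", error="Spark job aborted"),
]

succeeded, failed = split_metrics(all_metrics)
for m in failed:
    print(f"{m.name}({m.instance}) failed: {m.error}")
```

With something along these lines, a caller could decide to rerun only the analyzers whose metrics ended up in `failed`, and could serialize the successful ones in whatever format they need.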

Additional context

The Python AnalyzerContext object

The Scala AnalyzerContext object