Open pawelpinkos opened 1 month ago
@rdsharma26 - could you please take a look at this? Your change is probably the root cause. Thanks a lot!
Thanks @pawelpinkos for bringing this to our attention. The details are extremely helpful. We will investigate this and get back to you soon.
Describe the bug Starting with Deequ 2.0.7 (all Spark releases), a bug in the library causes Catalyst codegen to fail in Spark. The exception is caught, so it does not fail at runtime, but you can observe the issue in the logs (e.g. run MaximumTest from the Deequ tests and check the log).
I have investigated, and in my opinion the root cause of the issue is this change: https://github.com/awslabs/deequ/commit/34d8f3ae70df5a049129f423e2d296ea81a6a1b8
The error is thrown when AnalysisRunner calls dataframe.agg() here, depending on the parameters provided. E.g. before Deequ 2.0.7 (for the example provided in the "To Reproduce" section) the parameters were:
And there was no error. For Deequ 2.0.7 the parameters are:
And the error is thrown.
This causes a lot of errors in the logs of applications that use Deequ. I tried to bump Deequ to 2.0.7 in my project, but because of this I have to postpone the upgrade.
To Reproduce Create a project with a Deequ 2.0.7 dependency and run the code below: