awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Handle exception for invalid column names containing . #102

Open sidharthbolar opened 2 years ago

sidharthbolar commented 2 years ago

Issue #, if available: 99

Description of changes: Columns with a dot in their name throw an AnalysisException. Based on my analysis, the exception is thrown by the core Scala Deequ library. This PR proposes handling the exception to notify the caller that a column name containing "." should not be passed. The handling is added to all check methods and implemented via a static method.

Local tests pass, and I have added a unit test, test_invalidColumnException, to verify the behaviour.
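As a rough illustration of the approach described above, here is a minimal sketch of a static validation method shared by check methods. The PR's actual code is not shown in this thread, so `InvalidColumnNameError`, `ToyCheck`, and `_reject_dotted_column` are hypothetical names, and `ToyCheck` is a toy stand-in rather than the real pydeequ `Check` class.

```python
class InvalidColumnNameError(ValueError):
    """Hypothetical exception type; the PR's real class name is not shown."""


class ToyCheck:
    """Toy stand-in for a check builder (assumption, not the pydeequ API)."""

    def __init__(self):
        self.constraints = []

    @staticmethod
    def _reject_dotted_column(column: str) -> None:
        # Fail early with a readable message instead of letting the
        # JVM side surface an AnalysisException later.
        if "." in column:
            raise InvalidColumnNameError(
                f"Column name {column!r} contains '.', which is not supported "
                "here; rename the column before passing it in."
            )

    def isComplete(self, column: str) -> "ToyCheck":
        # Every check method would call the shared validator first.
        self._reject_dotted_column(column)
        self.constraints.append(("isComplete", column))
        return self
```

With this shape, `ToyCheck().isComplete("a.b")` raises `InvalidColumnNameError`, while a plain name such as `"name"` is recorded as usual.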

*version = "1.0.1"

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sidharthbolar commented 2 years ago

Hi @chenliu0831, thanks for the review. In that case, this PR handles the exception and notifies the caller not to pass a column name containing ".", so that they can handle it outside this library.

chenliu0831 commented 2 years ago

@sidharthbolar Right, I think I understand the purpose, but a column name with a dot is valid usage for a structured column (JSON type). Do you mind adding a test to verify that this works?

It looks like what you actually want is a more Pythonic error (than the vanilla Spark error) when such a dotted column does not exist? If so, how about adding a helper to check column existence instead?
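One possible shape for such a helper is sketched below. To keep it runnable without Spark, the schema is modelled as a nested dict (a leaf maps to `None`, a struct maps to another dict). That model and the names `column_exists` and `require_column` are assumptions for illustration; a real version would presumably walk the DataFrame's schema instead.

```python
from collections.abc import Mapping


def column_exists(schema: Mapping, column: str) -> bool:
    """Return True if `column` is a top-level column or a nested struct field."""
    # A literal top-level name matches first, even if it contains a dot.
    if column in schema:
        return True
    # Otherwise treat each dot as a step into a nested struct.
    node = schema
    for part in column.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return False
        node = node[part]
    return True


def require_column(schema: Mapping, column: str) -> None:
    """Raise a plain Python error when the column cannot be resolved."""
    if not column_exists(schema, column):
        raise ValueError(
            f"Column {column!r} not found. "
            f"Available top-level columns: {sorted(schema)}"
        )
```

For example, with `{"id": None, "address": {"city": None}}`, the name `"address.city"` resolves as a struct field, while `"address.street"` raises a `ValueError` before any analysis is attempted. This keeps dotted access to struct fields valid and only rejects names that do not exist.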