Nike-Inc / spark-expectations

A Python library to support running data quality rules while the Spark job is running ⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0

[FEATURE] Enhance QueryDQ feature to capture the source and target values #55

Closed vigneshwarrvenkat closed 1 month ago

vigneshwarrvenkat commented 7 months ago

Is your feature request related to a problem? Please describe. The query DQ feature only provides output as boolean values. A value of FALSE tells us that the query validation failed, but users won't know what went wrong; they have to rerun the query manually to figure out the difference. If a production support team is handling the failure, it is highly unlikely they are aware of the validation scripts, so it becomes tough to get actionable insights out of the query DQ feature. If the query results of both the source and the target were fetched and stored in a custom stats table, users could build actionable insights or work items from those results.

Describe the solution you'd like Right now, QueryDQ is programmed to accept a single query. Instead, we could pass three queries, as below.

`select X from table1; select Y from table2; select x=y from t1 join t2`. Queries are separated by semicolons. A single query keeps the default behaviour; with three queries, the behaviour is as described below.

X and Y are the values to be compared between source and target respectively, and the third query is the validation query. If the validation returns FALSE, we can fetch the X and Y values and store them as JSON in a custom stats table. The custom table is user managed and should be passed as an argument, as below.

SparkExpectations(custom_dq_info_table="...")
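
A minimal sketch of the proposed flow, assuming an active `SparkSession`. The helper name `run_query_dq` and the result-dictionary layout are illustrative assumptions, not the library's actual API:

```python
import json

from pyspark.sql import SparkSession

def run_query_dq(spark: SparkSession, delimited_query: str) -> dict:
    """Split a semicolon-delimited query_dq expectation and evaluate it."""
    queries = [q.strip() for q in delimited_query.split(";") if q.strip()]
    if len(queries) == 1:
        # Default behaviour: a single validation query returning a boolean.
        return {"status": bool(spark.sql(queries[0]).collect()[0][0])}
    source_q, target_q, validation_q = queries  # assumes exactly three queries
    status = bool(spark.sql(validation_q).collect()[0][0])
    result = {"status": status}
    if not status:
        # On failure, capture the source (X) and target (Y) values as JSON
        # so they can be appended to the user-managed custom stats table.
        result["source_output"] = json.dumps(
            [row.asDict() for row in spark.sql(source_q).collect()]
        )
        result["target_output"] = json.dumps(
            [row.asDict() for row in spark.sql(target_q).collect()]
        )
    return result
```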

Describe alternatives you've considered We are currently implementing the above option as a separate module and using it alongside the other features of SparkExpectations.

Additional context The custom table is user managed; permissions and related concerns have to be handled by the user. The number of records stored in the custom stats table could initially be restricted to 200.
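
Purely as a sketch of the cap described above, one way the captured values could be appended to the user-managed table; the `limit` placement and table handling are assumptions, not the library's behaviour:

```python
from pyspark.sql import DataFrame

MAX_CAPTURED_ROWS = 200  # proposed initial cap on stored records

def persist_captured_values(df: DataFrame, custom_dq_info_table: str) -> None:
    # The user-managed table must already exist with the right permissions;
    # limit() enforces the proposed 200-row cap before the append.
    df.limit(MAX_CAPTURED_ROWS).write.mode("append").saveAsTable(custom_dq_info_table)
```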

vigneshwarrvenkat commented 4 months ago

Connected with @asingamaneni and @jskrajareddy21 on this enhancement. Below are the bug fixes and feature requests coupled with this enhancement request:

  1. The query_dq should execute as is with multiple delimited queries.
  2. Since there is an ask to send the contents of the detailed stats table to Kafka, any data stored in the detailed stats table has to be masked before it is sent to Kafka (see the masking sketch after this list).
  3. There should not be any limitation on the number of delimited query_dq queries.
  4. Handle edge cases where one of the delimited query_dq queries can return an int or a float.
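
A minimal sketch of the masking mentioned in item 2, assuming the captured values live in JSON fields named `source_output` and `target_output` (assumed names, not the library's actual schema):

```python
import json

SENSITIVE_FIELDS = {"source_output", "target_output"}  # assumed field names

def mask_stats_payload(payload: str) -> str:
    """Mask captured values in a detailed-stats record before publishing to Kafka."""
    record = json.loads(payload)
    for field in SENSITIVE_FIELDS:
        if field in record:
            record[field] = "*** MASKED ***"
    return json.dumps(record)
```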

We have started working on this.