calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
1.06k stars 130 forks source link

Feature/request - adding support for nested field records (Bigquery) #143

Open vieiraAlexandre opened 2 years ago

vieiraAlexandre commented 2 years ago

The suite of data tests provided by dbt-expectations is quite good - thanks for all the work!

Would there be any planned work in supporting data checks for nested field records ?

As in, if we'd like to check the data values (applying dbt-expectations macros) for a given nested record field value for a BigQuery table, however that currently does not seem to be possible due to the need for the added UNNEST parameter in the compiled SQL code.

Would that be a feasible enhancement or would there be any more complex behaviour behind the scenes making this difficult to happen?

Thanks!

clausherther commented 2 years ago

Hi @vieiraAlexandre, thanks for bringing this up! I've not thought about this much, but it would seem this would require adding an unnested path to the Bigquery dispatch of each of the macros? Do you have an example of how that would look compiled for one of the tests you're interested in?

vieiraAlexandre commented 2 years ago

Hi @clausherther - I reckon there could be a validation step in the macro(s) to check whether the input column comes from a standard/raw or from a nested record type - possibly by checking INFORMATION_SCHEMA provided by BigQuery/GCP first?

From this point then there would be a query for the standard case (as it exists now) and for the nested record type by adding in the BigQuery's UNNEST command in the SQL code within the macro itself - as by unnesting the column the counts can be different and the logic required for the validating the test assertions could be different.

I understand it would perhaps add in complexity if it's made to scale up to the maximum of 16 nested layers BigQuery supports, though if this feature existed for 2 layers (e.g., continent.country or person.name) it would already be fantastic and add in a lot of value for the package!

Aaron-Webb-AT commented 1 year ago

Hey @vieiraAlexandre - I've come across this problem in one of my projects too. Wondered if you've made any progress on an implementation before I start to take a look?

Thanks