dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[Feature] Support Data Profiling in dbt #1330

Open syou6162 opened 2 months ago

syou6162 commented 2 months ago

Is this your first time submitting a feature request?

Describe the feature

Dataplex data profiling lets you identify common statistical characteristics of the columns in your BigQuery tables. This information helps you to understand and analyze your data more effectively.

You can configure data profiling from the GUI or the API. However, a model with materialized='table' is recreated on every run, and recreating the table deletes its data profiling settings. If dbt could apply the data profiling settings after the table is created, it would be much easier for dbt users to use data profiling.
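For illustration, here is a minimal sketch of the kind of request that would need to be re-issued after each table rebuild. It only builds the URL and JSON body for the Dataplex `dataScans.create` REST method and does not send anything. The helper name `build_profile_scan_request`, the scan ID naming scheme, and the on-demand trigger are assumptions made for this example. They are not part of dbt-bigquery or of any proposed implementation.

```python
import json


def build_profile_scan_request(project: str, location: str, dataset: str,
                               table: str, sampling_percent: float = 100.0) -> dict:
    """Build the URL and JSON body for a Dataplex dataScans.create call.

    Hypothetical helper: nothing is sent, the request is only assembled.
    """
    # Dataplex refers to a BigQuery table by its full resource name.
    resource = (
        f"//bigquery.googleapis.com/projects/{project}"
        f"/datasets/{dataset}/tables/{table}"
    )
    # Assumed naming scheme: lowercase, with underscores turned into hyphens.
    scan_id = f"profile-{dataset}-{table}".lower().replace("_", "-")
    url = (
        "https://dataplex.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/dataScans?dataScanId={scan_id}"
    )
    body = {
        "data": {"resource": resource},
        # Profile the given percentage of rows.
        "dataProfileSpec": {"samplingPercent": sampling_percent},
        # Assumption: run only when triggered, e.g. after dbt rebuilds the table.
        "executionSpec": {"trigger": {"onDemand": {}}},
    }
    return {"url": url, "body": body}


if __name__ == "__main__":
    request = build_profile_scan_request(
        "my-project", "us-central1", "analytics", "daily_orders"
    )
    print(request["url"])
    print(json.dumps(request["body"], indent=2))
```

Because the scan is attached to the table outside of dbt, something has to repeat this step whenever the table is recreated, which is the gap this feature request is about.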

Describe alternatives you've considered

No response

Who will this benefit?

People who use BigQuery tables built with dbt. I think this will be a useful feature for data users, especially analysts and business developers, as they can see the statistics for each column without having to write a query.

Are you interested in contributing this feature?

Yes, very much! I'm interested in contributing to dbt, so I plan to send a pull request soon. I think I can do it by following the existing implementation that supports BigQuery policy tags.

Anything else?

No response

amychen1776 commented 2 months ago

Thank you for opening this request! At this time, we are unable to support this functionality, but I'm happy to leave this issue open to collect more feedback (and to gauge community interest in it).

syou6162 commented 2 weeks ago

@amychen1776 I implemented this feature myself at https://github.com/dbt-labs/dbt-bigquery/pull/1392. Could you ask the development team to review my pull request?

moinuddinmbd commented 1 week ago

> @amychen1776 I implemented this feature myself at #1392. Could you ask the development team to review my pull request?

Integrating a data profiling scan is an excellent idea; however, initiating it through a Dataplex scan may not be the most effective approach.