data-mie / dbt-profiler

Macros for generating dbt model data profiles
Apache License 2.0

Question: add TOP x column values and distribution. #65

Open diegodewilde opened 1 year ago

diegodewilde commented 1 year ago

Hi,

I was looking at this project and I must say: it's awesome, and it's something that dbt docs is currently missing.

One question came to mind: why is there no option to add the TOP x column values and their distribution? Is there a reason not to include this in the docs?

For example, showing the TOP 2 values per column:

| Column Name | Top 1 Value | Distribution | Top 2 Value | Distribution |
|---|---|---|---|---|
| Column 1 | Value 1 A | 0.50 ("number"/"total") | Value 1 B | 0.20 ("number"/"total") |
| Column 2 | Value 2 A | 0.50 ("number"/"total") | Value 2 B | 0.30 ("number"/"total") |
| Column 3 | Value 3 A | 0.10 ("number"/"total") | Value 3 B | 0.05 ("number"/"total") |
| Column 4 | Value 4 A | 0.10 ("number"/"total") | Value 4 B | 0.05 ("number"/"total") |

Looking forward to your thoughts!

stumelius commented 1 year ago

@diegodewilde I've thought about adding a "mode" (most common value) profiling metric to the package but never got around to implementing it. This proposal expands the mode concept into the N most common values, and I think it's a good idea.
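
To make the idea concrete, here is a rough Python sketch of what "N most common values with their distribution" could compute for a single column. It is not dbt-profiler code: the function name `top_n_values` is hypothetical, and dividing by the total row count (nulls included) is an assumption about how "number"/"total" would be defined. With `n=1` it reduces to the mode.

```python
from collections import Counter


def top_n_values(values, n=2):
    """Return the n most common non-null values with their share of all rows.

    Hypothetical helper for illustration only. Assumes the distribution is
    count / total rows, where total includes null (None) rows.
    """
    total = len(values)
    if total == 0:
        return []
    # Count only non-null values; nulls still contribute to the denominator.
    counts = Counter(v for v in values if v is not None)
    return [(value, count / total) for value, count in counts.most_common(n)]


# 10 rows: five "A", two "B", one "C", two nulls
column = ["A"] * 5 + ["B"] * 2 + ["C"] + [None, None]
print(top_n_values(column, n=2))  # [('A', 0.5), ('B', 0.2)]
```

In the package itself this would presumably be expressed as a SQL aggregation (group by value, order by count, limit N) rather than in Python; the sketch only pins down the intended output shape.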

Just throwing thoughts here:

Would you be interested in implementing this? :)

diegodewilde commented 1 year ago

Hi stumelius,

stumelius commented 1 year ago

@diegodewilde Circling back to this. Are you still interested in this feature, and if so, would you like to contribute? :)