capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
https://capitalone.github.io/DataProfiler
Apache License 2.0

Meaning of quantile names #414

Open ian-contiamo opened 3 years ago

ian-contiamo commented 3 years ago

When profiling a numerical column, I get these quantiles:

        "quantiles": {
          "0": 657.04,
          "1": 999.18,
          "2": 1335.71
        },

Based on the data, I guess these are the 25th, 50th, and 75th percentiles. I feel that the keys 0, 1, and 2 are not meaningful to a human reader, and the scheme does not extend well to other quantiles in the future.

Shouldn't this field be called quartiles (with an r)?

PS. Thanks for the quick feedback to my other questions!

JGSweets commented 3 years ago

@ian-contiamo I agree that the keys may be a problem for human interpretability. @scottiegarcia may have some additional thoughts. However, I do believe "quantiles" is the correct name, since we allow the number of quantiles to differ from the default of 4.

In this case, maybe "0" should become "1", meaning the first quantile of 4. That said, the output may then also need to state the count of quantiles. My only concern with labeling by percentile is that the set of labels is open-ended, since n is configurable.
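
To make the relabeling idea concrete, here is a minimal sketch of how index keys could be mapped to percentile labels once the quantile count n is known. It is not part of DataProfiler: `label_quantiles` and `num_quantiles` are hypothetical names, and the sketch assumes the report holds the n - 1 evenly spaced interior cut points, so that key i corresponds to the 100 * (i + 1) / n percentile.

```python
def label_quantiles(quantiles, num_quantiles=None):
    """Relabel index-keyed cut points ("0", "1", ...) with percentile keys.

    Hypothetical helper, not a DataProfiler API. Assumes the dict holds the
    n - 1 evenly spaced interior cut points of an n-quantile split.
    """
    if num_quantiles is None:
        # Assumption: n - 1 cut points were reported, so n = len + 1.
        num_quantiles = len(quantiles) + 1
    labeled = {}
    for key, value in quantiles.items():
        index = int(key)
        percentile = 100 * (index + 1) / num_quantiles
        # "g" formatting drops a trailing ".0", so 25.0 becomes "25".
        labeled[f"{percentile:g}"] = value
    return labeled


# Values taken from the report quoted in this issue (default n = 4).
reported = {"0": 657.04, "1": 999.18, "2": 1335.71}
labeled = label_quantiles(reported)
print(labeled)
```

With the default of 4 quantiles, the keys become "25", "50", and "75". Passing `num_quantiles` explicitly addresses the open-endedness concern: the labels are derived from n rather than hard-coded.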