[Visualizations] Define split vs breakdown vs bucket

drewdaemon commented 1 year ago

Right now, "split," "breakdown," and "bucket" are used inconsistently. We could reduce cognitive load by following the domain-driven design (DDD) principle of agreeing on a common language. This will become especially helpful as we look at introducing small multiples in Lens, which will add yet another similar concept.

Today

"split"
- dividing data and displaying on a single chart (example)
- dividing data and displaying over multiple charts (small multiples) (example)
- ...
"breakdown" / "breakdownBy" - dividing data and displaying on a single chart
"bucket" - dividing data and displaying on a single chart. Maybe implies the underlying use of an Elasticsearch aggregation? Maybe implies time-series data?

Once we agree on these, we could update the code to match our agreed definitions and enforce this for new code, making things much more understandable. I don't want to ask myself what one of these means when I'm squinting at someone else's logic.

elasticmachine commented 1 year ago

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

markov00 commented 1 year ago

Thank you @andrewctate for starting this. Those terms describe a mix of different operations that I agree can definitely be cleaned up and aligned.

breakdown / breakdownBy: Lens defines one more "bucket", creating a new hierarchical level in your data, and then encodes that dimension with color.
split: define and compute an additional metric and ask the chart to render it on the same chart. It usually associates a color with each new metric added.
split on small multiples by metrics: define and compute an additional metric and render it on a different panel.
split on small multiples by dimension: add an additional dimension to your dataset (subdividing it) and render a metric into multiple panels, one for each value of that dimension.
bucket: closely tied to Elasticsearch naming, but it refers to two different functions:
- binning is used to divide your data into sets by partitioning a numerical space (field) into ordered bins
- group by is used to divide your data into sets with the same categorical field value. The field could also be a numerical or date field, but the operation is the same: your data is collected in sets relative to the same field value.
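
To make the distinction between the two "bucket" functions concrete, here is a minimal sketch. The `Row` shape, the function names `binBy` and `groupBy`, and the sample data are all hypothetical and are not taken from Kibana, Lens, or Elasticsearch.

```typescript
// Hypothetical row shape, for illustration only.
type Row = { bytes: number; extension: string };

// Binning: partition a numerical field into ordered, fixed-width bins.
function binBy(rows: Row[], width: number): Map<number, Row[]> {
  const bins = new Map<number, Row[]>();
  for (const row of rows) {
    const lowerEdge = Math.floor(row.bytes / width) * width;
    bins.set(lowerEdge, [...(bins.get(lowerEdge) ?? []), row]);
  }
  // Bins are ordered by their lower edge, which is what makes them "ordered bins".
  return new Map(Array.from(bins.entries()).sort((a, b) => a[0] - b[0]));
}

// Group by: collect rows that share the same field value.
function groupBy(rows: Row[], key: (row: Row) => string): Map<string, Row[]> {
  const groups = new Map<string, Row[]>();
  for (const row of rows) {
    const value = key(row);
    groups.set(value, [...(groups.get(value) ?? []), row]);
  }
  return groups;
}

const sampleRows: Row[] = [
  { bytes: 120, extension: "png" },
  { bytes: 80, extension: "css" },
  { bytes: 450, extension: "png" },
  { bytes: 130, extension: "css" },
];

const byBytes = binBy(sampleRows, 100); // bins starting at 0, 100, 400
const byExtension = groupBy(sampleRows, (row) => row.extension); // "png", "css"
```

Both functions produce sets of rows; the difference is only in how set membership is decided (a numeric range versus an exact value).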

Kibana, Lens, TSVB, etc. somewhat mix the data operations with how and where a dimension or metric is assigned:

- the data subdivision (group by, binning) is the data computation aspect
- then you have to describe and select how to represent that information in the chart: by color, by shape, or by position (hierarchical, stacked, etc.)
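
One way to picture this separation is as two independent fields on a dimension. This is only a sketch: `DataOperation`, `Encoding`, `Dimension`, and `describe` are hypothetical names that do not exist in Kibana or elastic/charts, and treating "panel" as an encoding (to cover small multiples) is an assumption on top of the list above.

```typescript
// Hypothetical types: the data computation aspect.
type DataOperation =
  | { kind: "groupBy"; field: string }
  | { kind: "bin"; field: string; interval: number };

// Hypothetical types: the representation aspect.
// "panel" is an assumed addition to cover small multiples.
type Encoding = "color" | "shape" | "position" | "panel";

interface Dimension {
  operation: DataOperation;
  encoding: Encoding;
}

// Produce a human-readable summary of a dimension.
function describe(dimension: Dimension): string {
  const op = dimension.operation;
  const computation =
    op.kind === "bin"
      ? `bin ${op.field} every ${op.interval}`
      : `group by ${op.field}`;
  return `${computation}, shown by ${dimension.encoding}`;
}

// Same data operation, two different representations.
const breakdown: Dimension = {
  operation: { kind: "groupBy", field: "extension" },
  encoding: "color",
};
const smallMultiples: Dimension = {
  operation: { kind: "groupBy", field: "extension" },
  encoding: "panel",
};
```

Under this framing, a "breakdown" and a "split into small multiples by dimension" share the same data operation and differ only in the encoding.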

Starting by aligning this on code and moving it to the UI could be a good move. elastic/charts should also be realigned: since its inception, we have kept and ported most of the preexisting semantics from Kibana, and we should finally remove those wrong concepts in favor of a better semantic structure.

drewdaemon commented 1 year ago

Thanks for chipping in here, @markov00. Really good analysis.

> Starting by aligning this on code and moving it to the UI could be a good move.

IMO, a great place to start would be getting some consensus from all stakeholders (developers, product people, docs team, and designers) with respect to these terms. I've noticed that we often have a "working term" we use as developers. That term gets used all over the code and gets ingrained into our minds. Then we get asked to change the name of whatever it is as part of the product/design review process. Then, one of two things happens:

- we adopt the new term, leaving the code out of sync
- we keep using the working term, leaving the UI/docs out of sync

Edit: though I guess I can think of scenarios where it could make sense to have our own term as developers that doesn't match exactly what is in the UI... for example, even if we agree that "breakdown by" is what we'll use to describe bucketing a dimension and describing it with colors, there's an argument for continuing to use the term "slice by" in the pie chart UI.
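
A rough sketch of how one agreed term in code could coexist with a chart-specific UI label. Everything here (`ChartType`, `CANONICAL_TERM`, `UI_LABEL_OVERRIDES`, `uiLabel`, and the exact label strings) is hypothetical and only illustrates the idea; it is not how Lens is implemented.

```typescript
// Hypothetical chart types, for illustration only.
type ChartType = "bar" | "line" | "pie";

// The single term developers would use in code.
const CANONICAL_TERM = "breakdownBy";

// Default UI label for the canonical term (assumed wording).
const DEFAULT_UI_LABEL = "Breakdown by";

// Chart-specific UI labels that intentionally differ from the default.
const UI_LABEL_OVERRIDES: Partial<Record<ChartType, string>> = {
  pie: "Slice by",
};

// Resolve the label shown in the UI for a given chart type.
function uiLabel(chart: ChartType): string {
  return UI_LABEL_OVERRIDES[chart] ?? DEFAULT_UI_LABEL;
}

const pieLabel = uiLabel("pie"); // "Slice by"
const barLabel = uiLabel("bar"); // "Breakdown by"
```

Keeping the overrides in one place would make any deliberate divergence between code and UI explicit rather than accidental.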

elastic / kibana

[Visualizations] Define split vs breakdown vs bucket #147790
