[Lens] Improve date formatting by using a context-aware formatter like D3

wylieconlon commented 4 years ago

The Kibana setting dateFormat:scaled is used in Visualize, TSVB, Timelion, Infra, and SIEM to provide standard date formatting for visualizations based on the interval between dates. For example, the default formatter will show the day, but not the hour, when the interval is between 24 hours and 1 year, which is set by the rule ["P1DT", "YYYY-MM-DD"]. Lens currently uses the interval formatter that is provided by elastic-charts which uses a similar approach of selecting a formatter based on the interval.
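
To illustrate the interval-based selection, here is a minimal TypeScript sketch. Only the day-level rule ("P1DT" mapping to "YYYY-MM-DD") comes from this issue; the other thresholds, the millisecond representation, and the name `pickScaledPattern` are assumptions for illustration, not the actual Kibana implementation.

```typescript
const SECOND = 1000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;
const YEAR = 365 * DAY;

// Each rule is [minimum interval in ms, moment-style pattern], sorted ascending.
// Assumption: the real setting stores ISO 8601 durations such as "P1DT";
// milliseconds keep this sketch dependency-free.
const scaledRules: Array<[number, string]> = [
  [0, "HH:mm:ss.SSS"],
  [SECOND, "HH:mm:ss"],
  [MINUTE, "HH:mm"],
  [HOUR, "YYYY-MM-DD HH:mm"],
  [DAY, "YYYY-MM-DD"], // the rule quoted above
  [YEAR, "YYYY"],
];

// Pick the pattern of the last rule whose threshold the interval reaches.
function pickScaledPattern(
  intervalMs: number,
  rules: Array<[number, string]> = scaledRules
): string {
  let pattern = rules[0][1];
  for (const [minInterval, candidate] of rules) {
    if (intervalMs >= minInterval) {
      pattern = candidate;
    }
  }
  return pattern;
}
```

The point of the sketch is that the pattern depends only on the interval, so every tick on the axis gets the same pattern regardless of its neighbors.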

These date format options are not as powerful or user-friendly as what D3 offers in its time scale options. There are many examples of more dynamic scales that we could be implementing in Lens or across visualizations in general. The biggest difference is that Kibana and elastic-charts are formatting individual labels, while D3 scales will output different labels based on context.

A concrete example is a scale that shows the previous 7 days of data at 3 hour intervals. In Kibana or elastic-charts the formatter for this is always YYYY-MM-DD HH:mm or MM-DD HH:mm, so every label contains the same information, like:

2019-11-14 05:00 2019-11-14 08:00 2019-11-14 11:00

This contains duplicate information, but is high precision. D3 removes duplicate labeling, so it is smart enough to output the day once per day, and the hour between each day. This would be shown as:

Thu 14 05:00 08:00 11:00 ... Fri 15 05:00

The D3 formatter also removes other duplication such as on the 1st of the month, showing the month name, or on the 1st of January, showing just the new year.
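
The context-aware behavior can be sketched without importing D3. This is a simplified illustration of the idea, not D3's code: it assumes ticks fall on whole minutes, works in UTC to stay deterministic, and the name `contextAwareLabel` is made up for this example.

```typescript
const WEEKDAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"];
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function pad(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Label each tick with the coarsest unit boundary it falls on:
// start of year -> year, start of month -> month name,
// start of day -> weekday and date, anything else -> hour and minute.
function contextAwareLabel(tick: Date): string {
  const startsDay = tick.getUTCHours() === 0 && tick.getUTCMinutes() === 0;
  if (!startsDay) {
    return pad(tick.getUTCHours()) + ":" + pad(tick.getUTCMinutes());
  }
  if (tick.getUTCDate() !== 1) {
    return WEEKDAYS[tick.getUTCDay()] + " " + tick.getUTCDate();
  }
  if (tick.getUTCMonth() !== 0) {
    return MONTH_NAMES[tick.getUTCMonth()];
  }
  return String(tick.getUTCFullYear());
}

// Ticks every 3 hours starting at midnight on 2019-11-14 (UTC).
const threeHours = 3 * 60 * 60 * 1000;
const start = Date.UTC(2019, 10, 14);
const exampleLabels: string[] = [];
for (let i = 0; i < 9; i++) {
  exampleLabels.push(contextAwareLabel(new Date(start + i * threeHours)));
}
```

Unlike an interval-based formatter, the label here depends on where the tick sits in the calendar, which is why the day appears once and the hours in between stay short.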

The D3 formatter approach makes more sense for visualizing time series, as it's able to provide clearer labels without losing precision. However, the D3 approach is more complicated to configure.

All time formatting suffers from issues with time zones and DST, and these existing formatters are no exception. We already provide some settings for time zone detection and use in the Kibana advanced settings and they need to be respected. DST settings are not currently configured by users, but need to be taken into consideration for determining intervals.
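
As a small illustration of why DST matters for intervals, the sketch below steps a fixed 24 hours across the end of US daylight saving time in 2019. It assumes a runtime whose `Intl.DateTimeFormat` has IANA time zone data (for example a recent Node.js build); `localHour` is a hypothetical helper, not a Kibana function.

```typescript
// Hour of day (0-23) at which an instant falls in a given IANA time zone.
function localHour(instant: Date, timeZone: string): number {
  const formatter = new Intl.DateTimeFormat("en-US", {
    timeZone: timeZone,
    hour: "numeric",
    hour12: false,
  });
  // Some runtimes print midnight as "24", so normalize with a modulo.
  return parseInt(formatter.format(instant), 10) % 24;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// 2019-11-03T04:00Z is local midnight in New York, still on daylight time.
const localMidnight = new Date(Date.UTC(2019, 10, 3, 4));
// Adding a fixed 24 hours crosses the switch back to standard time.
const fixedStep = new Date(localMidnight.getTime() + DAY_MS);

const hourBefore = localHour(localMidnight, "America/New_York");
const hourAfter = localHour(fixedStep, "America/New_York");
```

With time zone data available, `hourBefore` should be 0 and `hourAfter` should be 23, so a "1 day" bucket computed as a fixed 24 hours no longer starts at local midnight. Any interval logic that labels day boundaries needs to account for this.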

Recommendation:

The formatting approach described above would improve the readability of time series charts. There are several potential ways to integrate this proposal:

- We could improve the default date formatter for all Kibana visualizations, and then integrate that formatter into Lens. This would increase consistency for all users and provide a clear way to configure the setting.
- Lens already uses a date formatter that ignores the advanced settings, and could implement a better formatter that continues to ignore these settings. It could be done so that users don't have control over their formatter.
- This formatter could be integrated into elastic-charts by default, and then all new visualizations will be able to set it. This is not ideal because visualization authors still need to provide a formatting function for other data types, and because of localization issues.

In terms of implementation, there is no need to import the scale logic from D3. We already have logic to handle time intervals, but the logic is too simplistic.

elasticmachine commented 4 years ago

Pinging @elastic/kibana-app (Team:KibanaApp)

flash1293 commented 4 years ago

@wylieconlon

> Lens currently uses the interval formatter that is provided by elastic-charts which uses a similar approach of selecting a formatter based on the interval.

> Lens already uses a date formatter that ignores the advanced settings

Is this the case? I remember I implemented formatHints as column meta data on the data table that passes the "smart" date format leveraging dateFormat:scaled to the renderer which also picks it up to format the ticks. So Lens should have feature parity with the other visualization types. Or maybe I'm misunderstanding.

https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/lens/public/xy_visualization_plugin/xy_expression.tsx#L170

wylieconlon commented 4 years ago

I think you're right that we are using the dateFormat:scaled format in the primary chart, while we are using the elastic-charts formatter in the preview popover for dates. This might make a good case for implementing this improved formatter for all visualizations.

wylieconlon commented 4 years ago

markov00 commented 4 years ago

Thanks @wylieconlon for this. I agree with you that we need to align the current time formatter implementation across Kibana. I personally find the way D3 handles tick labels a bit distracting and confusing under specific configurations. My main concern is that removing the redundant text from labels, and moving part of that information into a single label, forces the user to move their eyes back and forth to find the relevant information for a single tick.

In this example, if I'm looking at something near the right edge of the chart, I need to visually scan the axis to the left until I find the Fri 22 label to understand what day I was looking at. This can increase the cognitive load a bit when reading the chart. Other examples are:

- if I'm on the first half of the chart and I need to understand what day is represented: there is no label for that day, so I need to go and find the next label Fri 22 and mentally go back 1 day to Thu 21.
- on the same 2 day chart, I'm seeing only the label of the first of the month, so what is the previous day? 30, 31, 28 or 29? I've got to remember the selected time range, remember the previous month, and remember how many days that month has. Or I need to move my mouse and get that information somewhere else.

It's not just a matter of formatting, but also of how to display these ticks. I'm ok with d3-like ticks if we can highlight and clearly distinguish between the "major" labels and the "minor" ones, similarly to these examples:

I think, as a general rule: we can provide the user with options: use redundant labelling or simplified/d3 like. I think it will be a win-win if we provide a nice default + having the possibility to change that configuration to a different one.

> This formatter could be integrated into elastic-charts by default, and then all new visualizations will be able to set it. This is not ideal because visualization authors still need to provide a formatting function for other data types, and because of localization issues.

Localization of strings in elastic-charts is on our long-term roadmap (we just have a few hardcoded English strings in our library, but we'd like to add localization to them).

I'd also like to know the point of view of @VijayDoshi and @monfera on this

monfera commented 4 years ago

I'm also in favor of providing a clear hierarchy of date/time landmarks; if there's not much space, it can be done by using a progressively larger, heavier, capitalized and/or different font for coarser rasterizations (eg. the year in a month/year relation) or if there's more space, offsetting as above, or even a multilevel axis (2-3 levels, depending). There can be auxiliary means eg. gridlines (or more prominent ones), or even some subtle stripes for freeform Canvas use, though that removes a bit from the data ink color space.

After all, calendars have to do a good job showing time, and the user's need for understanding where we are on the temporal axis isn't fundamentally different from the same need in a calendar, with the main constraint being the high aspect ratio, as most space is needed for the chart. (Btw. there are chart varieties where the date/time part is central, and information is presented inside the axis.)

Here's an incomplete (no ticks/gridlines!) example partly for a multilayer temporal axis that tries to show time in a way that's reasonably detailed for any given zoom level, without occlusion, so the layers come and go based on the zoom level. It's an overdone extreme case but we humans already use many levels of detail for time/date, and calendars also use strategic overpaint where eg. the larger unit is big but subdued, not pitching it though 😃

[image: multilevel-axis-ticks]

Tableau's top + bottom axes can be interesting too; this example is with disparate dimensions though:

Marco also mentioned plans for hierarchical axes, where the X axis on the bottom can represent two different ordinal/categorical dimensions (or maybe a very low number of ordinal/categorical dimensions and zero or one quantitative dimension).

For when we have zoom & pan, recapitulating the largest units at the start of the axis is good practice too, as many people look at the left first, ie. as date/time labels slide out on the left when panning, the larger units should "stick". It's true even if the chart is static: ideally the chart starts with a tick label of "2019 March" rather than just "March".
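
A rough sketch of how these two ideas could combine: ticks tagged as "major" or "minor", with the first tick always carrying the larger units. The `AxisTick` shape and the `layoutTicks` name are hypothetical and do not reflect the elastic-charts API; UTC is used only to keep the example deterministic.

```typescript
interface AxisTick {
  time: Date;
  label: string;
  level: "major" | "minor";
}

const SHORT_MONTHS = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

function twoDigits(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

function layoutTicks(times: Date[]): AxisTick[] {
  return times.map((time, index): AxisTick => {
    const hhmm = twoDigits(time.getUTCHours()) + ":" + twoDigits(time.getUTCMinutes());
    const day = SHORT_MONTHS[time.getUTCMonth()] + " " + time.getUTCDate();
    const startsDay = time.getUTCHours() === 0 && time.getUTCMinutes() === 0;

    if (index === 0) {
      // The leftmost tick repeats the larger units so the axis is
      // readable from its start, even after panning.
      const full = time.getUTCFullYear() + " " + day;
      return { time: time, label: startsDay ? full : full + " " + hhmm, level: "major" };
    }
    if (startsDay) {
      // Day boundaries are landmarks a renderer can emphasize.
      return { time: time, label: day, level: "major" };
    }
    return { time: time, label: hhmm, level: "minor" };
  });
}

// Five ticks, 3 hours apart, starting 2019-11-21 15:00 UTC.
const stepMs = 3 * 60 * 60 * 1000;
const firstTick = Date.UTC(2019, 10, 21, 15);
const axisTicks = layoutTicks(
  [0, 1, 2, 3, 4].map((i) => new Date(firstTick + i * stepMs))
);
```

A renderer could then style `level: "major"` ticks with a heavier font or an offset row, leaving the formatter itself unaware of presentation.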

wylieconlon commented 4 years ago

Which parts of this need to be implemented in our chart library vs as a user preference?

monfera commented 4 years ago

Not sure if I understand the question and @markov00 may have a better answer for elastic-charts, but it's useful to view axes as just data visualization, ie. a layer that is cast on the same projection space as the centerpiece data ink eg. bars or lines.

Specifically, an axis is an aggregation of the domain of the function(s) corresponding to, here, a temporal domain. The aggregation of the domain is the minimum and maximum value, which can be represented as eg. a horizontal line with marked start and end values (similar to how, on a map, the scale is shown: a short section showing what corresponds to 10 miles).

As with map legends, Cartesian data visualizations can also have evenly spaced ticks and can adhere to constraints. There's no limit to how many layers can be done as it's just a layer that's "below" the inner chart area. It's a richly annotated data visualization of the min/max aggregation of the domain.

So, an axis is merely an annotated visualization layer (mostly consisting of annotations eg. ticks+labels) on a data visualization projection space, and as such I think it belongs to the chart, but it also needs to be heavily driven by user preference. Apologies if I misunderstood the question.

wylieconlon commented 4 years ago

We could implement a date formatter in Kibana that applies the rules above. Is this worth doing on its own?
What will elastic-charts implement in terms of hierarchical labels?

monfera commented 4 years ago

tl;dr good questions,

I think mappings should live in one place but accessible everywhere,
hopefully elastic-charts would implement something that's usable across the board but it needs a bit of design and decision maker approval

It's not just formatters and labels; also colors etc. as these should work across the board. Final home may or may not need to be elastic-charts

We should be able to have dashboards, Canvas workbooks etc. such that the mappings can be uniform, irrespective of the charting tool. Examples: color palettes, value formatters, the handling of common dimensions eg. time and its alternative levels of detail eg. weekly/monthly/quarterly discretizations with multilevel axis ticks. If a user chooses a categorical palette, or hand-picks colors for their dashboard, all charts may need to adhere. Potentially, even across multiple dashboards and/or canvas workbooks; Kibana vs Vega; Cartesian visualizations, non-Cartesians, user plugins and maps.

A Tableau dashboard with charts of (apparently) shared color mapping:

For this reason I think that these things are first-class citizens:

Built-in (with possibility to add more):
- color palettes
- preferred styles and settings, eg. font families, styles and sizes; dark mode flag
- text/label formatters for common, frequently used values
- text, number and date formatting strategies incl. multilevel
User created (maybe persisted as saved objects):
- color scales
- text/label formatters for user-defined dimensions and measures
- eventually, arbitrary mappings (scales) eg. from customer Product code to color, marker shape/size or even Cartesian location (why couldn't an elastic-charts and a Vega visualization share an axis location and length, think of SPLOM and marginal scatterplots)
- Level of Detail for the user's dimensions, measures and I think, aggregation functions

All these are not trivial and a good amount of design is needed. Also, we shouldn't prohibit users from, or force users to go to a specific UI, or use some other affordance, to pick choices, assign colors etc. - there should be well thought out defaults, with the possibility of adding alternatives.

These mappings are not of the dashboard, or Canvas, or elastic-charts. They stand in their own right and can maybe even be displayed as a dataviz-flavored stylebook. Other principles like common, centralized maintenance of shared things, DRY, and theme support also call for a single source of truth. Even Maps, Vega and user-rolled plugins could pick up data overlay colors from such functions and preferences; some of which may also be configured by Lens.

For practical reasons, would it work if

1. elastic-charts provides such mappings, scales, projections and dataviz-specific defaults as these already exist in an early form. This can be in a form of configuration (eg. JSONable objects) so it's serializable, but I feel that clients (eg. Dashboard) should be able to query elastic-charts for accessor functions that answer questions like "give me all colors" or "format this number for very short text rendering (likely, axis tick)" or "format this categorical value with a medium long text" (eg. for labels of horizontal bars) or even "map a number to a screen pixel value for a barchart bar midpoint"
2. Dashboard, Canvas etc. would initially rely on elastic-charts for these, but could provide the goodies for its own constituents, including non-ech charts
3. As we gain experience, we'd consider at some point if these projections should live on their own, eg. a Kibana thing, or a sub-Kibana repo, like a sibling to elastic-charts
4. Eventually, plugins could provide new or alternative mappings; users may even create plugins with the sole purpose of adding specialized projections
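
To make the accessor idea more concrete, here is a hypothetical sketch of what such a queryable mapping object might look like. None of these names (`SharedMappings`, `createSharedMappings`, `colorFor`, `formatNumber`) exist in elastic-charts or Kibana; they only illustrate the kind of questions a client could ask.

```typescript
type TextLength = "short" | "medium";

interface SharedMappings {
  // "give me all colors"
  allColors(): string[];
  // The same category gets the same color in every chart that asks.
  colorFor(category: string): string;
  // "format this number for very short text rendering"
  formatNumber(value: number, length: TextLength): string;
}

function createSharedMappings(palette: string[]): SharedMappings {
  // Prototype-free object so arbitrary category names are safe keys.
  const assigned: Record<string, string> = Object.create(null);
  let nextIndex = 0;

  return {
    allColors(): string[] {
      return palette.slice();
    },
    colorFor(category: string): string {
      if (assigned[category] === undefined) {
        // Assign palette entries in first-seen order, wrapping around.
        assigned[category] = palette[nextIndex % palette.length];
        nextIndex += 1;
      }
      return assigned[category];
    },
    formatNumber(value: number, length: TextLength): string {
      if (length === "short" && Math.abs(value) >= 1000) {
        return (value / 1000).toFixed(1) + "k";
      }
      return String(value);
    },
  };
}

// Two charts sharing one mapping object agree on colors.
const mappings = createSharedMappings(["#1f77b4", "#ff7f0e", "#2ca02c"]);
const colorInChartA = mappings.colorFor("product-a");
const colorInChartB = mappings.colorFor("product-a");
```

Because the object holds the assignments, any chart type that can call these accessors would stay consistent with the others, which is the property argued for above.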

It's also possible to start with point 3. I'm sure that similar precursor structures exist in Kibana already, it's just that current related efforts mostly go into elastic-charts.

Minor note: cohesive, uniform, combinatorial tools like Tableau don't have this problem to the extent we do. But with a set of historically formed visualization tools, Vega, Maps and the encouragement of end user plugins, the location for these is not as clear-cut (Tableau has plugins too but there's a clearer balance of power between central Tableau and plugins while with us, newer and newer things are more egalitarian). But I'm sure Tableau and similar tools have such internal components with tight cohesion and low coupling.

Here are other examples, where colors and Cartesian projections are shared:

Any thoughts on this? Is there general consensus that at least a good chunk of mappings should be their own thing, shareable across Kibana, rather than the implementation detail of one specific module? @markov00 maybe you have a lot more history and insight here

dsmith001 commented 4 years ago

@monfera - my 2 cents is that making the mappings independent of any one visual object, and ideally shareable across all of Kibana, goes a long way towards enabling newer and less technical users, which is obviously a huge goal for Lens.

To add some color (sorry, can't help myself) to your Tableau references, mappings like date formats, colors, etc. were tied to the meta-data layer Tableau automatically created on top of physical data assets, captured in what they called a .tds file. Specifically to this conversation about things like color selections and to your examples of shared color mapping - you're spot on in how users leverage this capability. The gif example below shows how choosing specific colors for a dimension in the first visual is automatically replicated for the user in the second (and all subsequent) visuals where that dimension is used.

[image: Tableau color mappings]

These mappings had some very cool benefits at both the individual analysis level and the team/dept/company level:

For individuals it largely meant simple time savings. I defined my colors/date formats/whatever once and then as I continued building the analysis in the same initial ad hoc journey OR came back later to edit/update/improve, there was no requirement to recreate those mappings for net new visuals.

For Teams/Dept/Companies it meant that central BI/Reporting groups could create and optionally lock down meta-data related to not just colors, but also hierarchies, labels, and calculations, and then provide centralized access to a physical data asset only through the sanctioned .tds. In this way they not only offered guidance to analysts with edited, easy-to-read fields, but also ensured that no one was redefining how company revenue was being calculated or what the product hierarchy was, etc. Specifically back to colors, for corporate branding it was a way for centralized BI teams to allow self-service ad hoc analytics to other teams and depts, while still providing brand standards.

To bring this back to the discussion at hand, while I'm still ramping up on the ins and outs of the Elastic Stack, it seems like Index Patterns are the closest thing to a Tableau .tds, however they are leveraged for mostly non-visual specific scenarios so I'm unclear if trying to house these mappings there makes sense. Obviously as you pointed out the path for Elastic is much more complicated than what Tableau had to innovate around, however I think your ask for having these mappings be independent achieves the same flexible outcome that Tableau was able to achieve that helped support both visualization builders and visualization/data managers.

(sidenote: I know at EAH there were several asks to see how Tableau specifically handled certain scenarios like this and others. I'm always happy to offer my services demonstrating Tableau or Power BI if there is interest. I can also arrange the same for Looker).

wylieconlon commented 4 years ago

While this is a good discussion, it's not focused on the issue in the title. I want to focus just on date display because there are some low-effort ways to improve it without building a bigger system.

If you want to discuss how to separate Kibana user preferences vs general chart preferences, how about opening a new issue?

ghudgins commented 3 years ago

working on the core of this problem area in elastic-charts - https://github.com/elastic/elastic-charts/issues/1310

ghudgins commented 2 years ago

+1 - someone who wants hh:mm for a specific table visualization but still wants the histogram to be a date histogram (and not an "hour of day" visualization)

markov00 commented 1 year ago

I think we can close this, the multilayer time axis is available and implemented

elastic / kibana

[Lens] Improve date formatting by using a context-aware formatter like D3 #51227