elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Beyond word clouds #95912

Closed markharwood closed 7 months ago

markharwood commented 3 years ago

Word cloud visualizations are nice eye candy but for practical use have a number of issues. They have been called "the pie chart of text data" and I find it hard to disagree.

Making sense of popular/significant terms can be greatly improved if we also provide a degree of clustering in the visualization. As a real-world and topical example, here are the significant words generated from today's news headlines and rendered as a typical word cloud (using this):

[Image: Word_Cloud_Generator]

The user is left wondering if Joe Biden's dog has anything to do with the Suez Canal and if Deliveroo drivers have been involved in a biting incident. If we use the adjacency matrix aggregation we can cluster these same terms by their co-occurrence and use a Graph visualization to give a much more useful summary of today's news:

[Image: Kibana-52]

We can clearly see that it was Biden's dog in the biting incident and that it was the Ever Given megaship stuck in the Suez Canal. In my prototype, the relationship lines that connect terms can also be clicked, and a highlighter can be used to show where the connected terms were used in the original text:

[Image: Kibana]

This style of interaction helps users quickly remove the mystery by providing the missing context. Even if we don't adopt a graph visualization, the clusters produced by the adjacency matrix aggregation can be of use in colouring words based on the clusters they sit in.
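As a rough sketch of the clustering idea above (not the prototype's actual code), the snippet below builds an `adjacency_matrix` request with one named filter per term, then groups terms into clusters by treating each co-occurrence bucket as an edge and taking connected components. The field name `headline`, the helper names, and the sample buckets are all hypothetical; only the aggregation name, its `filters` parameter, and the default `&` separator in bucket keys come from Elasticsearch.

```python
from collections import defaultdict


def build_adjacency_request(terms, field="headline"):
    """Build a search body with one named filter per term.

    `field` is a hypothetical text field holding the headline text.
    """
    filters = {term: {"match_phrase": {field: term}} for term in terms}
    return {
        "size": 0,
        "aggs": {"co_occurrence": {"adjacency_matrix": {"filters": filters}}},
    }


def cluster_terms(buckets, separator="&", min_docs=1):
    """Group terms into connected components of the co-occurrence graph.

    Buckets keyed "a" are single terms; buckets keyed "a&b" are edges.
    Edges with fewer than `min_docs` documents are ignored.
    """
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for bucket in buckets:
        names = bucket["key"].split(separator)
        if len(names) == 1:
            find(names[0])  # register a term even if it has no edges
        elif bucket["doc_count"] >= min_docs:
            parent[find(names[0])] = find(names[1])

    clusters = defaultdict(set)
    for term in list(parent):
        clusters[find(term)].add(term)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical buckets, shaped like an adjacency_matrix response.
sample_buckets = [
    {"key": "biden", "doc_count": 6},
    {"key": "biden&dog", "doc_count": 4},
    {"key": "canal", "doc_count": 9},
    {"key": "canal&suez", "doc_count": 9},
    {"key": "dog", "doc_count": 4},
    {"key": "suez", "doc_count": 9},
]

clusters = cluster_terms(sample_buckets)
```

The cluster index of each term could then drive either the graph layout or simply the colour of a word in a conventional word cloud.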

It's also worth mentioning again that text fields are currently not supported in word cloud visualizations. The significant_text aggregation was specifically designed for producing these sorts of word discoveries from text fields, with special support for eliminating junk words from noisy text.
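For illustration, a request along these lines might look like the sketch below. The field name `content`, the helper name, and the sample sizes are assumptions; `significant_text`, its `filter_duplicate_text` option, and the wrapping `sampler` aggregation are the Elasticsearch features being referred to.

```python
def build_significant_text_request(query_text, field="content", size=20):
    """Build a search body that asks for significant words in a text field.

    `field` is a hypothetical text field; the query narrows the foreground
    set and the sampler keeps analysis to the top-matching documents.
    """
    return {
        "size": 0,
        "query": {"match": {field: query_text}},
        "aggs": {
            "sample": {
                "sampler": {"shard_size": 200},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": field,
                            "size": size,
                            # Skip near-duplicate passages (boilerplate,
                            # retweets, quoted replies) when scoring words.
                            "filter_duplicate_text": True,
                        }
                    }
                },
            }
        },
    }


request_body = build_significant_text_request("suez canal")
```

Wrapping the aggregation in a sampler is a common choice here because re-analysing text for every matching document is expensive.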

elasticmachine commented 3 years ago

Pinging @elastic/kibana-app (Team:KibanaApp)

monfera commented 3 years ago

Agree, esp. if there's a need to show relations and not just allude to term frequencies, and there's enough space for the links and proximity layout to work. Word cloud is indeed a bit like the pie chart, can even be circular :-) https://github.com/elastic/elastic-charts/pull/1038

markharwood commented 3 years ago

> Agree, esp. if there's a need to show relations and not just allude to term frequencies,

Even for the simpler case of plain lists a bar chart can be clearer, as noted in that article.

However, we can do better than plain bar charts when it comes to lists of significant terms found in query results. These terms are significant because they have seen an uptick in popularity for the selected query (e.g. trending in today's news). When it comes to conveying popularity, terms sit on very different scales. Evergreen topics like "Meghan Markle" are often in the news, but on the day of the Oprah Winfrey interview there's an uplift. A very minor celebrity would have significantly fewer mentions but would trend on the occasion of their death or on being caught saying something racist. Perhaps the important measure is the significance score, e.g. the percentage of a term's mentions that occur in the search results. This can be shown in one scalable bar chart: the green bar represents the number of matches in the search results and the grey bar the number of matches outside of the search results (the background popularity):

[Image: Large GIF (912x564)]

Everything is drawn to scale and the zoom bar can be used to reveal details of minor celebrities and the percentage of their mentions that occur in the search results. This style of interface shows all the stats of interest:

1) Meyers Leonard is more popular than Oprah in today's news (one green bar is bigger than the other)
2) Oprah is normally more popular than Meyers (one grey bar is bigger than the other)
3) 80% of all Meyers' mentions in the news happened today (lots of green vs grey visible when zoomed in)
4) Expanding your query with Meghan (as an OR) will drastically increase the number of matching results (lots of grey)
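To make the green/grey arithmetic concrete, here is a small sketch of the percentage measure described above. The counts are invented purely to mirror the four observations, and `foreground_share` is a hypothetical helper, not the score Elasticsearch itself computes.

```python
def foreground_share(in_results, outside_results):
    """Fraction of a term's mentions that fall inside the search results.

    in_results      -> the green bar (matches in the search results)
    outside_results -> the grey bar (matches outside the search results)
    """
    total = in_results + outside_results
    return in_results / total if total else 0.0


# Hypothetical (green, grey) counts per term.
counts = {
    "Meyers Leonard": (80, 20),
    "Oprah": (60, 540),
    "Meghan Markle": (50, 4950),
}

shares = {term: foreground_share(*pair) for term, pair in counts.items()}

# Terms ordered by how concentrated their mentions are in today's results.
ranked = sorted(shares, key=shares.get, reverse=True)
```

With these made-up numbers, Meyers Leonard has the biggest green bar and the highest share, while Meghan Markle's large grey bar shows how many extra documents an OR clause on that term would pull in.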

Currently word clouds use one stat to size words and any comparisons are hard because long words use more space than short words.

stratoula commented 7 months ago

Thank you for contributing to this issue, however, we are closing this issue due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen with a comment.