Ideas for an alternative representation

JEFworks-Lab / scatterbar

Scatterbar - data visualization for proportional data across many spatially resolved coordinates

GNU General Public License v3.0


I came across the preprint for this package and have some thoughts on alternative representations. I don't find the premise that a bar chart is easier to read than a pie chart particularly convincing in this application. The bar chart seems to share the pie chart's weakness: there is simply far too much information to parse. If this were printed on a poster, would we expect anyone to carefully interpret each of these small data points, much less compare the size of slices between points that are not adjacent? In addition, because each small chart fills its square, it is harder to distinguish between adjacent squares.

In my mind, the plot should capture higher-level information at a glance, while more detailed comparisons should be left to further plots and tables. I haven't done much analysis in this area, but I've always scratched my head at the scatter pies when I see them in talks. I think they should be reorganized to highlight the more important takeaways, though you may think differently given your experience. The takeaways I have in mind are questions like:

What is the most common cell type, and roughly what proportion does it take up?
How diverse is this population?

I would like to see something like this: https://stackoverflow.com/questions/55522850/recreate-circular-diagram-with-ggplot2, a circularized bar plot sorted by cell type proportion, with the most common type either in the center or at the outer edge. This should make it easier to identify the most common cell type. The downside is that the relative area occupied by each cell type becomes more difficult to interpret: an outer ring covers more area than an inner ring of the same radial width. This can be fixed by setting each ring's outer radius to the square root of the cumulative proportion, so that ring area tracks proportion, but I don't know how well people can visually judge distances between concentric rings. Something to experiment with, I guess.
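To make the area concern concrete, here is a minimal sketch of the geometry, assuming the rings are sorted with the most common cell type in the center. The function names `ring_radii` and `ring_areas` and the example proportions are hypothetical and are not part of scatterbar. The idea is that a ring between radii r_inner and r_outer has area pi * (r_outer^2 - r_inner^2), so choosing each outer radius as the square root of the cumulative proportion makes ring area proportional to the cell type's share.

```python
import math


def ring_radii(proportions, outer_radius=1.0):
    """Outer radius of each concentric ring, most common type in the center.

    Hypothetical helper: radii grow with the square root of the cumulative
    proportion so that each ring's area is proportional to its share.
    """
    total = sum(proportions)
    ordered = sorted(proportions, reverse=True)  # largest share innermost
    radii = []
    cumulative = 0.0
    for p in ordered:
        cumulative += p / total
        radii.append(outer_radius * math.sqrt(cumulative))
    return radii


def ring_areas(radii):
    """Area of each ring given the increasing outer radii."""
    areas = []
    inner = 0.0
    for r in radii:
        areas.append(math.pi * (r ** 2 - inner ** 2))
        inner = r
    return areas


# Made-up proportions for a single spot.
radii = ring_radii([0.2, 0.5, 0.3])
shares = [a / (math.pi * radii[-1] ** 2) for a in ring_areas(radii)]
print(radii)
print(shares)
```

With this scaling the rings get thinner toward the outside even when the shares are similar, which is exactly the part that may be hard to judge by eye.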

Alternatively, the sorting can be applied to the existing plots. The main thing is that I think the visualizations should be guided by the question "What are the main cell types in this population, and roughly what proportion do they take up?" rather than by specific comparisons of one cell type between two populations, which are almost impossible to make for middle slices whether you use a pie or a stacked bar.

Dear Shian,

Thanks for sharing.

Previous studies have found that different visual encodings offer differing levels of salience for different data types. In particular, people have a harder time discerning differences between angles (how quantitative data is encoded in pie charts) than between lengths (how quantitative data is encoded in bar charts). Of course, I agree that in such scattered pie charts and bar charts, people are also asked to discern other spatial information, which adds to the complexity of the data visualization and interpretation task.

Circularized bar plots are an interesting idea but, again based on previous studies, I anticipate they will suffer from interpretation challenges because they encode quantitative data using area, which is generally even harder for people to discern and compare. Previous literature, including Munzner, Visualization Analysis and Design (2014) and Mackinlay, Automating the Design of Graphical Presentations of Relational Information (1986), may be of interest if you are looking to understand, for example, how well people can visually judge distances between concentric rings.

Likewise, if I were looking to create a data visualization to summarize the diversity of the population at each spatial location, I would personally just compute an entropy metric at each spatial location and visualize that metric using, for example, color saturation. In general, it's important to keep in mind that not every visualization will be optimal for exploring or communicating every takeaway.
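As a rough illustration of the entropy idea, the sketch below computes Shannon entropy from the cell type proportions at one spatial location and rescales it to the range 0 to 1 so it could be mapped directly to something like color saturation. Shannon entropy is only one possible choice of "entropy metric"; the function names and the example proportions are assumptions for illustration, not part of scatterbar.

```python
import math


def shannon_entropy(proportions):
    """Shannon entropy (natural log) of one spot's cell type proportions."""
    total = sum(proportions)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in proportions:
        if p > 0:  # zero-proportion types contribute nothing
            q = p / total
            entropy -= q * math.log(q)
    return entropy


def normalized_entropy(proportions):
    """Entropy divided by its maximum, log(number of cell types)."""
    k = len(proportions)
    if k < 2:
        return 0.0
    return shannon_entropy(proportions) / math.log(k)


# Made-up spots: one dominated by a single type, one evenly mixed.
spots = {
    "spot_a": [0.9, 0.05, 0.05, 0.0],
    "spot_b": [0.25, 0.25, 0.25, 0.25],
}
for name, props in spots.items():
    print(name, round(normalized_entropy(props), 3))
```

A value near 0 means one cell type dominates the spot, and a value near 1 means the cell types are evenly mixed.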

Reordering the stacked bars can be done as shown in this tutorial (https://jef.works/scatterbar/articles/getting-started-with-scatterbars.html) but currently the same order applies to all stacked bars.

Reordering each stacked bar individually to highlight the most common cell type is a very interesting enhancement. It would definitely be feasible to reorder each stacked bar chart based on the sorted cell type proportions, with the most common cell type at the bottom or top. In terms of implementation details, it'd be ideal to have a new parameter, say reorder = FALSE (by default), that users can set to TRUE if they want such reordering. Coding-wise, it will be a matter of changing the internal data frame sorting if reorder = TRUE. If this is a feature useful to you and you'd be interested in helping implement it, I'd be delighted to merge a pull request and add you as a contributor to the package.
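The per-bar sorting logic could look roughly like the sketch below. It is written in Python purely to illustrate the idea and does not reflect scatterbar's actual internals or API. The `reorder` flag mirrors the parameter proposed above, while `stack_segments`, its arguments, and the example data are hypothetical.

```python
def stack_segments(proportions, order, reorder=False):
    """Compute (cell_type, bottom, top) segments for one stacked bar.

    proportions: dict mapping cell type to its proportion at one spot.
    order: the global cell type order shared by all bars.
    reorder: if True, sort this bar by descending proportion so the
             most common cell type sits at the bottom.
    """
    if reorder:
        # Ties fall back to the global order so output stays deterministic.
        types = sorted(
            order,
            key=lambda t: (-proportions.get(t, 0.0), order.index(t)),
        )
    else:
        types = list(order)

    segments = []
    bottom = 0.0
    for cell_type in types:
        height = proportions.get(cell_type, 0.0)
        segments.append((cell_type, bottom, bottom + height))
        bottom += height
    return segments


# Made-up proportions for a single spot.
spot = {"A": 0.2, "B": 0.5, "C": 0.3}
global_order = ["A", "B", "C"]
print(stack_segments(spot, global_order))                # same order for every bar
print(stack_segments(spot, global_order, reorder=True))  # most common type first
```

Keeping the default as `reorder=False` preserves the current behavior, where the same order applies to every bar and a given cell type can be tracked across spots.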

All the best, Prof. Fan

Jean Fan, PhD
Assistant Professor in Biomedical Engineering
Center for Computational Biology
Johns Hopkins University
JEFworks Lab: jef.works

Ideas for an alternative representation #1