Open keckelt opened 4 years ago
I started on the boxplot today using Vega-Lite v4. v5 is already out and I wanted to upgrade directly, but that caused some build errors in Coral that I didn't know how to fix right away. So that Coral and TourDino use the same Vega versions, I stuck with v4 for now.
Examples for a space-saving y-axis might be worth a try:
https://vega.github.io/editor/#/examples/vega-lite/bar_axis_space_saving
I had a thought about the GENIE dataset and the many categories some of its attributes contain. Additionally, the GENIE dataset is about 10 times larger than the TCGA dataset.
I'll focus on TourDino visualizations here, but of course the dataset size also impacts the statistics part.
In addition to the visualizations, selecting the sets/rows to compare also suffers from the many categories. They are all listed in the select2 input. It is searchable, but manual browsing becomes impractical.
If you choose to compare all tumor types with each other, the significance matrix gets very wide and likely impossible to navigate. It's already a problem with the ~30 tumor types. The row headers are not sticky. You also spawn a lot of comparisons. Due to the large data size, results tend to be significant, although not necessarily relevant, because the effect size is low.
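To put numbers on the last two points, here is a minimal sketch. It assumes Cohen's d as the effect size measure, which is my choice for illustration and not something TourDino currently computes; `pairwiseComparisons` and `cohensD` are hypothetical helper names.

```typescript
// Number of pairwise comparisons among n sets: n choose 2.
function pairwiseComparisons(n: number): number {
  return (n * (n - 1)) / 2;
}

function mean(xs: number[]): number {
  let sum = 0;
  for (const x of xs) sum += x;
  return sum / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function variance(xs: number[]): number {
  const m = mean(xs);
  let squares = 0;
  for (const x of xs) squares += (x - m) * (x - m);
  return squares / (xs.length - 1);
}

// Cohen's d with a pooled standard deviation (assumed effect size measure).
function cohensD(a: number[], b: number[]): number {
  const pooledVariance =
    ((a.length - 1) * variance(a) + (b.length - 1) * variance(b)) /
    (a.length + b.length - 2);
  return (mean(a) - mean(b)) / Math.sqrt(pooledVariance);
}
```

With 30 tumor types, `pairwiseComparisons(30)` gives 435 cells in the matrix. Showing or filtering by an effect size next to the p-value could help separate relevant results from merely significant ones.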
As you just take the numerical values per set, regardless of the categories within it, this should not be an issue. Also, the boxplots pretty much abstract away the dataset size.
For categorical, we could use a Grouped Relative Histogram, similar to what we have in Coral.
Open the Chart in the Vega Editor
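A rough sketch of how the grouped relative histogram could be built: the shares are computed per set, so sets of very different size stay comparable. The `Item` shape, the field names, and the function names are assumptions for illustration, not the Coral implementation. Grouping uses a `column` facet, which is the usual way to do grouped bars in Vega-Lite v4.

```typescript
interface Item { set: string; category: string; }
interface Bar { set: string; category: string; share: number; }

// Share of each category within its own set (values between 0 and 1).
function relativeFrequencies(items: Item[]): Bar[] {
  const setTotals: Record<string, number> = {};
  const counts: Record<string, Record<string, number>> = {};
  for (const item of items) {
    setTotals[item.set] = (setTotals[item.set] || 0) + 1;
    const perSet = counts[item.set] || (counts[item.set] = {});
    perSet[item.category] = (perSet[item.category] || 0) + 1;
  }
  const bars: Bar[] = [];
  for (const set of Object.keys(counts)) {
    const perSet = counts[set] || {};
    const total = setTotals[set] || 1;
    for (const category of Object.keys(perSet)) {
      bars.push({ set, category, share: (perSet[category] || 0) / total });
    }
  }
  return bars;
}

// Vega-Lite v4 spec: one facet column per category, one bar per set.
function groupedRelativeHistogramSpec(bars: Bar[]) {
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v4.json",
    data: { values: bars },
    mark: "bar",
    encoding: {
      column: { field: "category", type: "nominal" },
      x: { field: "set", type: "nominal", axis: null },
      y: { field: "share", type: "quantitative", axis: { format: ".0%" } },
      color: { field: "set", type: "nominal" },
    },
  };
}
```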
In Coral, we sort by the total percent per category. Alphabetically would also be an option, or based on the differences of sets, which could be shown by a separate negative bar:
Open the Chart in the Vega Editor
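For the difference-based ordering, a minimal sketch, assuming exactly two sets and sorting by the absolute difference of their shares. The helper name and the input shape (category to share) are hypothetical.

```typescript
interface Difference { category: string; difference: number; }

// Signed difference of shares (set A minus set B) per category,
// sorted so the categories that differ most come first.
function shareDifferences(
  a: Record<string, number>,
  b: Record<string, number>
): Difference[] {
  const seen: Record<string, boolean> = {};
  for (const c of Object.keys(a).concat(Object.keys(b))) seen[c] = true;
  const diffs: Difference[] = Object.keys(seen).map((category) => ({
    category,
    // A category missing from one set counts as a share of 0 there.
    difference: (a[category] || 0) - (b[category] || 0),
  }));
  diffs.sort((p, q) => Math.abs(q.difference) - Math.abs(p.difference));
  return diffs;
}
```

The signed `difference` values could feed the separate negative bar, and the resulting category order could be passed as an explicit `sort` array on the category encoding.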
The large number of marks could be an issue for scatterplots (also for getting the opacity right). Alternatively, we could create a heatmap:
Open the Chart in the Vega Editor
Optionally with a superimposed trend line.
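A sketch of what such a spec could look like in Vega-Lite v4: a binned `rect` layer colored by count, plus a line layer using the `regression` transform for the trend. The field names `x` and `y`, the function name, and the default of 40 bins are placeholders I picked for illustration.

```typescript
interface Point { x: number; y: number; }

// Binned heatmap (2D histogram) with an optional linear trend line on top.
function binnedHeatmapSpec(values: Point[], maxbins: number = 40) {
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v4.json",
    data: { values },
    layer: [
      {
        // One rect per (x bin, y bin); color encodes the number of items.
        mark: "rect",
        encoding: {
          x: { field: "x", type: "quantitative", bin: { maxbins } },
          y: { field: "y", type: "quantitative", bin: { maxbins } },
          color: { aggregate: "count", type: "quantitative" },
        },
      },
      {
        // Linear regression of y on x, drawn as a line.
        transform: [{ regression: "y", on: "x" }],
        mark: { type: "line", color: "firebrick" },
        encoding: {
          x: { field: "x", type: "quantitative" },
          y: { field: "y", type: "quantitative" },
        },
      },
    ],
  };
}
```

The number of marks is then bounded by the bin count instead of the dataset size, so the opacity problem goes away.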
I would replace the parallel sets with a heatmap. As with the significance matrix, the attribute with more categories should have the categories as rows. For two attributes with many categories it is still problematic.
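The row/column choice could be sketched like this, assuming the items arrive as flat records of attribute name to category. The function names and the record shape are hypothetical.

```typescript
type CategoricalRecord = Record<string, string>;

// Number of distinct categories an attribute takes across all records.
function countCategories(records: CategoricalRecord[], attribute: string): number {
  const seen: Record<string, boolean> = {};
  for (const record of records) {
    const value = record[attribute];
    if (value !== undefined) seen[value] = true;
  }
  return Object.keys(seen).length;
}

// Count heatmap of two categorical attributes; the attribute with more
// categories is placed on the y-axis so its categories become rows.
function categoricalHeatmapSpec(
  records: CategoricalRecord[],
  attributeA: string,
  attributeB: string
) {
  const aHasMore =
    countCategories(records, attributeA) >= countCategories(records, attributeB);
  const rowAttribute = aHasMore ? attributeA : attributeB;
  const columnAttribute = aHasMore ? attributeB : attributeA;
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v4.json",
    data: { values: records },
    mark: "rect",
    encoding: {
      y: { field: rowAttribute, type: "nominal" },
      x: { field: columnAttribute, type: "nominal" },
      color: { aggregate: "count", type: "quantitative" },
    },
  };
}
```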
A line chart with 5 lines corresponding to the top 5 enrichment scores. With more categories, the number of lines should probably be adjustable.
Related issue: https://github.com/Caleydo/cohort/issues/585
Hi, with the transition to Vega (Lite) we can replace the charts used by TourDino as well.
There is:
Apart from the parallel sets, it should be straightforward.