Caleydo / tourdino

Calculate and visualize similarity measures.
BSD 3-Clause "New" or "Revised" License

Switch to Vega for visualizations #12

Open keckelt opened 4 years ago

keckelt commented 4 years ago

Hi, with the transition to Vega (Lite) we can replace the charts used by Tourdino as well.

There is:

Apart from the parallel sets, it should be straightforward.

keckelt commented 3 years ago

I've started with the boxplot today using Vega-Lite v4. v5 is already out and I wanted to upgrade directly, but that caused build errors in Coral that I didn't know how to fix right away. To keep Coral and TourDino on the same Vega versions, I stuck with v4 for now.

keckelt commented 3 years ago

The example for a space-saving y-axis might be worth a try: image

https://vega.github.io/editor/#/examples/vega-lite/bar_axis_space_saving

keckelt commented 3 years ago

I had some thoughts about the GENIE dataset and the many categories some of its attributes contain. Additionally, the GENIE dataset is about 10 times larger than the TCGA dataset.

I'll focus on TourDino visualizations here, but of course the dataset size also impacts the statistics part.

Row Comparison

In addition to the visualizations, selecting the sets/rows to compare also suffers from the many categories. They are all listed in the select2 input. It is searchable, but manual browsing becomes impractical.

Significance Matrix

If you select to compare all tumor types with each other, the significance matrix gets very wide and is likely impossible to navigate. It's already a problem with the ~30 tumor types. The row headers are not sticky. You also spawn a lot of comparisons. Due to the large dataset size, results tend to be significant, although not necessarily relevant, because the effect size is low.
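
To make the two concerns above concrete, here is a minimal sketch in Python. It is not TourDino's statistics code: it assumes a simple two-sample z-test with equal group sizes and unit variance, and the function names are made up for illustration. It counts the pairwise comparisons for 30 categories and shows how a fixed, small standardized effect becomes "significant" purely through a larger sample size.

```python
import math


def pairwise_comparisons(n_groups: int) -> int:
    """Number of unordered pairs among n_groups categories."""
    return math.comb(n_groups, 2)


def two_sided_p(effect_size: float, n_per_group: int) -> float:
    """Approximate two-sided p-value of a two-sample z-test.

    Assumption: equal group sizes and unit variance, so the test
    statistic is z = d * sqrt(n / 2) for a standardized difference d.
    """
    z = abs(effect_size) * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))


# Comparing ~30 tumor types with each other yields 435 pairs
print(pairwise_comparisons(30))

# The same tiny effect (d = 0.05) at growing sample sizes
for n in (100, 10_000, 100_000):
    print(n, two_sided_p(0.05, n))
```

The effect size stays at 0.05 in every row, while the p-value shrinks as the group size grows, which is why showing the effect size next to the p-value could help to judge relevance.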

Numerical ↔ Numerical

As you just take the numerical values per set, regardless of the categories within it, this should not be an issue. Also, the boxplots pretty much abstract away the dataset size.

image
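
The point that boxplots abstract away the dataset size can be sketched as follows. This is only an illustration in Python under the assumption that the summary statistics could be computed before rendering; whether TourDino does this or lets Vega-Lite aggregate the raw values is not decided here, and `five_number_summary` is a hypothetical helper.

```python
import statistics


def five_number_summary(values):
    """Return (min, q1, median, q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


small = list(range(101))       # 101 values
large = list(range(100_001))   # 100,001 values

# Both summaries have five entries, no matter how many items went in
print(five_number_summary(small))
print(five_number_summary(large))
```

Regardless of how many items a set contains, only five numbers per set (plus any outliers, which this sketch ignores) would have to reach the chart.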

Categorical ↔ Categorical

For categorical, we could use a Grouped Relative Histogram, similar to what we have in Coral.

image

Open the Chart in the Vega Editor

In Coral, we sort by the total percent per category. Sorting alphabetically would also be an option, or sorting by the difference between the sets, which could be shown by a separate negative bar:

image Open the Chart in the Vega Editor
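
The three sort orders mentioned above could be prepared like this. It is a rough sketch in Python, not Coral's or TourDino's implementation; the function names and the `sort_by` values are assumptions made for the example.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each category to its share (0..1) within one set."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def grouped_relative_histogram(set_a, set_b, sort_by="total"):
    """Build rows of (category, share_a, share_b, difference), sorted."""
    freq_a = relative_frequencies(set_a)
    freq_b = relative_frequencies(set_b)
    rows = []
    for category in set(freq_a) | set(freq_b):
        a = freq_a.get(category, 0.0)
        b = freq_b.get(category, 0.0)
        rows.append((category, a, b, a - b))

    if sort_by == "total":
        key = lambda row: -(row[1] + row[2])   # largest combined share first
    elif sort_by == "difference":
        key = lambda row: -abs(row[3])         # largest gap between sets first
    elif sort_by == "alphabetical":
        key = lambda row: row[0]
    else:
        raise ValueError(f"unknown sort order: {sort_by}")

    # Pre-sort by name; the stable sort then breaks ties alphabetically
    return sorted(sorted(rows, key=lambda row: row[0]), key=key)


set_a = ["x", "x", "y", "y"]
set_b = ["x", "x", "z", "z"]
print([row[0] for row in grouped_relative_histogram(set_a, set_b, "total")])
print([row[0] for row in grouped_relative_histogram(set_a, set_b, "difference")])
```

The fourth element of each row is the signed difference, which is the value the separate negative bar would encode.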

Column Comparison

Numerical ↔ Numerical

The large number of marks could be an issue for scatterplots (also for getting the opacity right). Alternatively, we could create a heatmap: image

Open the Chart in the Vega Editor

Optionally, with a superimposed trend line.
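
A binned heatmap with an optional trend line might look roughly like the following Vega-Lite specification, written here as a Python dictionary that serializes to JSON. This is a sketch, not the spec behind the linked chart: the field names `x_attr` and `y_attr`, the dataset name `table`, the bin count, and the line color are placeholders, and the regression transform should be checked against the Vega-Lite version in use.

```python
import json

# "x_attr" and "y_attr" are placeholder field names, not TourDino identifiers.
heatmap_with_trend = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"name": "table"},  # named dataset, to be bound at runtime
    "layer": [
        {
            # One rectangle per 2D bin, colored by the number of items in it
            "mark": "rect",
            "encoding": {
                "x": {"field": "x_attr", "type": "quantitative", "bin": {"maxbins": 40}},
                "y": {"field": "y_attr", "type": "quantitative", "bin": {"maxbins": 40}},
                "color": {"aggregate": "count", "type": "quantitative"},
            },
        },
        {
            # Optional linear trend line fitted by the regression transform
            "transform": [{"regression": "y_attr", "on": "x_attr"}],
            "mark": {"type": "line", "color": "firebrick"},
            "encoding": {
                "x": {"field": "x_attr", "type": "quantitative"},
                "y": {"field": "y_attr", "type": "quantitative"},
            },
        },
    ],
}

spec_json = json.dumps(heatmap_with_trend, indent=2)
print(spec_json)
```

The number of rectangles is bounded by the bin settings rather than by the number of items, so the opacity problem of overplotted scatterplots does not arise.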

Categorical ↔ Categorical

I would replace the parallel sets with a heatmap. As with the significance matrix, the attribute with more categories should have the categories as rows. For two attributes with many categories it is still problematic.
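
The row/column rule for the heatmap could be prepared as in the sketch below. It is written in Python for illustration only; `contingency_heatmap_data` is a hypothetical helper, and the sample values are made up.

```python
from collections import Counter


def contingency_heatmap_data(values_a, values_b):
    """Count co-occurrences and put the attribute with more categories on rows.

    values_a and values_b are equally long lists holding, per item,
    its category in the first and in the second attribute.
    """
    if len(values_a) != len(values_b):
        raise ValueError("both attributes need one value per item")

    # Swap so that the attribute with more distinct categories comes first
    if len(set(values_b)) > len(set(values_a)):
        values_a, values_b = values_b, values_a

    counts = Counter(zip(values_a, values_b))
    row_labels = sorted(set(values_a))
    col_labels = sorted(set(values_b))
    # Missing combinations count as 0 because Counter returns 0 for unknown keys
    matrix = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, matrix


gender = ["f", "m", "f", "m"]
tumor = ["lung", "lung", "skin", "breast"]
rows, cols, matrix = contingency_heatmap_data(gender, tumor)
print(rows, cols, matrix)
```

The attribute with three categories ends up on the rows even though it was passed second. The sketch does nothing about the case of two attributes with many categories each, which remains problematic as noted above.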

Categorical ↔ Numerical

A line chart with 5 lines corresponding to the top 5 enrichment scores. With more categories, this should probably be adjustable.
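
Making the number of lines adjustable could be as simple as the following sketch. It assumes that one enrichment score per category is already available and that "top" means the highest score; both the helper name and the sample scores are invented for the example.

```python
import heapq


def top_categories(enrichment_scores, k=5):
    """Return the k categories with the highest scores, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Iterating a dict yields its keys; each key is ranked by its score
    return heapq.nlargest(k, enrichment_scores, key=enrichment_scores.get)


# Made-up scores for six categories
scores = {"a": 0.1, "b": 0.9, "c": 0.4, "d": 0.7, "e": 0.3, "f": 0.8}
print(top_categories(scores))        # default: five lines
print(top_categories(scores, k=2))   # user-adjusted: two lines
```

If negative enrichment is equally interesting, ranking by the absolute score would be an alternative.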

keckelt commented 2 years ago

Related issue: https://github.com/Caleydo/cohort/issues/585