MissionBio / mosaic

Tertiary analysis package to visualize single-cell data generated by the Tapestri Platform.

Color scale not consistent between plots #19

Open bengouts opened 5 months ago

bengouts commented 5 months ago

Hi all,

I observed some confusing behaviour with the color scale: the default color scheme differs from one plot to another, which is extremely misleading.

import missionbio.mosaic as ms  # mosaic's top-level package, under the `ms` alias used below

h5path = '/home/user/Download/4-cell-lines-AML-multiomics.dna+protein.h5'
sample = ms.load(
    h5path,
    raw=False,
    filter_variants=True,
    filter_cells=False,
    single=True
)
sample.protein.normalize_reads(method='NSP')
sample.protein.run_umap(attribute='normalized_counts', output_label="UMAP_PROT")
sample.protein.cluster(attribute='normalized_counts', method='graph-community', k=100)

Then I plot the heatmap:

sample.protein.heatmap(attribute='normalized_counts')


Then I plot the UMAP:

sample.protein.scatterplot(attribute='UMAP_PROT', colorby=sample.protein.get_labels())


For instance, cluster 1 is blue in the heatmap but red in the UMAP.
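A minimal sketch of how such a mismatch can arise, assuming (not confirmed from mosaic's source) that each plot builds its own label-to-color mapping. The two assignment rules and the three-color palette below are hypothetical:

```python
# Hypothetical palette; mosaic's real default colors are not reproduced here.
PALETTE = ["blue", "red", "green"]

def colors_by_sorted_order(labels):
    """Assign palette colors to the distinct labels in sorted order."""
    return {lab: PALETTE[i % len(PALETTE)] for i, lab in enumerate(sorted(set(labels)))}

def colors_by_first_appearance(labels):
    """Assign palette colors in the order labels first appear in the cells."""
    mapping = {}
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = PALETTE[len(mapping) % len(PALETTE)]
    return mapping

labels = ["2", "1", "3", "1", "2"]  # made-up cluster label per cell
heatmap_colors = colors_by_sorted_order(labels)
umap_colors = colors_by_first_appearance(labels)
print(heatmap_colors["1"], umap_colors["1"])  # blue red
```

Both dictionaries are valid on their own; they disagree only because nothing ties them to a shared palette.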

Thanks for this very useful and well-documented library. Best regards, Benoit

KKJSP commented 5 months ago

@bengouts To make the scatterplot use the same colors, pass the string "label" to colorby instead of the labels array:

sample.protein.scatterplot(attribute='UMAP_PROT', colorby="label")

Quoting the documentation:

In case ‘label’ is provided then the stored palette is used. If the values are strings, then a discrete color map is assumed. For numerical values a continuous color scale is used.
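The documented rule can be read as a three-way dispatch on the colorby argument. The sketch below restates it in plain Python; `choose_color_mode` is a hypothetical helper, not part of mosaic, and string options other than 'label' are deliberately left unmodeled:

```python
from numbers import Number

def choose_color_mode(colorby):
    """Classify a colorby argument per the quoted documentation (illustrative only)."""
    if isinstance(colorby, str):
        # Only the documented 'label' keyword is modeled in this sketch.
        return "stored palette" if colorby == "label" else "not modeled"
    values = list(colorby)
    if all(isinstance(v, str) for v in values):
        return "discrete"
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "not modeled"

print(choose_color_mode("label"))          # stored palette
print(choose_color_mode(["1", "2", "1"]))  # discrete
print(choose_color_mode([0.3, 1.7]))       # continuous
```

Under this reading, an array of string labels, even one identical to the stored labels, lands in the discrete branch and never consults the stored palette.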

bengouts commented 5 months ago

Oops, my mistake. I assumed it used the stored labels by default. Thanks!

bengouts commented 5 months ago

Just one more example to highlight that the default color-scale behaviour is misleading:

sample.protein.cluster(attribute='normalized_counts', method='graph-community', k=100)
sample.protein.add_row_attr("cluster_graph_community", sample.protein.get_labels())
sample.protein.heatmap(attribute='normalized_counts', splitby=sample.protein.row_attrs["cluster_graph_community"])
sample.protein.scatterplot(attribute='UMAP_PROT', colorby=sample.protein.row_attrs["cluster_graph_community"])

KKJSP commented 5 months ago

For now, the only way to ensure that all plots use the same colors is to store the values in the assay's label row attribute with sample.protein.set_labels(sample.protein.row_attrs["cluster_graph_community"]) and then pass "label" to the colorby or splitby parameter. The colors themselves can be changed with sample.protein.set_palette(). Whenever an array is passed to these parameters instead, the function has no way of knowing what that array represents, so it cannot reuse the stored palette.
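To see why storing the labels once fixes the problem, here is a toy model. `ToyAssay`, `cell_colors` and `fresh_mapping` are hypothetical; they only borrow the `set_labels`/`set_palette` names from the comment above and do not reproduce mosaic's implementation:

```python
DEFAULTS = ["blue", "red", "green", "orange"]  # hypothetical default colors

def fresh_mapping(values):
    """What a plot can do with a bare array: invent colors from scratch."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = DEFAULTS[len(mapping) % len(DEFAULTS)]
    return mapping

class ToyAssay:
    """Hypothetical stand-in for an assay that stores labels and one palette."""

    def __init__(self):
        self.labels = []
        self.palette = {}

    def set_labels(self, labels):
        # Store the labels and give each distinct label a default color once.
        self.labels = list(labels)
        uniques = sorted(set(self.labels))
        self.palette = {lab: DEFAULTS[i % len(DEFAULTS)] for i, lab in enumerate(uniques)}

    def set_palette(self, palette):
        # Override colors for specific labels; the rest keep their defaults.
        self.palette.update(palette)

    def cell_colors(self, colorby):
        """One color per cell: 'label' uses the stored palette, arrays do not."""
        if isinstance(colorby, str) and colorby == "label":
            return [self.palette[lab] for lab in self.labels]
        values = list(colorby)
        mapping = fresh_mapping(values)
        return [mapping[v] for v in values]

clusters = ["2", "1", "3", "1"]
assay = ToyAssay()
assay.set_labels(clusters)
assay.set_palette({"1": "purple"})

heatmap = assay.cell_colors("label")  # e.g. what splitby="label" would see
umap = assay.cell_colors("label")     # e.g. what colorby="label" would see
raw = assay.cell_colors(clusters)     # a bare array gets an unrelated mapping
print(heatmap)  # ['red', 'purple', 'green', 'purple']
print(raw)      # ['blue', 'red', 'green', 'red']
```

Both "label" lookups read from the same dictionary, so they cannot drift apart, while the bare array ignores even the custom color set for cluster "1".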

However, I agree that mosaic should use a consistent palette for any given array across all plots. We can keep this ticket open.
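One way to get a consistent palette for any given array is sketched below. It assumes that colors should depend only on the set of distinct values and not on the order of the cells, so values are sorted before colors are assigned. `consistent_palette` and the hex colors are illustrative, not mosaic code:

```python
# Arbitrary illustrative colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]

def _sort_key(value):
    """Sort numeric-looking labels numerically ("2" before "10"), others as text."""
    text = str(value)
    try:
        return (0, float(text), text)
    except ValueError:
        return (1, 0.0, text)

def consistent_palette(values):
    """Map each distinct value to a color determined only by the value set."""
    uniques = sorted(set(values), key=_sort_key)
    return {v: PALETTE[i % len(PALETTE)] for i, v in enumerate(uniques)}

# The same clusters in a different cell order still get identical colors.
heat = consistent_palette(["2", "10", "1"])
umap = consistent_palette(["10", "1", "2", "2"])
print(heat == umap)  # True
```

Sorting numeric-looking labels numerically keeps cluster "10" after cluster "2", matching the order a reader would expect in a legend.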