DingWB / PyComplexHeatmap

PyComplexHeatmap: A Python package to plot complex heatmap (clustermap)
https://dingwb.github.io/PyComplexHeatmap/
MIT License
249 stars, 28 forks

Suggestion for heatmap of categorical data #25

Closed: faridrashidi closed this issue 1 year ago

faridrashidi commented 1 year ago

Hi Wubin,

I noticed that PyComplexHeatmap currently treats categorical data the same way as continuous data, presenting the legend as a color bar instead of categorical labels. Do you have any plans to improve this feature? I came across a great repository called catheat (https://github.com/schlegelp/catheat) that uses seaborn to handle categorical data. It would be fantastic to see this integrated into PyComplexHeatmap.
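To make the distinction concrete, one way a plotting library could choose between a categorical legend and a color bar is a dtype check like the one below. This is a hypothetical sketch, not PyComplexHeatmap code; `legend_kind` and the cutoff of 10 distinct values are assumptions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def legend_kind(data: pd.DataFrame, max_categories: int = 10) -> str:
    """Hypothetical heuristic: return 'color_dict' (categorical legend)
    or 'cmap' (color bar) for a heatmap matrix."""
    # Any non-numeric column (e.g. strings stored as object dtype) is categorical.
    if not all(is_numeric_dtype(dtype) for dtype in data.dtypes):
        return "color_dict"
    # Numeric data with only a few distinct values (e.g. 0/1 calls) is treated as categorical too.
    n_unique = pd.unique(data.to_numpy().ravel()).size
    return "color_dict" if n_unique <= max_categories else "cmap"


print(legend_kind(pd.DataFrame([["A", "B"], ["B", "A"]])))        # color_dict
print(legend_kind(pd.DataFrame([[0, 1], [1, 0]])))                # color_dict
print(legend_kind(pd.DataFrame([[i * 0.5 for i in range(20)]])))  # cmap
```

A numeric-but-few-values rule is a judgment call: an explicit signal from the user, such as passing a list of colors, avoids guessing.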

DingWB commented 1 year ago

Hello Farid,

I appreciate your suggestion; it's a good direction. I will look into catheat, but I am unsure whether it will be useful, because we already have different kinds of annotations (such as anno_boxplot, anno_barplot, and anno_simple) to display categorical variables. In addition, if we display categorical variables in a heatmap, the figure legend may end up with too many rows, and I'm not sure how easy it is to lay out a legend with that many categories.
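On the legend-size concern, a common workaround is to wrap categorical entries into several columns. A minimal matplotlib sketch, assuming the categories come as a plain `{category: color}` dict; `categorical_legend` and `max_rows` are hypothetical names, not PyComplexHeatmap API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def categorical_legend(ax, color_dict, title=None, max_rows=8):
    """Draw one legend entry per category, adding columns so that no
    column holds more than `max_rows` entries."""
    handles = [Patch(facecolor=color, label=str(category))
               for category, color in color_dict.items()]
    ncol = max(1, -(-len(handles) // max_rows))  # ceiling division
    return ax.legend(handles=handles, title=title, ncol=ncol,
                     loc="upper left", bbox_to_anchor=(1.02, 1),
                     frameon=False)


fig, ax = plt.subplots()
palette = plt.get_cmap("tab20").colors  # 20 distinct colors
color_dict = {f"cluster{i}": palette[i] for i in range(20)}
legend = categorical_legend(ax, color_dict, title="Cluster")  # 20 entries in 3 columns
```

Placing the legend outside the axes (`bbox_to_anchor`) keeps it from covering the heatmap; the column count grows with the number of categories instead of the legend growing taller.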

I will think about it. Alternatively, could you help with this part if you have time? BTW, if you want to make this kind of plot with the current version of PyComplexHeatmap, you can refer to this example.

faridrashidi commented 1 year ago

Thank you, Wubin, for your quick reply. I think the annotations don't require any modifications, as my suggestion pertains only to the legend of the heatmap itself.

I think we just need to update line 322 in clustermap.py to draw a categorical-label legend instead of a color bar, as done in line 148 of catheat, because both use pcolormesh to draw the heatmap.
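For reference, the general catheat-style approach can be sketched as: encode each category as an integer, draw the codes with `pcolormesh` through a `ListedColormap`, and build the legend from color patches rather than a color bar. This is a hypothetical illustration, not catheat's or PyComplexHeatmap's actual code; `categorical_heatmap` is an assumed name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap
from matplotlib.patches import Patch


def categorical_heatmap(data, colors, ax=None):
    """Draw categorical cells with pcolormesh plus a category legend
    (not a color bar). Returns the axes and the {category: color} map."""
    values = np.asarray(data).astype(str)
    # Sorted distinct categories and each cell's integer code (index into categories).
    categories, codes = np.unique(values, return_inverse=True)
    codes = codes.reshape(values.shape)
    if len(colors) < len(categories):
        raise ValueError("need at least one color per category")
    used_colors = list(colors[: len(categories)])
    # One discrete color per code; vmin/vmax center each integer in its color bin.
    cmap = ListedColormap(used_colors)
    if ax is None:
        _, ax = plt.subplots()
    ax.pcolormesh(codes, cmap=cmap, vmin=-0.5, vmax=len(categories) - 0.5,
                  edgecolors="white", linewidth=1)
    ax.invert_yaxis()  # first row at the top, as in a heatmap
    color_dict = {str(cat): col for cat, col in zip(categories, used_colors)}
    handles = [Patch(facecolor=col, label=cat) for cat, col in color_dict.items()]
    ax.legend(handles=handles, loc="upper left", bbox_to_anchor=(1.02, 1))
    return ax, color_dict


df = pd.DataFrame([["A", "B"], ["B", "A"], ["C", "A"]])
ax, color_dict = categorical_heatmap(df, ["tab:blue", "tab:red", "tab:green"])
print(color_dict)  # {'A': 'tab:blue', 'B': 'tab:red', 'C': 'tab:green'}
```

Setting `vmin=-0.5` and `vmax=n-0.5` makes integer code `k` land in the middle of the `k`-th color bin, so each category maps to exactly one color.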

Let me give it a try in the next few days and create a PR for your review.

DingWB commented 1 year ago

No, I think you are looking in the wrong place. You should look at plot_legend_list and plot_color_dict_legend in utils.py; these are the functions I use to plot all the different kinds of legends. You probably only need to prepare the input for plot_color_dict_legend; you don't need to write a new function and plot the legend from scratch.
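The pattern described here, a list of legend entries where each entry carries its own kind and is routed to a matching drawing function, can be sketched roughly as follows. This is a simplified, hypothetical stand-in for plot_legend_list and plot_color_dict_legend, not their real signatures. The entry layout mirrors the lists appended to `legend_list` in this thread; since the meaning of the fourth field is not explained here, the sketch ignores it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
from matplotlib.patches import Patch


def draw_legend(fig, ax, entry):
    """Hypothetical dispatcher for one entry shaped like
    [cmap_or_color_dict, title, legend_kws, <ignored>, kind]."""
    obj, title, legend_kws, _, kind = entry
    if kind == "color_dict":
        # Categorical: one colored patch per category.
        handles = [Patch(facecolor=color, label=str(category))
                   for category, color in obj.items()]
        return ax.legend(handles=handles, title=title, **legend_kws)
    if kind == "cmap":
        # Continuous: a color bar for a colormap (name or object) over [0, 1].
        mappable = ScalarMappable(norm=Normalize(vmin=0, vmax=1), cmap=obj)
        mappable.set_array([])  # required by some older matplotlib versions
        return fig.colorbar(mappable, ax=ax, label=title, **legend_kws)
    raise ValueError(f"unknown legend kind: {kind!r}")


fig, ax = plt.subplots()
cat_legend = draw_legend(fig, ax, [{"0": "blue", "1": "red"}, "status", {}, 4, "color_dict"])
cbar = draw_legend(fig, ax, ["viridis", "expression", {}, 4, "cmap"])
```

With this shape, supporting categorical heatmaps only means appending a `'color_dict'` entry; the drawing code for each kind stays in one place.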

faridrashidi commented 1 year ago

Thanks for your guidance. I've figured it out now. Only line 1260 in clustermap.py needs to be changed to the following:

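```python
if isinstance(self.cmap, list):
    # A list of colors signals categorical data: map each distinct value to a color.
    if isinstance(self.data, (pd.DataFrame, pd.Series)):
        unique_values = sorted(np.unique(self.data.values.astype(str)))
    else:
        unique_values = sorted(np.unique(np.asarray(self.data).astype(str)))
    # Note: zip() silently drops categories if fewer colors than categories are given.
    color_dict = {value: color for value, color in zip(unique_values, self.cmap)}
    self.legend_list.append([color_dict, self.label, {}, 4, 'color_dict'])
else:
    # Continuous data: keep the original color-bar legend.
    self.legend_list.append([self.cmap, self.label, self.legend_kws, 4, 'cmap'])
```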

In addition, data must become an attribute of the ClusterMapPlotter class (self.data), because the snippet above reads it.
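The category-to-color mapping in the patch can also be tried outside the class. A standalone sketch of the same logic, using a hypothetical helper `make_color_dict` that additionally raises an error instead of letting `zip()` silently drop categories when too few colors are given:

```python
import numpy as np
import pandas as pd


def make_color_dict(data, colors):
    """Hypothetical standalone version of the proposed patch: map each
    distinct value (as a string, sorted) to a color."""
    values = np.asarray(data).astype(str)  # works for DataFrame, Series, array, list
    categories = np.unique(values)         # sorted lexicographically
    if len(colors) < len(categories):
        raise ValueError(
            f"{len(categories)} categories but only {len(colors)} colors"
        )
    return {str(category): color for category, color in zip(categories, colors)}


print(make_color_dict(pd.DataFrame([[0, 1], [1, 0]]), ["blue", "red"]))
# {'0': 'blue', '1': 'red'}
print(list(make_color_dict([2, 10, 1], ["a", "b", "c"])))
# ['1', '10', '2']
```

Because values are compared as strings, `'10'` sorts before `'2'`, as the second call shows; if numeric order matters, the raw values would need to be sorted before converting them to strings.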

With these two changes in place, we can, for example, write:

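```python
import matplotlib.pyplot as plt
import pandas as pd
from PyComplexHeatmap import *

# Binary (categorical) data; with the proposed change, passing a list as cmap
# produces a categorical legend ("0" -> blue, "1" -> red) instead of a color bar.
data = pd.DataFrame([[0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])
ClusterMapPlotter(
    data=data,
    linewidth=1,
    row_cluster=False,
    col_cluster=False,
    cmap=["blue", "red"],
)
plt.show()
```

DingWB commented 1 year ago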

Good job! Could you please fork my latest code, change line 1260, and test it on your side? If everything works, please open a pull request.

faridrashidi commented 1 year ago

I'm closing this issue since oncoplot has been implemented. Nice work, Wubin!

DingWB commented 1 year ago

Thanks, Farid. More suggestions are always welcome.