DingWB / PyComplexHeatmap

PyComplexHeatmap: A Python package to plot complex heatmap (clustermap)
https://dingwb.github.io/PyComplexHeatmap/
MIT License

How to export the clustering? #2

Closed: thistleknot closed this issue 1 year ago

thistleknot commented 1 year ago

I see examples that show a cluster graph, but what if we need the derived clusters themselves? Which object or attribute should be accessed?

DingWB commented 1 year ago

Thanks for your question. If you look at the source code (https://github.com/DingWB/PyComplexHeatmap/blob/994839b40fe0507ab0bdedb77d96d9a1c2804b66/PyComplexHeatmap/clustermap.py#L2119), you will find that the plotter object exposes the attributes row_order and col_order, which hold the clustering result. Take the following example:

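import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# The star import provides HeatmapAnnotation, the anno_* helpers and ClusterMapPlotter
from PyComplexHeatmap import *

# In a Jupyter notebook, also run the magic: %matplotlib inline
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 300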

# Generate an example dataset (random, so the clusters printed below vary between runs)
df = pd.DataFrame(['AAAA1'] * 5 + ['BBBBB2'] * 5, columns=['AB'])
df['CD'] = ['C'] * 3 + ['D'] * 3 + ['G'] * 4
df['EF'] = ['E'] * 6 + ['F'] * 2 + ['H'] * 2
df['F'] = np.random.normal(0, 1, 10)
df.index = ['sample' + str(i) for i in range(1, df.shape[0] + 1)]
df_box = pd.DataFrame(np.random.randn(10, 4), columns=['Gene' + str(i) for i in range(1, 5)])
df_box.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['TMB1', 'TMB2'])
df_bar.index = ['sample' + str(i) for i in range(1, df_bar.shape[0] + 1)]
df_scatter = pd.DataFrame(np.random.uniform(0, 10, 10), columns=['Scatter'])
df_scatter.index = ['sample' + str(i) for i in range(1, df_scatter.shape[0] + 1)]
df_heatmap = pd.DataFrame(np.random.randn(50, 10), columns=['sample' + str(i) for i in range(1, 11)])
df_heatmap.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]
df_heatmap.iloc[1, 2] = np.nan

plt.figure(figsize=(6, 12))
# axis=1 makes this a column annotation, so it is passed as top_annotation below
col_ha = HeatmapAnnotation(label=anno_label(df.AB, merge=True,rotation=15),
                           AB=anno_simple(df.AB,add_text=True),axis=1,
                           CD=anno_simple(df.CD,add_text=True),
                           Exp=anno_boxplot(df_box, cmap='turbo'),
                           Scatter=anno_scatterplot(df_scatter), TMB_bar=anno_barplot(df_bar),
                           )
# Rows are split into 3 clusters and columns into 2
cm = ClusterMapPlotter(data=df_heatmap, top_annotation=col_ha, col_split=2, row_split=3, col_split_gap=0.5,
                     row_split_gap=1,label='values',row_dendrogram=True,show_rownames=True,show_colnames=True,
                     tree_kws={'row_cmap': 'Dark2'})
plt.show()

To export the derived row clusters, read the attribute cm.row_order (it is an attribute, not a method). The column clusters are available in the same way from cm.col_order.

print(cm.row_order)

# If row_split=None, the row order can also be read from the dendrogram:
# print(cm.dendrogram_row.dendrogram['ivl'])

[['Fea10', 'Fea35', 'Fea21', 'Fea45', 'Fea11', 'Fea5', 'Fea23', 'Fea2', 'Fea19', 'Fea29', 'Fea3', 'Fea12', 'Fea4', 'Fea14', 'Fea32', 'Fea16', 'Fea40', 'Fea30', 'Fea28'], ['Fea22', 'Fea49', 'Fea20', 'Fea48', 'Fea26', 'Fea27', 'Fea44', 'Fea38', 'Fea37', 'Fea50', 'Fea24', 'Fea17'], ['Fea47', 'Fea1', 'Fea46', 'Fea42', 'Fea33', 'Fea43', 'Fea31', 'Fea9', 'Fea36', 'Fea15', 'Fea34', 'Fea8', 'Fea6', 'Fea7', 'Fea13', 'Fea25', 'Fea41', 'Fea18', 'Fea39']]

cm.row_order is a list of lists: one inner list per row cluster (three here, because row_split=3), each holding the row names of that cluster in their clustered order. I hope this answers your question. Thanks again for asking.
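
Since the question was about exporting, here is a minimal sketch of one way to turn that nested list into a table and serialize it with pandas. It is not part of PyComplexHeatmap: the helper name clusters_to_frame, the column names, and the "cluster1", "cluster2", ... labels are all hypothetical, and row_order is a shortened stand-in for the real cm.row_order printed above.

```python
import pandas as pd


def clusters_to_frame(order, prefix="cluster"):
    """Flatten a nested order (e.g. cm.row_order) into a tidy table.

    Hypothetical helper: each inner list is treated as one cluster,
    labelled prefix + 1-based index, in the order the lists appear.
    """
    records = [
        {"feature": name, "cluster": f"{prefix}{i}", "position": pos}
        for i, group in enumerate(order, start=1)
        for pos, name in enumerate(group)
    ]
    return pd.DataFrame.from_records(
        records, columns=["feature", "cluster", "position"]
    )


# Shortened stand-in for cm.row_order; replace with the real attribute
row_order = [["Fea10", "Fea35", "Fea21"], ["Fea22", "Fea49"], ["Fea47", "Fea1"]]

clusters = clusters_to_frame(row_order)

# Serialize to CSV text; pass a file path to to_csv() to write a file instead
csv_text = clusters.to_csv(index=False)

# Flat row order across all clusters, usable as df_heatmap.loc[flat_order]
flat_order = clusters["feature"].tolist()
```

The same helper should work for cm.col_order, assuming it has the same list-of-lists shape.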