DingWB / PyComplexHeatmap

PyComplexHeatmap: A Python package to plot complex heatmap (clustermap)
https://dingwb.github.io/PyComplexHeatmap/
MIT License

update anno_img: add a parameter merge and merge_width. assert if all the images have the same shape. #66

Closed dakomura closed 4 months ago

dakomura commented 4 months ago

Hi Wubin,

Thanks for merging the pull request and adding the use of anno_img to the documentation. I appreciate your feedback and suggestions.

Regarding the image size issue, my approach assumes that all images are exactly the same size. If this is not the case, the current code will indeed encounter errors during the np.hstack or np.vstack operations. To address situations where images may vary in size and aspect ratio, implementing a comprehensive solution becomes complex. Therefore, I decided to use assertions to enforce that images within the same cluster must have identical sizes.

https://github.com/dakomura/PyComplexHeatmap/blob/951ecb9f15d765bdd3f811dd228d8c1e7cb744ac/PyComplexHeatmap/annotations.py#L1149-L1155
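
The size check described above can be sketched roughly as follows. This is only an illustration and not the code at the linked lines: the helper name `stack_same_shape` and its `horizontal` argument are hypothetical, and the sketch assumes the images have already been loaded as NumPy arrays.

```python
import numpy as np

def stack_same_shape(images, horizontal=True):
    """Concatenate images after asserting that they all share one shape.

    Hypothetical helper for illustration only; not part of PyComplexHeatmap.
    """
    shapes = {img.shape for img in images}
    # np.hstack / np.vstack fail with a less readable error when the
    # non-stacking dimensions differ, so fail early with an explicit message.
    assert len(shapes) == 1, f"images must have identical shapes, got {sorted(shapes)}"
    return np.hstack(images) if horizontal else np.vstack(images)

# Three dummy 4x6 RGB "images" stand in for files read from disk.
imgs = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(3)]
merged_row = stack_same_shape(imgs)                    # placed side by side
merged_col = stack_same_shape(imgs, horizontal=False)  # stacked vertically
```

Requiring identical shapes is stricter than what the stacking functions need, but it keeps the behaviour predictable for both orientations without resizing or padding logic.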

As for the merge parameter, while the need to merge images is rare, there are scenarios where it could be beneficial to display a single image for a particular category or cluster. Since the implementation was straightforward, I included this feature.

https://github.com/dakomura/PyComplexHeatmap/blob/951ecb9f15d765bdd3f811dd228d8c1e7cb744ac/PyComplexHeatmap/annotations.py#L1163-L1169

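import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PyComplexHeatmap import HeatmapAnnotation, anno_img, ClusterMapPlotter

df = pd.DataFrame(['AAAA1'] * 5 + ['BBBBB2'] * 5, columns=['AB'])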
df['CD'] = ['C'] * 3 + ['D'] * 3 + ['G'] * 4
df['F'] = np.random.normal(0, 1, 10)
df.index = ['sample' + str(i) for i in range(1, df.shape[0] + 1)]
df_box = pd.DataFrame(np.random.randn(10, 4), columns=['Gene' + str(i) for i in range(1, 5)])
df_box.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['TMB1', 'TMB2'])
df_bar.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_scatter = pd.DataFrame(np.random.uniform(0, 10, 10), columns=['Scatter'])
df_scatter.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar1 = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['T1-A', 'T1-B'])
df_bar1.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar2 = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['T2-A', 'T2-B'])
df_bar2.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar3 = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['T3-A', 'T3-B'])
df_bar3.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar3.iloc[5,0]=np.nan
df_bar4 = pd.DataFrame(np.random.uniform(0, 10, (10, 1)), columns=['T4'])
df_bar4.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_bar4.iloc[7,0]=np.nan
df_img = pd.DataFrame([f"{i:02}.jpg" for i in range(1,11)], columns=['path'])
df_img.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_heatmap = pd.DataFrame(np.random.randn(30, 10), columns=['sample' + str(i) for i in range(1, 11)])
df_heatmap.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]
df_heatmap.iloc[1, 2] = np.nan

plt.figure(figsize=(12, 16))

#df_img_col = pd.DataFrame([f"{i:02}.jpg" for i in range(10)], columns=['Image'])
df_img_col = pd.DataFrame(["01.jpg"] * 10, columns=['Image'])  # same image for every column
df_img_col.index = ['sample' + str(i) for i in range(1, 11)]
#df_img_row = pd.DataFrame([f"{i:02}.jpg" for i in range(30)], columns=['Image'])
df_img_row = pd.DataFrame(["02.jpg"] * 30, columns=['Image'])  # same image for every row
df_img_row.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]

# Column annotation (axis left at its default) and row annotation (axis=0)
col_ha = HeatmapAnnotation(Image=anno_img(df_img_col, height=16, border_width=10, merge=True, merge_width=1))
row_ha = HeatmapAnnotation(Image=anno_img(df_img_row, border_width=13, merge=True, merge_width=2, height=20), axis=0)

cm = ClusterMapPlotter(data=df_heatmap, top_annotation=col_ha, left_annotation=row_ha,
                       col_split=2, row_split=3, col_split_gap=10, row_split_gap=10,
                       label='values', row_dendrogram=True, show_rownames=True, show_colnames=True,
                       tree_kws={'row_cmap': 'Dark2'}, cmap='Spectral_r',
                       legend_gap=5, legend_hpad=2, legend_vpad=5)
plt.show()
[Image: merge_example]

I will be sure to send you the Jupyter notebook when it is finished and contains a better example of anno_img used with real biological data. This example will be suitable for inclusion in the documentation website.

Best regards,

Daisuke

DingWB commented 4 months ago

Awesome, thank you very much, @dakomura. I really appreciate it.

DingWB commented 2 months ago

Hi @dakomura,

I made some big changes to anno_img: (1) it now supports remote URLs as input; (2) it now supports missing values; (3) I fixed bugs in the order and rotation of row and column image annotations. Please see here for some examples.

Could you please install the latest version from GitHub and help me test these new features?

dakomura commented 2 months ago

Hi @DingWB,

I apologize for not making any progress on the examples. I've been extremely busy lately and haven't had the opportunity to work on them.

Thank you so much for implementing those new features in anno_img! I sincerely appreciate your effort. I'll make sure to install the latest version from GitHub and thoroughly test it next week. If I have any feedback or suggestions, I'll be sure to let you know.