faridrashidi closed this 1 year ago
I agree with the first suggestion, but I don't understand why you want to allow separate colors for different columns. By "columns", do you mean the unique "values" of the data?
For example, if you have the following data frame:
[
[A, Tumor, Stage2],
[B, Normal, Stage1],
[C, Tumor, Stage3],
[D, Normal, Stage2]
]
We have 3 columns here, but we do not want to use the same colormap for all 3 columns. In this case, the unique values should not be pooled as [A, B, C, D, Tumor, Normal, Stage1, Stage2, Stage3]. Instead, each column should have its own set of unique values and its own colormap (e.g., Set1, Dark2).
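A minimal sketch of the per-column behavior described above, assuming matplotlib 3.5 or later (for the `matplotlib.colormaps` registry). The helper `per_column_color_maps` and the column names `Sample`, `Condition` and `Stage` are hypothetical and not part of PyComplexHeatmap; the point is only that unique values are computed per column and each column draws from its own qualitative colormap.

```python
import matplotlib
import pandas as pd
from matplotlib.colors import to_hex


def per_column_color_maps(df, cmaps):
    """Hypothetical helper: map each column's own unique values to colors
    drawn from that column's qualitative colormap."""
    mappings = {}
    for col, cmap_name in zip(df.columns, cmaps):
        cmap = matplotlib.colormaps[cmap_name]
        # Unique values are computed per column, not pooled across the frame
        values = pd.unique(df[col])
        # Integer input indexes the colormap's color list; wrap if it runs out
        mappings[col] = {v: to_hex(cmap(i % cmap.N)) for i, v in enumerate(values)}
    return mappings


df = pd.DataFrame(
    [["A", "Tumor", "Stage2"],
     ["B", "Normal", "Stage1"],
     ["C", "Tumor", "Stage3"],
     ["D", "Normal", "Stage2"]],
    columns=["Sample", "Condition", "Stage"],
)
maps = per_column_color_maps(df, ["tab10", "Set1", "Dark2"])
print(maps["Condition"])
```

Because each column gets its own mapping, this approach needs one legend per column, which is the trade-off raised in the next reply.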
I see. I think that's hard to handle cleanly, because then you would have to add multiple legends.
Oh, actually, this has already been implemented in the current version. Please see the following example:
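```python
import matplotlib.pyplot as plt
import pandas as pd
from PyComplexHeatmap import HeatmapAnnotation, anno_simple

# Two 0/1 columns; casting to str makes anno_simple treat them as categories
data = pd.DataFrame(
    [
        [0, 1],
        [1, 0],
        [0, 1],
        [1, 0],
        [0, 1],
    ],
    columns=['Col1', 'Col2'],
).astype(str)
print(data)

plt.figure(figsize=(8, 4))
# Each anno_simple annotation is colored and listed in the legend independently
col_ha = HeatmapAnnotation(
    Col1=anno_simple(data.Col1, add_text=True),
    Col2=anno_simple(data.Col2, add_text=True),
    plot=True, legend=True, axis=1,
    legend_gap=5, orientation='up', hspace=0.1,
)
plt.show()
```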
Please refer to this documentation for more examples.
OK. Regarding your suggestion (3), to clarify: I assumed that if `cmap` is a list, the data is categorical, and if it is a string, the data is continuous. However, I'm not sure how to distinguish continuous from categorical data when `cmap` is given as a string.
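The assumption above can be written down as a small dispatch on the type of `cmap`. This is a hypothetical sketch (`infer_kind_from_cmap` is not a PyComplexHeatmap function), and it makes the ambiguity concrete: a qualitative colormap name such as `"Set1"` is also a string, so the string branch cannot tell it apart from a continuous one like `"viridis"`.

```python
def infer_kind_from_cmap(cmap):
    """Hypothetical helper encoding the stated assumption:
    a list of colors means categorical, a colormap name means continuous."""
    if isinstance(cmap, (list, tuple)):
        return "categorical"
    if isinstance(cmap, str):
        # Ambiguous: qualitative names like "Set1" are strings too,
        # so this branch may misclassify categorical data.
        return "continuous"
    raise TypeError(f"unsupported cmap type: {type(cmap).__name__}")


print(infer_kind_from_cmap(["#e41a1c", "#377eb8"]))  # categorical
print(infer_kind_from_cmap("viridis"))               # continuous
print(infer_kind_from_cmap("Set1"))                  # continuous (the ambiguous case)
```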
How about we recognize the dtypes automatically using `df.dtypes`? If any column holds strings (pandas stores `str` values with the `object` dtype), we treat the whole data frame as categorical; otherwise, it is treated as continuous.
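A minimal sketch of that dtype check, assuming the rule is "any non-numeric column makes the whole frame categorical". It uses `pandas.api.types.is_numeric_dtype` rather than looking for `str` literally, since pandas reports string columns as `object` (or `category`/`string` if converted). The helper name `infer_frame_kind` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def infer_frame_kind(df):
    """Hypothetical helper: treat the whole frame as categorical if any column
    is non-numeric (e.g. object/str or category dtype), else as continuous."""
    if all(is_numeric_dtype(dtype) for dtype in df.dtypes):
        return "continuous"
    return "categorical"


numeric = pd.DataFrame({"a": [0.1, 0.5], "b": [1, 2]})
mixed = pd.DataFrame({"a": [0.1, 0.5], "b": ["Tumor", "Normal"]})
print(infer_frame_kind(numeric))  # continuous
print(infer_frame_kind(mixed))    # categorical
```

One consequence worth noting: integer-coded categories such as 0/1 would be classified as continuous unless cast to `str` first, as in the `anno_simple` example earlier in this thread.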
> For example, if you have the following data frame:
> [[A, Tumor, Stage2], [B, Normal, Stage1], [C, Tumor, Stage3], [D, Normal, Stage2]]
> We have 3 columns here, but we do not want to use the same colormap for all 3 columns. In this case, the unique values should not be pooled as [A, B, C, D, Tumor, Normal, Stage1, Stage2, Stage3]. Instead, each column should have its own set of unique values and its own colormap (e.g., Set1, Dark2).
I disagree with this. I use categorical data points in heatmaps, and I do not want a separate legend for each variable's values. For example, in breast cancer, I need Positive/Negative for ER/PR/HER2 status. All I need is one color for positive and one for negative across the board. Someone who needs TumorStage2 to differ from NormalStage2 should either combine those 2 variables into the same data point or represent each variable differently, as ggplot would do: shape for one and color for the other, for example.
I see. Thanks for your feedback. Do you have any idea how to implement it?
I think one way would be to first ensure the df is either all categorical or all continuous. Then, if it is all categorical, flatten it to a 1D array, get the unique elements, and make sure every unique element is accounted for in the legend/color mapping. Once that's done, plotting should be straightforward IMO.
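The steps proposed above (check the frame is homogeneous, flatten, take unique elements, map each to one color) can be sketched as follows. This assumes matplotlib 3.5 or later for `matplotlib.colormaps`; `shared_color_mapping` is a hypothetical helper, not a PyComplexHeatmap API, and the ER/PR/HER2 values are made-up illustration data.

```python
import matplotlib
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex
from pandas.api.types import is_numeric_dtype


def shared_color_mapping(df, cmap_name="Set1"):
    """Hypothetical helper: one color per unique value across *all* columns."""
    kinds = {is_numeric_dtype(dtype) for dtype in df.dtypes}
    if len(kinds) != 1:
        raise ValueError("data frame must be all categorical or all continuous")
    if kinds == {True}:
        raise ValueError("continuous data: use a continuous colormap instead")
    # Flatten to 1D and take unique elements (np.unique sorts them)
    uniques = np.unique(df.to_numpy().ravel())
    cmap = matplotlib.colormaps[cmap_name]
    return {v: to_hex(cmap(i % cmap.N)) for i, v in enumerate(uniques)}


status = pd.DataFrame(
    {"ER": ["Positive", "Negative", "Positive"],
     "PR": ["Negative", "Negative", "Positive"],
     "HER2": ["Negative", "Positive", "Negative"]},
)
colors = shared_color_mapping(status)
print(colors)  # two entries: one color for Negative, one for Positive
```

Since the mapping is built from the pooled unique values, "Positive" gets the same color in ER, PR and HER2, and a single legend covers all three columns.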
I don't know Python well (I don't use pandas or matplotlib), so I could well be missing something here.
I closed this pull request because I added a new module, `oncoPrint`, in the latest version (1.3.9). You can use `oncoPrint` to plot categorical data.
Following issue #25. As a test example: