erdogant / pca

pca: A Python Package for Principal Component Analysis.
https://erdogant.github.io/pca
MIT License

Unable to set colors for labels #40

Closed koutoftimer closed 1 year ago

koutoftimer commented 1 year ago

Problem

model.biplot(
    y=df['target'],  # list of 'Y' and 'N' values or any other qualifiers
    cmap=mpl.colors.ListedColormap(['red', 'green']),
)
  1. You can't pin 'Y' to green and 'N' to red. The two can swap colors depending on the data.
  2. If there are only 'Y's or only 'N's, the color is black.

The current approach is okay if you only want to distinguish categories within one plot, but it doesn't allow consistent colors across different datasets.

I haven't found a workaround.

UPD: https://github.com/erdogant/pca/compare/master...koutoftimer:pca:master It is not a good solution by any means, but at least it works.

erdogant commented 1 year ago

Can you show the plot with your approach?

koutoftimer commented 1 year ago

out20-3.pdf This is what the output looks like. No black points and no color swaps across any of the plots.

Invoked roughly like this:

RED = [1., 0., 0.]
GREEN = [0., 1., 0.]

model.biplot(
    y=df['target'],  # list of 'Y' and 'N' values or any other qualifiers
    fixed_colors={'Y': GREEN, 'N': RED},
)
erdogant commented 1 year ago

I created an update to manually specify colors.

Can you first install from the GitHub source? pip install git+https://github.com/erdogant/pca

The input parameters should be straightforward:

from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
from pca import pca
import matplotlib as mpl
import colourmap

y=load_iris().target

# Initialize
model = pca(n_components=3, normalize=True)
# Dataset
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=y)
# Fit transform
out = model.fit_transform(X)
# Plot with manually specified colors: c holds one RGB color per sample.
c = colourmap.fromlist(load_iris().target, cmap='Set2')[0]
c[0] = [0,0,0]

model.biplot(c=c, legend=False, label=False)

# In case all class labels are the same, still use the cmap colors if provided.
y1 = np.repeat(0, len(y))
model.biplot(y=y1, cmap=mpl.colors.ListedColormap(['green', 'red', 'blue']))

# Color on classlabel (Unchanged)
model.biplot()
# Use cmap colors for classlabels (unchanged)
model.biplot(y=load_iris().target, cmap=mpl.colors.ListedColormap(['green', 'red', 'blue']))
# Do not show points when cmap=None (unchanged)
model.biplot(y=load_iris().target, cmap=None)
# Plot all points as unique entity (unchanged)
model.biplot(y=None, legend=False, label=False)
koutoftimer commented 1 year ago

IDK, maybe this issue should be closed. I really have no motivation for it right now, and it looks like you didn't get it.

I'm not sure, but I believe that cmap is not a direct mapping from labels to colors. If you have labels A, B, C, then A gets the first color, B the second, and so on. If you have labels B, C, D, then B gets the first color. This way you lose consistency if you want B to be the same color across all the plots.

That is why I'm using the fixed_colors parameter, as you can see in the update to my very first message.
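
The inconsistency described above can be reproduced without the library. This is a minimal sketch that assumes colors are handed out by each label's position among the sorted unique labels, which is the behavior the comment suspects and not confirmed behavior of pca. The helper `positional_colors` is hypothetical and exists only for illustration.

```python
def positional_colors(labels, palette):
    """Assign palette entries by each label's rank among the sorted unique labels.

    Assumption: this mimics an order-based colormap lookup; it is not
    taken from the pca source code.
    """
    ordered = sorted(set(labels))
    return {label: palette[i % len(palette)] for i, label in enumerate(ordered)}


palette = ['green', 'red', 'blue']

first = positional_colors(['A', 'B', 'C'], palette)
second = positional_colors(['B', 'C', 'D'], palette)

# 'B' is second in the first dataset but first in the second one,
# so it ends up with a different color in each plot.
print(first['B'], second['B'])
```

Under that assumption, the only way to keep 'B' stable is to key colors on the label itself instead of on its position.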

erdogant commented 1 year ago

Ok I am closing this issue. Note that the input parameter c is to specify the color of each sample individually. Thus the colors can be adjusted exactly how you want now.
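
One way to get fixed label colors out of the per-sample `c` parameter is to expand a label-to-color dictionary into one RGB triple per sample. The sketch below is an assumption about how that could be done, not code from the library: `colors_from_labels` and the `GREY` fallback are hypothetical names, and the commented `biplot` call only mirrors the shape of the example earlier in the thread.

```python
RED = [1.0, 0.0, 0.0]
GREEN = [0.0, 1.0, 0.0]
GREY = [0.5, 0.5, 0.5]  # hypothetical fallback for labels without a fixed color


def colors_from_labels(labels, fixed_colors, default=GREY):
    """Return one RGB triple per sample, looked up by label instead of by position."""
    return [list(fixed_colors.get(label, default)) for label in labels]


labels = ['Y', 'N', 'Y', 'Y']
c = colors_from_labels(labels, {'Y': GREEN, 'N': RED})

# Assumed usage, following the call shown earlier in the thread:
# model.biplot(c=c, legend=False, label=False)
```

Because the lookup is keyed on the label, a dataset containing only 'Y' still yields green for every sample. If the plotting call expects an array instead of a nested list, converting with `numpy.asarray(c)` should be enough.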