erdogant / pca

pca: A Python Package for Principal Component Analysis.
https://erdogant.github.io/pca
MIT License

Biplot `n_features` has no effect #3

Closed: Rendiere closed this issue 4 years ago

Rendiere commented 4 years ago

I'm recreating figure 10.1 from the book Introduction to Statistical Learning with this library, specifically creating a 2-d biplot from the USA Arrests dataset.

However, when creating a biplot, only the first two loading vectors are displayed, irrespective of what I pass to `n_features`. In addition, as a separate issue, the loading vector labels are plotted outside the axis limits of the plot.

From a quick scan of the code it looks like the issue is in the compute_topfeat method, where n_feat never gets taken into account, but rather n_pcs gets iterated over twice.
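To illustrate the behaviour I would expect, here is a minimal sketch of ranking features by their loadings so that `n_feat` actually limits the number of vectors drawn. This is not the library's `compute_topfeat` code: the helper name `top_features`, its arguments, the row-per-component layout, and the loading values are all hypothetical and chosen only for illustration.

```python
def top_features(loadings, feature_names, n_feat=2, n_pcs=2):
    """Return the names of the n_feat features with the largest loadings.

    loadings: list of rows, one per principal component, each row holding
    one loading per feature (hypothetical layout for this sketch).
    """
    n_pcs = min(n_pcs, len(loadings))
    # Score each feature by its largest absolute loading on the first n_pcs PCs.
    scores = [
        max(abs(loadings[pc][j]) for pc in range(n_pcs))
        for j in range(len(feature_names))
    ]
    # Rank features by score, highest first, and keep only n_feat of them.
    ranked = sorted(range(len(feature_names)), key=lambda j: scores[j], reverse=True)
    return [feature_names[j] for j in ranked[:n_feat]]


# Made-up loadings for illustration only (2 PCs x 4 features).
names = ['Murder', 'Assault', 'UrbanPop', 'Rape']
loadings = [
    [0.5, 0.6, 0.3, 0.55],
    [-0.4, -0.2, 0.9, 0.1],
]
print(top_features(loadings, names, n_feat=2))
print(top_features(loadings, names, n_feat=4))
```

The point of the sketch is that the loop over components (`n_pcs`) and the cut-off on features (`n_feat`) are two separate steps, so changing `n_feat` changes how many names come back.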

P.S. Great work on this library. It is exactly what I was looking for when googling "PCA biplots python". For that reason, I wouldn't mind helping out with maintaining this library if need be.

erdogant commented 4 years ago

Dear Rendiere,

It's great to read about your enthusiasm for the pca library! Thank you for pointing out these issues. I fixed the length of the arrows and the number of features shown, and added some of the missing docstrings.
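As a rough illustration of the arrow-length problem, loading vectors can be rescaled with a single common factor so that the longest arrow stays inside the range of the plotted scores. This is an assumed approach for the sake of example, not necessarily what the library does internally; the function `scale_loadings` and its `margin` parameter are hypothetical.

```python
def scale_loadings(loadings_xy, scores_xy, margin=0.9):
    """Rescale 2-D loading vectors so they fit within the score range.

    loadings_xy: list of (x, y) loading pairs, one per feature.
    scores_xy: list of (x, y) sample scores on the two plotted PCs.
    margin: fraction of the axis extent the longest arrow may occupy.
    """
    # Largest absolute coordinate among the scores, per axis.
    max_x = max(abs(x) for x, _ in scores_xy)
    max_y = max(abs(y) for _, y in scores_xy)
    # Largest absolute coordinate among the loadings, per axis.
    load_x = max(abs(x) for x, _ in loadings_xy)
    load_y = max(abs(y) for _, y in loadings_xy)
    # Collect the per-axis ratios, skipping axes where all loadings are zero.
    ratios = []
    if load_x > 0:
        ratios.append(max_x / load_x)
    if load_y > 0:
        ratios.append(max_y / load_y)
    if not ratios:
        return list(loadings_xy)
    # One common factor keeps the arrow directions unchanged.
    factor = margin * min(ratios)
    return [(x * factor, y * factor) for x, y in loadings_xy]


# Made-up values for illustration only.
scores = [(-2.0, 1.0), (3.0, -1.5)]
arrows = scale_loadings([(0.5, 0.25), (-0.25, 0.5)], scores)
print(arrows)
```

Using one factor for both axes, rather than one per axis, preserves the angles between the arrows, which is what a biplot reader interprets.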

Update the library with `pip install -U pca`. The version should be >= 1.07. Check the version with:

import pca
pca.__version__

Example on the arrest dataset:

from pca import pca
import pandas as pd

model = pca(normalize=True)
# Dataset
df = pd.read_csv('usarrest.txt')
# Setup dataset
X = df[['Murder','Assault','UrbanPop','Rape']].astype(float)
X.index = df['state'].values

# Fit transform
out = model.fit_transform(X)
out['topfeat']

# Make plot
ax = model.biplot(n_feat=4, legend=False)
ax = model.biplot3d(n_feat=4, legend=False)

Rendiere commented 4 years ago

Hi @erdogant, thanks for the speedy turnaround! I can confirm that the fixes work for me as well. Love the added colours too.