ajverster / FastPCA

This implements PCA quickly using a random subspace approximate SVD

Proportion of Variance : error? #1

Open JeremyTournayre opened 5 years ago

JeremyTournayre commented 5 years ago

Hello,

I don't understand why the Cumulative Proportion doesn't match the Proportion of Variance. In the example:

                                  PC1       PC2       PC3       PC4       PC5
    Standard deviation      571.77091 488.70674 452.77183 422.91619 401.35474
    Proportion of Variance    0.12649   0.09241   0.07932   0.06920   0.06232
    Cumulative Proportion     0.09520   0.16475   0.22445   0.27653   0.32344
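A quick arithmetic check on the table above makes the mismatch concrete. The sketch below only re-uses the printed numbers; the variable names `prop_var`, `cum_prop`, `implied`, and `ratio` are illustrative and not part of FastPCA. If both rows shared one denominator, the differences of the cumulative row would reproduce the proportion row and every ratio would be 1.

```r
# Values copied from the summary table above (PC1..PC5).
prop_var <- c(0.12649, 0.09241, 0.07932, 0.06920, 0.06232)
cum_prop <- c(0.09520, 0.16475, 0.22445, 0.27653, 0.32344)

# Per-component share implied by the cumulative row.
implied <- diff(c(0, cum_prop))

# Ratio of the printed proportion to the implied one, per component.
ratio <- prop_var / implied
print(round(ratio, 3))
```

The ratios come out nearly constant at about 1.33, which is what one would expect if the two rows were divided by two different totals, with the smaller total being roughly 75% of the larger one.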

Do you think the Cumulative Proportion and the Proportion of Variance should look like this instead?

                                  PC1       PC2       PC3       PC4       PC5
    Standard deviation       573.5588 490.38590 452.17670 423.20886 403.26458
    Proportion of Variance     0.0958   0.07003   0.05954   0.05216   0.04736
    Cumulative Proportion      0.0958   0.16583   0.22537   0.27752   0.32488

In the script:

    total.var <- sum(apply(Df, 2, var))

    vars <- object$sdev^2
    importance <- rbind("Standard deviation" = object$sdev,
                        "Proportion of Variance" = round(vars / sum(vars), 5),
                        "Cumulative Proportion" = round(cumsum(vars) / total.var, 5))
    colnames(importance) <- colnames(object$rotation)

    print(sprintf("Total amount of variance explained is %f", sum(vars) / total.var))

Why do you use sum(vars) in "Proportion of Variance" = round(vars / sum(vars), 5) instead of total.var?
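For illustration, here is a minimal sketch of what the summary could look like with total.var as the denominator in both rows. It is not the FastPCA code: base R's `prcomp` stands in for the approximate SVD, and the toy data `Df`, the number of kept components `k`, and the column labels are assumptions made for the example.

```r
set.seed(1)

# Hypothetical toy data: 200 observations, 10 variables.
Df <- matrix(rnorm(200 * 10), nrow = 200)
k <- 5  # keep only the first k components, as a truncated PCA would

# Stand-in for the approximate SVD: exact PCA, then truncate to k components.
pca <- prcomp(Df, center = TRUE, scale. = FALSE)
sdev <- pca$sdev[1:k]

# Total variance of the data, not just of the retained components.
total.var <- sum(apply(Df, 2, var))
vars <- sdev^2

# Both proportion rows share the same denominator, so they agree.
importance <- rbind(
  "Standard deviation"     = sdev,
  "Proportion of Variance" = round(vars / total.var, 5),
  "Cumulative Proportion"  = round(cumsum(vars) / total.var, 5)
)
colnames(importance) <- paste0("PC", 1:k)
print(importance)
```

With this choice, the running sum of the "Proportion of Variance" row matches the "Cumulative Proportion" row up to rounding. Dividing by sum(vars) instead would make the proportions of the retained components sum to 1, which overstates each share whenever fewer components are kept than there are variables.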

I really appreciate any help you can provide.

ajverster commented 5 years ago

Hi Jeremy,

I'll take a look at this in detail later, but I suspect you found a bug in my code. I wrote this fairly quickly as an exercise a while back.

Adrian

(Sent by email in reply to JeremyTournayre's message of Fri, Dec 7, 2018, 9:33 AM.)