Rosemeis / pcangsd

Framework for analyzing low depth NGS data in heterogeneous populations using PCA.
GNU General Public License v3.0

Negative eigenvalues #55

Open elizeng opened 3 years ago

elizeng commented 3 years ago

I have a dataset with 5 samples. When I run PCAngsd on it, I cannot properly calculate the proportion of variance explained by each PC because one of the eigenvalues is negative.

>eigen:
$values
[1]  0.6744819  0.1861528  0.1701493  0.1116181 -0.8495392

$vectors
           [,1]        [,2]        [,3]        [,4]       [,5]
[1,]  0.4300537  0.40136852  0.03084264  0.68385914 -0.4305143
[2,]  0.4199545  0.33469631  0.06889405 -0.72819586 -0.4202392
[3,]  0.3240840 -0.83952111 -0.11967661  0.03105326 -0.4181950
[4,] -0.5223017  0.13639072 -0.68877390 -0.02478187 -0.4832959
[5,] -0.5107470 -0.05902176  0.71103471  0.02211366 -0.4791602

>eigen$values/sum(eigen$values)
[1]  2.3030629  0.6356312  0.5809860  0.3811275 -2.9008076

The proportions sum to 1 only because the negative value cancels part of the positive ones, so they make no sense as percentages (the first PC alone comes out at 230%). Does anyone have advice on how to handle such a dataset?

I do not see this issue when I run the PCA with PLINK, but I believe it uses a different algorithm.

Rosemeis commented 3 years ago

Hi,

I suspect it is due to missingness combined with the very low number of samples, which is probably not suitable for PCA. You could simply treat the negative eigenvalue as 0, even though that would not be entirely correct.

Best, Jonas
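
For anyone landing here later, the workaround suggested above (treating the negative eigenvalue as 0) could be sketched roughly as follows. This is only a minimal illustration in plain Python: the eigenvalues are copied from the original post, and `explained_proportions` is a hypothetical helper name, not something PCAngsd provides. As noted above, the result is an approximation rather than a strictly correct variance decomposition.

```python
# Eigenvalues copied from the R output in the original post.
eigenvalues = [0.6744819, 0.1861528, 0.1701493, 0.1116181, -0.8495392]


def explained_proportions(values):
    """Clamp negative eigenvalues to 0, then normalise so the result sums to 1.

    Hypothetical helper for illustration; not part of PCAngsd.
    """
    clamped = [max(v, 0.0) for v in values]
    total = sum(clamped)
    if total == 0:
        raise ValueError("no positive eigenvalues to normalise")
    return [v / total for v in clamped]


# Print the approximate share of variance per PC.
for i, p in enumerate(explained_proportions(eigenvalues), start=1):
    print(f"PC{i}: {p:.1%}")
```

With the negative value clamped, every proportion falls between 0 and 1 and the last PC is reported as contributing nothing. Given only 5 samples, these numbers should still be read with caution.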