biocore / mds-approximations

Multidimensional scaling algorithms for microbiology-ecology datasets.

Implement FSVD and its test #27

Closed HannesHolste closed 8 years ago

HannesHolste commented 8 years ago

Python implementation of the FSVD algorithm, based on the MATLAB reference at: http://stats.stackexchange.com/questions/2806/best-pca-algorithm-for-huge-number-of-features
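
For readers unfamiliar with the technique, here is a minimal NumPy sketch of a randomized (fast) truncated SVD with power iterations, the general family the linked reference belongs to. It is not the code from this PR: the function name `fsvd_sketch` and the parameters `n_power_iter`, `oversample` and `seed` are assumptions made for illustration only.

```python
import numpy as np


def fsvd_sketch(A, k, n_power_iter=2, oversample=10, seed=0):
    """Approximate the top-k SVD of A with a randomized range finder.

    Hypothetical helper for illustration; not the implementation in this PR.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(n, k + oversample)  # sketch width, slightly larger than k

    # Project A onto a random subspace and orthonormalize the result.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, ell)))

    # Power iterations sharpen the subspace when singular values decay slowly.
    # Re-orthonormalizing at each step keeps the iteration numerically stable.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Exact SVD of the small projected matrix, then map back to the original space.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]
```

The expensive exact SVD is only ever applied to the small `ell x n` matrix `B`, which is where the speed-up over a full decomposition comes from. Accuracy depends on how quickly the singular values decay and on the assumed `oversample` and `n_power_iter` settings.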

Please review :)

The FSVD output differs quite a bit from the reference (unweighted UniFrac PCoA) :\

FSVD:

Eigvals 8
0.278337313173  0.145127399638  0.0915313772517 0.0709739671632 0.0653806249996 0.0496949148193 0.041021934741  0.036691758648

Proportion explained    8
0.35741122654   0.186357198457  0.117534876792  0.0911372333338 0.0839548571718 0.0638129335081 0.0526760133008 0.0471156608964

Species 0   0

Site    9   8
10086.PC.481    -0.054339557937 -0.0342552935393    -0.251406726004 -0.0639139098524    0.0751782499644 -0.00610890600446   0.0691443579795 -0.0328869509958
10086.PC.593    -0.117504841625 -0.240337343453 0.0284881811287 0.157039731006  -0.0655070432201    0.0117269342698 0.0675238219818 0.0246031419646
10086.PC.356    -0.23903254296  0.132867446761  0.0945085311408 -0.0134091752499    0.0806428229559 -0.125138210893 0.0357851117413 0.0683071878735
10086.PC.355    -0.139856125419 0.183478001865  0.0317600825265 0.0514380167531 0.0315979996486 0.176618224511  -0.00222854022115   -0.0152780749135
10086.PC.354    -0.236257007269 -0.0459545249021    -0.0237578365365    -0.0610319731643    -0.108196674798 -0.0310525042603    -0.135994010622 -0.0531409034407
10086.PC.636    0.235473515107  0.064112581625  0.0410176765454 0.111527340997  0.049516907118  -0.0730882807377    -0.0171283942826    -0.125123242266
10086.PC.635    0.174611568528  0.0884703197515 0.0485002013096 -0.10892400543  -0.167934642131 -0.00032900268853   0.0933181097968 -0.00096487610802
10086.PC.607    0.129758486105  -0.192421718914 0.124850312733  -0.130174916534 0.116642635528  0.0506123116014 -0.0241347863587    0.00737357308632
10086.PC.634    0.24714650547   0.0440405308059 -0.0939604228432    0.0574488914739 -0.0119402550661    -0.00324056579811   -0.0862856700146    0.127110144799

Biplot  0   0

Site constraints    0   0

Unweighted UniFrac PCoA (reference):

Eigvals 9
0.527576831536  0.380955902485  0.302541529797  0.266409397663  0.255696353121  0.222923562728  0.202538724053  0.191550929645  0.0

Proportion explained    9
0.2244823211    0.162095566209  0.12873049152   0.113356380295  0.108798012753  0.0948532911187 0.0861796048848 0.081504332119  0.0

Species 0   0

Site    9   9
10086.PC.481    -0.0705337512946    -0.0523255833501    0.430931391656  -0.116746630305 0.140169551454  -0.0121985776207    -0.144852652533 0.0708443496647 -0.0
10086.PC.593    -0.15252345786  -0.367119659406 -0.0488310385909    0.286852415403  -0.122137624507 0.0234169453317 -0.141457741587 -0.0529995496517    -0.0
10086.PC.356    -0.310268662031 0.202957439323  -0.161995239726 -0.0244935105553    0.15035822631   -0.249882413934 -0.0749673365754    -0.147145848302 -0.0
10086.PC.355    -0.181535837645 0.280265981912  -0.0544393413007    0.0939578745749 0.0589143460999 0.352680352153  0.0046686377854 0.0329116944137 -0.0
10086.PC.354    -0.306665965365 -0.0701963718488    0.0407228466959 -0.1114824179   -0.201732274725 -0.0620072371812    0.284898055936  0.114474970493  -0.0
10086.PC.636    0.305648978048  0.0979331334515 -0.0703076036076    0.203718428091  0.0923240786186 -0.145946437069 0.0358828025594 0.269537748495  -0.0
10086.PC.635    0.226649045655  0.135140021056  -0.0831332541419    -0.198963114956 -0.313113479926 -0.000656969485293  -0.195494992339 0.00207851498269    -0.0
10086.PC.607    0.168428915044  -0.293927672225 -0.214003498911 -0.237780522122 0.217479735297  0.101065266217  0.0505607098618 -0.0158839896734    -0.0
10086.PC.634    0.320800735448  0.0672727110869 0.161055737927  0.10493747777   -0.022262558621 -0.00647092841082   0.180762516892  -0.273817890423 -0.0

Biplot  0   0

Site constraints    0   0

HannesHolste commented 8 years ago

Thanks for that link, @antgonza; the docs are looking better now. I added the FSVD output to the PR description above, so please take a look. The dimensionality is correct, but it's a little surprising how much it differs from the reference PCoA, though I realize FSVD is an approximation. Either way, it shouldn't affect the benchmarking/Big-O analysis. Any thoughts on the accuracy?

OK to merge?

antgonza commented 8 years ago

This looks really good, thanks. I'm not sure what causes the differences, but it's interesting that the pattern holds. I guess we can merge and figure out what's going on during the large-scale benchmark.
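
The remark that the pattern holds can be checked directly against the numbers posted in the PR description. The sketch below copies the eigenvalues and the first-axis site coordinates from both outputs, then compares the FSVD eigenvalues with the squared PCoA eigenvalues and looks at the per-sample ratio of the axis 1 coordinates. The variable names are chosen for illustration, and the comparison against squared eigenvalues is a hypothesis being tested, not something stated in the thread.

```python
import numpy as np

# Eigenvalues copied from the two outputs in the PR description.
fsvd_eigvals = np.array([
    0.278337313173, 0.145127399638, 0.0915313772517, 0.0709739671632,
    0.0653806249996, 0.0496949148193, 0.041021934741, 0.036691758648,
])
# The trailing 0.0 eigenvalue of the PCoA output is dropped.
pcoa_eigvals = np.array([
    0.527576831536, 0.380955902485, 0.302541529797, 0.266409397663,
    0.255696353121, 0.222923562728, 0.202538724053, 0.191550929645,
])

# First-axis site coordinates, in the same sample order as posted
# (10086.PC.481, .593, .356, .355, .354, .636, .635, .607, .634).
fsvd_axis1 = np.array([
    -0.054339557937, -0.117504841625, -0.23903254296, -0.139856125419,
    -0.236257007269, 0.235473515107, 0.174611568528, 0.129758486105,
    0.24714650547,
])
pcoa_axis1 = np.array([
    -0.0705337512946, -0.15252345786, -0.310268662031, -0.181535837645,
    -0.306665965365, 0.305648978048, 0.226649045655, 0.168428915044,
    0.320800735448,
])

# Hypothesis: FSVD eigenvalues equal the squared PCoA eigenvalues.
eig_rel_dev = np.abs(fsvd_eigvals / pcoa_eigvals**2 - 1)
print("max relative deviation from squared PCoA eigvals:", eig_rel_dev.max())

# If axis 1 differs only by scale, this ratio is the same for every sample.
axis_ratio = fsvd_axis1 / pcoa_axis1
print("axis 1 FSVD/PCoA ratio per sample:", axis_ratio)
```

On the posted numbers, the FSVD eigenvalues match the squared PCoA eigenvalues to within the printed precision, and the axis 1 ratio is constant across samples. That would be consistent with a difference in scaling convention rather than approximation error, although the thread itself does not establish the cause.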