biocore / mds-approximations

Multidimensional scaling algorithms for microbiology-ecology datasets.

SSVD unit testing problems #22

Closed HannesHolste closed 8 years ago

HannesHolste commented 8 years ago

Using the provided distance matrix (dm) and PCoA output files, I am unable to test SSVD.

cc @antgonza

unweighted_unifrac_dm.txt unweighted_unifrac_pc.txt

Because:

  File "/Users/hannes/bio/knightlab/mds-approximations/mdsa/algorithms/ssvd.py", line 69, in run
    eigenvalues, eigenvectors = eigsh(bbt, num_dimensions_out)
  File "/Users/hannes/miniconda2/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1507, in eigsh
    raise ValueError("k must be between 1 and the order of the "
ValueError: k must be between 1 and the order of the square input matrix.

Here, the order of the distance matrix in unweighted_unifrac_dm.txt is 9, which is the same as the order of the expected PCoA output in unweighted_unifrac_pc.txt (by "order", does scipy mean the matrix rank in the linear algebra sense?).

Indeed, as one can see from the docs, this is a limitation of scipy.sparse.linalg.eigsh:

k : int, optional The number of eigenvalues and eigenvectors desired. k must be smaller than N. It is not possible to compute all eigenvectors of a matrix.

Therefore, eigsh fails when asked for all 9 dimensions. As far as I can see, one possible solution would be for you to provide an unweighted_unifrac_pc.txt generated from the same unweighted_unifrac_dm.txt but with a dimensionality of 8, so that I can test by running: mdsa run --algorithm ssvd --dimensions 8 ./data/unweighted_unifrac_dm.txt
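For reference, here is a minimal sketch of one way to guard against this limitation. It is not the code in mdsa/algorithms/ssvd.py: the helper name `top_eigenpairs` and the random 9 x 9 matrix standing in for `bbt` are assumptions made for illustration. The idea is to call `eigsh` only when k is strictly smaller than the matrix order and to fall back to a dense solver otherwise.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def top_eigenpairs(matrix, k):
    """Return the k largest-magnitude eigenpairs of a symmetric matrix.

    Hypothetical helper, not part of mdsa.
    """
    n = matrix.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and %d, got %d" % (n, k))
    if k < n:
        # ARPACK path: only used when k is strictly smaller than n.
        values, vectors = eigsh(matrix, k=k)
    else:
        # Dense fallback when every eigenpair is requested (k == n).
        values, vectors = np.linalg.eigh(matrix)
    # Sort by descending magnitude so both paths return the same ordering.
    order = np.argsort(-np.abs(values))
    return values[order], vectors[:, order]


# Symmetric 9 x 9 stand-in for bbt (random data, for illustration only).
rng = np.random.default_rng(0)
a = rng.standard_normal((9, 9))
bbt = a @ a.T

vals8, vecs8 = top_eigenpairs(bbt, 8)  # ARPACK path
vals9, vecs9 = top_eigenpairs(bbt, 9)  # dense fallback
```

With a guard like this, requesting 8 dimensions from a 9 x 9 input goes through `eigsh`, while requesting all 9 is handled by `numpy.linalg.eigh` instead.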

antgonza commented 8 years ago

@HannesHolste, could you add a note, just for reference, on why it is fine to close this issue? Thanks.

HannesHolste commented 8 years ago

Agreed:

Although the output in unweighted_unifrac_pc.txt appeared to have 9 dimensions, closer inspection revealed that the last dimension consisted entirely of zeros. The file looked misleading, but it actually contains only 8 dimensions, so SSVD can be tested with --dimensions 8.
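A check like the one described above can be sketched as follows. This is only an illustration: the function name `effective_dimensions`, the tolerance, and the small in-memory coordinate table are assumptions, and the sketch does not parse the actual unweighted_unifrac_pc.txt format. It counts the coordinate columns that contain at least one non-zero value. An all-zero final axis is also what one would expect in general, since double-centering an n x n distance matrix leaves at most n - 1 non-zero eigenvalues.

```python
def effective_dimensions(rows, tol=1e-12):
    """Count coordinate columns that contain at least one non-zero value.

    Hypothetical helper; `rows` is a list of per-sample coordinate lists.
    """
    columns = zip(*rows)  # transpose: one tuple per axis
    return sum(1 for column in columns if any(abs(v) > tol for v in column))


# Made-up coordinates: 3 samples, 3 columns, last column all zeros.
coords = [
    [0.12, -0.30, 0.0],
    [-0.25, 0.10, 0.0],
    [0.13, 0.20, 0.0],
]

n_dims = effective_dimensions(coords)
print(n_dims)  # prints 2
```

The resulting count is the value that can safely be passed as --dimensions when comparing against the expected output.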