elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R
340 stars 63 forks source link

Incorrect setting of tree_threshold in largeVis() #37

Closed yixuan closed 7 years ago

yixuan commented 7 years ago

According to the documentation, tree_threshold by default is the number of features of the input matrix. However, the code shows that it is actually set to be the number of observations.

elbamos commented 7 years ago

Its correct - remember that the package wants the input matrix to be transposed from the R standard, so that observations are columns and features are rows.

(The reason for this is memory -- the C++ libraries need data in column-ordered format to be efficient, and transposing would double the size needed for the dataset. Issues like this begin to matter when the data size gets very large, which is the point of the package.)

yixuan commented 7 years ago

Yeah, that's what I find confusing. Given the matrix layout, nrow(x) equals the number of features, and ncol(x) equals the number of observations. It seems we need nrow(x) here.

elbamos commented 7 years ago

Hey you're right, thanks! I had this correct in randomProjectionTreeSearch but not largeVis. I've changed it now, but if you want to submit a PR with the change so you get the commit, I'll accept the PR.

yixuan commented 7 years ago

This is just a quick fix, so I would close this issue and let you fix it later. 😄