elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

Hyperparameters' definition domain #19

Laurae2 closed this issue 7 years ago

Laurae2 commented 7 years ago

Hello,

I've read the largeVis paper thoroughly since it became available on arXiv (more than 5 months ago now), and I am still wondering about the following:

I am assuming currently that, from a matrix MAT before transposition:

And, on Windows, k * nrow(MAT) must stay below ~4 billion (2^32); otherwise an error occurs during projectKNNs (the result exceeds the maximum capacity of an Armadillo sparse matrix) or even earlier.

Are my assumptions correct or did I miss something?
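The Windows bound described above is simple arithmetic, so it can be checked before starting a long run. The following is a minimal sketch in Python, assuming the limit is exactly 2^32 as quoted in the question. The helper names `fits_capacity` and `max_k` are hypothetical and are not part of largeVis.

```python
# Hypothetical helpers, not part of largeVis: they only restate the
# k * nrow(MAT) < 2^32 bound quoted in the question.
LIMIT = 2 ** 32  # 4,294,967,296, the "~4 billion" figure


def fits_capacity(n_rows: int, k: int, limit: int = LIMIT) -> bool:
    """Return True if k * n_rows stays strictly below the limit."""
    return k * n_rows < limit


def max_k(n_rows: int, limit: int = LIMIT) -> int:
    """Largest k for which k * n_rows is still strictly below the limit."""
    return (limit - 1) // n_rows


if __name__ == "__main__":
    # 1 million rows with k = 150 gives 150 million entries: within the bound.
    print(fits_capacity(1_000_000, 150))
    # 30 million rows with k = 150 gives 4.5 billion entries: over the bound.
    print(fits_capacity(30_000_000, 150))
    # Largest k that would still fit for 30 million rows.
    print(max_k(30_000_000))
```

For a given number of rows, `max_k` gives a rough ceiling on k under this assumption; the actual failure point may come earlier, as the question notes.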

elbamos commented 7 years ago

Yes, that's correct.

But, as discussed in the benchmarks vignette, n_trees, tree_threshold, and max_iter are really three different ways of attacking the same problem: finding the best approximation of the nearest neighbors in the least time with the least RAM. So when you adjust those hyperparameters, what you are usually doing is trading one off against the others.

One of the things I like about largeVis is that it's a pragmatic algorithm. In practice, max_iter can safely be set below 3 (usually, I leave it at 1). The choice of n_trees and tree_threshold is a choice of how long you want to wait before you get your result. And the balance between n_trees and tree_threshold is based on how much RAM you have.

Does this help?
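To make the trade-off concrete, here is a toy random projection forest in plain Python. It is an illustrative sketch of the general technique, not largeVis's implementation: the function names are hypothetical, the parameter names only mirror n_trees and tree_threshold, and the neighbor-exploration passes governed by max_iter are not modeled. It assumes tree_threshold acts as the maximum leaf size.

```python
import random


def build_leaves(points, idx, threshold, rng):
    """Split indices with random hyperplanes until each leaf has <= threshold points."""
    if len(idx) <= max(threshold, 1):
        return [idx]
    # Pick two distinct points; the splitting hyperplane is equidistant from both.
    a, b = rng.sample(idx, 2)
    normal = [pa - pb for pa, pb in zip(points[a], points[b])]
    mid = [(pa + pb) / 2 for pa, pb in zip(points[a], points[b])]
    offset = sum(n * m for n, m in zip(normal, mid))
    left, right = [], []
    for i in idx:
        side = sum(n * x for n, x in zip(normal, points[i]))
        (left if side <= offset else right).append(i)
    if not left or not right:
        # Degenerate split (for example, duplicate points): keep as one leaf.
        return [idx]
    return (build_leaves(points, left, threshold, rng)
            + build_leaves(points, right, threshold, rng))


def candidate_sets(points, n_trees, tree_threshold, seed=0):
    """For each point, collect the points sharing a leaf with it in any tree."""
    rng = random.Random(seed)
    cands = [set() for _ in points]
    for _ in range(n_trees):
        leaves = build_leaves(points, list(range(len(points))), tree_threshold, rng)
        for leaf in leaves:
            for i in leaf:
                cands[i].update(j for j in leaf if j != i)
    return cands


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.random() for _ in range(5)] for _ in range(500)]
    for n_trees, thr in [(1, 10), (4, 10), (1, 40)]:
        c = candidate_sets(pts, n_trees, thr)
        avg = sum(len(s) for s in c) / len(c)
        print(f"n_trees={n_trees} tree_threshold={thr} avg candidates={avg:.1f}")
```

In this sketch each point collects at most n_trees * (tree_threshold - 1) candidates, so either knob raises the number of distances to evaluate and the memory held per point. That matches the spirit of the comment above: the two parameters buy accuracy in different ways, at different costs.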

elbamos commented 7 years ago

@Laurae2 If I've answered your question, I'd appreciate it if you would close this issue. Thanks!

Laurae2 commented 7 years ago

Yep, you answered my question!

Closing this issue.