elbamos / largeVis

An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

Why is largeVis found to be slower than Rtsne? #30

Closed. claczny closed this issue 7 years ago.

claczny commented 7 years ago

Hi,

first of all, thank you very much for this implementation of largeVis. I gave it a try and it worked fine. I had to follow the instructions at http://thecoatlessprofessor.com/programming/rcpp-rcpparmadillo-and-os-x-mavericks-lgfortran-and-lquadmath-error/ for Mac OS X El Capitan to install it first, though.

The dataset that I tested comprises ~15k points in 512 dimensions. Running largeVis() with default parameters and a single thread took about 1200 seconds, while running Rtsne() took about 212 seconds.
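In rough outline, the comparison looked like this (a sketch; dat stands for my ~15k x 512 matrix, and the matrix orientation expected by largeVis() is an assumption on my part):

```r
library(largeVis)
library(Rtsne)

# dat: ~15k x 512 numeric matrix, one point per row (placeholder name).
# Assumption: largeVis() takes observations as columns, hence t(dat);
# Rtsne() takes observations as rows.
time_lv <- system.time(vis <- largeVis(t(dat)))  # default parameters, one thread
time_bh <- system.time(bh  <- Rtsne(dat))        # default parameters
time_lv["elapsed"]  # ~1200 s for me
time_bh["elapsed"]  # ~212 s for me
```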

The clusters looked much tighter and mostly better separated than those from Rtsne().

The longer runtime came as a bit of a surprise after I read

It has been benchmarked at more than 30x faster than Barnes-Hut on datasets of approximately 1-million rows, and scaled linearly as long as there is sufficient RAM.

in your README.

Hence, I am curious where this difference comes from and was wondering if you could maybe provide some clarifications here.

TIA.

Best,

Cedric

elbamos commented 7 years ago

The core of the speed difference is scalability. Rtsne scales in O(n log n), or a little worse; largeVis scales in O(n). The 30x multiplier was measured on a particular 1,000,000-point, 1,024-feature dataset. The actual difference will depend on the size of the data.
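To put back-of-the-envelope numbers on that (my rough arithmetic, not from the paper): at n = 1,000,000, log2(n) is about 20, so an O(n log n) method does on the order of 20x the per-point work of an O(n) one, before constant factors. At n = 15k, log2(n) is only about 14, and constant factors can matter as much as the asymptotics.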

I would expect largeVis to be considerably faster than Rtsne on a 15k-point dataset, but I haven't timed it. It's definitely a lot faster once you get up to 60k points, MNIST size. A lot depends on the hyperparameters. Can you provide the parameters you used for both Rtsne and largeVis, and possibly the dataset?

Also, is there a reason you are using only one thread? A key benefit of the largeVis algorithm is that it parallelizes well.
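If you want to try it in parallel, a sketch (OMP_NUM_THREADS is the standard OpenMP environment variable, nothing largeVis-specific, and it needs to be set before the first parallel region runs):

```r
# Let OpenMP use 4 threads (assumes the package was compiled with OpenMP,
# which requires a suitable toolchain on OS X).
Sys.setenv(OMP_NUM_THREADS = 4)
vis <- largeVis(t(dat))  # the neighbor search and SGD then run in parallel
```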

claczny commented 7 years ago

Thank you for the quick reply.

Can you provide the parameters you used for both rtsne and largevis, and possibly the dataset?

I used the default parameters for both approaches. This time, I have attached a plot showing the different results: there are four panels, as I additionally compared the effect an initial dimensionality reduction to 50D via PCA/SVD would have. Runtimes are also indicated in the plots.

Also, is there a reason you are only using one thread? Part of the benefit of largevis is that it parallelizes well. That's a key benefit of the algorithm.

The reason is very simple: I ran it on a MacBook Pro with OS X El Capitan. It is great that largeVis() parallelizes well, and I am looking forward to that, yet requiring multiple threads just to match the speed of Rtsne() did not seem particularly appealing based on my first tests. Then again, if I missed something and largeVis() can be made faster than Rtsne() even on a single thread, then coupling that with the additional speedup from parallelization would of course be even better.

You can find the full dataset (w/o extra PCA/SVD; 512D) and the reduced dataset (w/ extra PCA/SVD; 50D) at https://drive.google.com/open?id=0BzzLX1D_xyOkSlJJZGFqZUU1SzQ

largevis_vs_bhsne.pdf

elbamos commented 7 years ago

Thanks, I'll take a look tonight.

I run it on a Mac Pro with El Capitan all the time. There's no downside to OpenMP in that configuration. The code and the algorithm are designed for parallelization; if you disable it, your computer is going to spend a lot of time on thread-safety overhead without any benefit.

That said, I ran it last night on random data with your dimensionality and it was considerably slower than it should have been. It's possible this is caused by a recent change, and I'm trying to track it down. In the meantime, I suggest simply reducing the number of sgd_batches. The number recommended in the paper, which is the package default, is more than is really necessary.

claczny commented 7 years ago

Update:

I tried sgd_batches = 10000, which gave the attached result. I played around with a few other values of similar scale, and while the speed is indeed greatly increased, the results are all meaningless in terms of the expected cluster structures, i.e., compared with what BH-SNE returns or what largeVis returns with the default parameters.

Looking forward to your comments.

Did you by any chance find the time to look into this "slow-down" any further?

largevis_vs_bhsne.sgd_batches_10000.pdf

elbamos commented 7 years ago

10,000 is way too few :p Try 15000 * K and adjust from there. The recommendation in the paper is 10000 * the number of edges!

I'm working on it for the next version. Right now it seems that one of the changes leading up to the CRAN submission is degrading performance. That said, with small datasets I would not expect it to be faster than Rtsne.
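Concretely, something like this (a sketch; K = 40 is a placeholder for whatever neighborhood size you actually ran with, and sgd_batches is the same argument you already tried):

```r
K <- 40  # placeholder: the K used for the neighbor search
vis <- largeVis(t(dat), K = K, sgd_batches = 15000 * K)  # 600,000 batches here
# The paper's recommendation is 10000 * |E|; with |E| on the order of n * K,
# that is vastly more than 15000 * K.
```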

yurymalkov commented 7 years ago

Hi @elbamos. Can you explain why largeVis has O(n) complexity? As far as I understand it, annoy tree construction is part of the algorithm, and annoy (as a tree of height ~log(n)) has O(n log n) complexity at best. So it seems a bit odd that largeVis has lower overall complexity.

elbamos commented 7 years ago

I suggest reading the paper for that.

I had the same question when I read it. I think the answer has to do with how one evaluates the time complexity of annoy, depending on which annoy parameters are held constant... but that's really beyond the scope of this issue :)

elbamos commented 7 years ago

@claczny I tracked the performance issue (or at least a large part of it) down to a few late code changes. The next version should run SGD around 20% faster. Also, I've added SGD-with-momentum, which can improve performance substantially. If you want to give it a try, I've pushed the current dev version to the /development branch of this repo.

elbamos commented 7 years ago

@claczny Will you try the release candidate in branch 0.1.10 and let me know how the speed is for you?

claczny commented 7 years ago

@elbamos I'd be happy to. Could you please let me know how to install a package in R from a non-master branch?

TIA!

[UPDATE]

Never mind, found it :) devtools::install_github("elbamos/largeVis", ref = "release/0.1.10"). Sorry, I should have looked two search hits further.

claczny commented 7 years ago

@elbamos Good news: I can confirm your claim of having accelerated the computation by about 20%; see the attached PDF! Great work!

I wanted to check about "sgd-with-momentum" by looking at the vignette in RStudio but got the following error message:

Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.2/Resources/library/largeVis/help/largeVis.rdb' is corrupt

Not sure whether this is a problem on my side or not.

Could you maybe elaborate briefly on that feature and explain how to use it in R?

Thank you!

largevis_vs_bhsne.release_0.1.10.pdf

elbamos commented 7 years ago

That's an installation issue. Quit R, use devtools to reinstall, then quit and restart R again; it should work. The feature is easy to use: just give it a momentum parameter between 0 and 1.0, and tell it how many batches to use in SGD, which you can also give as a proportion (0 to 1.0) of the default.
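For example, a minimal sketch (the argument names momentum and sgd_batches follow the description above; see the function documentation for the exact interface):

```r
vis <- largeVis(t(dat),
                momentum    = 0.9,  # gradient momentum, between 0 and 1.0
                sgd_batches = 0.8)  # values in (0, 1.0] are read as a
                                    # proportion of the default batch count
```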

claczny commented 7 years ago

Thank you, I will try it out. Any suggestions for parameter values to start with as a "good" compromise between speed-up and quality?

elbamos commented 7 years ago

The vignette shows the results of a grid search over those hyperparameters on MNIST. My feeling now is to start with momentum at 0.8 or higher; for sgd_batches, you can start at 0.6 and move down.

elbamos commented 7 years ago

I really encourage you to look at the vignettes. I included detailed benchmarks and examples to visualize the effects of the hyperparameters, as well as for clustering.

elbamos commented 7 years ago

I'm going to close this now. Feel free to reopen if anything else comes up.