Thanks for reading the article! I have just added the license. You might want to update your fork.
Thank you, you are a prince! That was fast... it may well be the fastest closed issue of all time!
BTW, as I am reading this article (I am not done yet, just started a few minutes ago), I was also reading https://falconn-lib.org/ ... which is an improvement over traditional LSH random projections because it uses data-dependent projections/hashing. Just FYI, it looks mighty interesting, especially if we could combine it all.
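(For anyone skimming the thread, here is a toy numpy sketch of the distinction being pointed at: classic LSH hashes with random hyperplanes, while data-dependent schemes choose projection directions from the data itself. FALCONN's actual construction is far more sophisticated; the data, dimensions, and the use of principal components below are purely illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 128)).astype(np.float32)  # toy stand-in for real feature vectors

# Data-independent LSH: hash with random hyperplanes (classic random projections).
random_planes = rng.standard_normal((128, 16))
random_codes = (data @ random_planes > 0).astype(np.uint8)

# Data-dependent variant (rough idea only): use the top principal directions
# of the data as projection axes instead of random ones.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_planes = vt[:16].T
pca_codes = (centered @ pca_planes > 0).astype(np.uint8)
```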
Yes, that is a great idea. In fact, as I mention in the article, I have already found a way to improve the performance of this algorithm by projecting onto the directions learnt by an autoencoder. The code for this is also included in this repo; please have a look at notebook.ipynb for an example of how to use it.
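(The notebook in the repo is the authoritative reference for this; the snippet below is only a sketch of the idea, assuming TensorFlow/Keras is available and using made-up data and dimensions. Names like `learnt_directions` are illustrative, not the repo's API.)

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
data = rng.standard_normal((5_000, 128)).astype(np.float32)  # stand-in for real feature vectors

# Train a small linear autoencoder: no labels are needed, the network simply
# learns to reconstruct its input through a narrow bottleneck.
encoder = keras.Sequential([keras.layers.Dense(16, activation=None, input_shape=(128,))])
decoder = keras.Sequential([keras.layers.Dense(128, activation=None, input_shape=(16,))])
autoencoder = keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(data, data, epochs=5, batch_size=256, verbose=0)

# Use the learnt encoder weights as projection directions and binarize,
# i.e. hash along learnt directions instead of purely random ones.
learnt_directions = encoder.layers[0].get_weights()[0]   # shape (128, 16)
codes = (data @ learnt_directions > 0).astype(np.uint8)
```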
Awesome. Now, this approach requires training something ahead of time, right? So you need a good labelled dataset in the first place? I am asking because I am building a kind of approximate massive diff engine to find similar source code... and I cannot train things ahead of time (hence why Charikar random projections are so appealing).
Well, actually, autoencoders do not require a labelled dataset. They are an unsupervised learning method: the network encodes the inputs in a compressed form and then reconstructs the inputs themselves from those encodings, so no labels are needed. I am not familiar with Charikar random projections; I will read about them. Thanks for letting me know.
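(For reference on the Charikar side of this exchange: below is a minimal sketch of Charikar-style SimHash, the training-free random-projection fingerprint mentioned above for the source-code similarity use case. The tokenization and example snippets are toy assumptions, not part of any diff engine.)

```python
import hashlib
import numpy as np

def simhash(tokens, n_bits=64):
    """Charikar-style SimHash: each token's hash implies a random hyperplane;
    the fingerprint depends only on the input, so no training is required."""
    v = np.zeros(n_bits)
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for i in range(n_bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(n_bits) if v[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

# Similar token streams (e.g. tokenized source files) get nearby fingerprints.
code_a = "def add ( a , b ) : return a + b".split()
code_b = "def add ( x , y ) : return x + y".split()
print(hamming(simhash(code_a), simhash(code_b)))  # small Hamming distance
```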
Thanks!
Hi! What would be the license for your code? I stumbled on your excellent article at https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee. Thanks!