dataplayer12 / Fly-LSH

An implementation of efficient LSH inspired by fruit fly brain
MIT License

License? #1

Closed by pombredanne 6 years ago

pombredanne commented 6 years ago

Hi! What would be the license for your code? I stumbled on your excellent article at https://medium.com/@jaiyamsharma/efficient-nearest-neighbors-inspired-by-the-fruit-fly-brain-6ef8fed416ee. Thanks!

dataplayer12 commented 6 years ago

Thanks for reading the article! I have just added the license. You might want to update your fork.

pombredanne commented 6 years ago

Thank you, you are a prince! That was fast... It may well be the fastest closed issue of all time!

pombredanne commented 6 years ago

BTW, I am still reading this article (I am not done yet, I just started a few minutes ago). Just before it I was reading about https://falconn-lib.org/, which is an improvement over the traditional LSH random projections by using data-dependent projections/hashing. Just FYI, it looks mighty interesting, especially if we could combine it all.

dataplayer12 commented 6 years ago

Yes, that is a great idea. In fact, if you read the articles, I have already found a way to improve the performance of this algorithm by performing projections on the directions learnt by an autoencoder. The code for implementing this is also included in this repo. Please have a look at notebook.ipynb for an example of how to use it.

pombredanne commented 6 years ago

Awesome. Now, this approach requires training something ahead of time, right? So you need a good labelled data set in the first place? I am asking because I am building a kind of approximate massive diff engine to find similar source code, and I cannot train things ahead of time (which is why Charikar random projections are so appealing).
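
For readers who have not met them, here is a minimal sketch of Charikar-style random hyperplane hashing (often called SimHash), which needs no training step. It is an illustration only and not code from this repo; the names `make_hyperplanes`, `simhash` and `hamming`, the 64-bit signature length and the sample vectors are all assumptions made for the example.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(vector, hyperplanes):
    """Pack one bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

# No data is inspected when the hyperplanes are drawn, so nothing is trained.
planes = make_hyperplanes(dim=8, n_bits=64, seed=42)

base = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]
near = [x + 0.05 for x in base]   # small perturbation, small angle to base
far = [-x for x in base]          # opposite direction, largest possible angle

h_base = simhash(base, planes)
h_near = simhash(near, planes)
h_far = simhash(far, planes)

print(hamming(h_base, h_near), hamming(h_base, h_far))
```

The fraction of differing bits estimates the angle between two vectors, so similar inputs tend to get signatures with a small Hamming distance, while the hyperplanes themselves are fixed up front from a seed.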

dataplayer12 commented 6 years ago

Well, actually, autoencoders do not require a labelled dataset. They are an unsupervised learning method in which the network encodes the inputs in a compressed form and then reconstructs the inputs from those encodings, so the inputs themselves serve as the training targets. I am not familiar with Charikar random projections; I will read about them. Thanks for letting me know.
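
To make the "no labels" point concrete, here is a minimal sketch of a one-unit linear autoencoder trained only on reconstruction error. It is not the code from notebook.ipynb; the toy data, the names `w_enc`, `w_dec`, `encode`, `decode` and `reconstruction_loss`, and the learning rate and step count are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)

# Toy unlabelled data: 4-D points that all lie along one hidden direction.
direction = [1.0, 2.0, -1.0, 0.5]
data = [[t * d for d in direction]
        for t in (rng.uniform(-1.0, 1.0) for _ in range(50))]

dim = len(direction)
w_enc = [rng.gauss(0.0, 0.1) for _ in range(dim)]  # encoder: 4-D -> 1-D code
w_dec = [rng.gauss(0.0, 0.1) for _ in range(dim)]  # decoder: 1-D code -> 4-D

def encode(x):
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(z):
    return [w * z for w in w_dec]

def reconstruction_loss(samples):
    """Mean squared reconstruction error; the target is the input itself."""
    total = 0.0
    for x in samples:
        recon = decode(encode(x))
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(samples)

initial_loss = reconstruction_loss(data)

lr = 0.05  # assumed step size for this toy problem
for _ in range(300):
    grad_enc = [0.0] * dim
    grad_dec = [0.0] * dim
    for x in data:
        z = encode(x)
        err = [r - xi for r, xi in zip(decode(z), x)]
        back = sum(e * w for e, w in zip(err, w_dec))  # error signal reaching the code
        for i in range(dim):
            grad_dec[i] += 2.0 * err[i] * z
            grad_enc[i] += 2.0 * back * x[i]
    # Full-batch gradient descent step on both weight vectors.
    for i in range(dim):
        w_enc[i] -= lr * grad_enc[i] / len(data)
        w_dec[i] -= lr * grad_dec[i] / len(data)

final_loss = reconstruction_loss(data)
print(initial_loss, final_loss)
```

The only training signal is how well each input is reconstructed, so no labels appear anywhere. The learned `w_enc` is the kind of data-dependent direction one could project onto before hashing, in place of a purely random one.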

pombredanne commented 6 years ago

Thanks!