georgeamccarthy / protein_search

The neural search engine for proteins.
GNU Affero General Public License v3.0

Integrate the Annoy indexer #58

Open fissoreg opened 3 years ago

fissoreg commented 3 years ago

Currently, our "database" of protein sequences is just a JSON file saved to disk. Search requests are handled naively, with a linear scan over the JSON array of protein sequences.
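For reference, here is a rough sketch of what that kind of linear scan looks like. The record layout (`id` and `embedding` keys), the cosine-similarity scoring, and the function names are assumptions made for illustration, not the actual backend code.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def linear_search(records, query, top_k=3):
    """Score every record against the query: O(n) work per request."""
    scored = [(cosine_similarity(r["embedding"], query), r["id"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [protein_id for _, protein_id in scored[:top_k]]


# Hypothetical JSON layout: an array of objects with an id and an embedding.
records = json.loads(
    '[{"id": "P1", "embedding": [1.0, 0.0]},'
    ' {"id": "P2", "embedding": [0.0, 1.0]},'
    ' {"id": "P3", "embedding": [0.9, 0.1]}]'
)
nearest = linear_search(records, [1.0, 0.0], top_k=2)
```

Every request touches every stored embedding, so the cost grows linearly with the number of proteins.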

Annoy is a small library for storing and searching high-dimensional vectors (such as our protein embeddings) efficiently, using approximate nearest-neighbor search.

We can integrate Annoy into our backend to improve search performance and scale to a much larger number of protein sequences.
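A minimal sketch of what the integration could look like, using Annoy's Python bindings directly. The helper names (`build_index`, `search`), the `angular` metric, and the number of trees are assumptions chosen for illustration; the real integration would likely go through a Jina executor instead.

```python
def build_index(embeddings, n_trees=10, metric="angular"):
    """Build an in-memory Annoy index from a list of equal-length vectors."""
    # Imported here so the module still loads where Annoy is not installed.
    from annoy import AnnoyIndex

    dim = len(embeddings[0])
    index = AnnoyIndex(dim, metric)
    for item_id, vector in enumerate(embeddings):
        # Annoy items are keyed by non-negative integers, so a separate
        # mapping from item_id to protein ID would be needed in practice.
        index.add_item(item_id, vector)
    index.build(n_trees)  # more trees: better recall, larger index
    return index


def search(index, query, top_k=3):
    """Return (item_ids, distances) for the approximate nearest neighbors."""
    return index.get_nns_by_vector(query, top_k, include_distances=True)
```

The built index can be written to disk with `index.save(path)` and read back with `index.load(path)`, so it would not have to be rebuilt on every backend start.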

fissoreg commented 3 years ago

This link might be useful (not sure if it's up to date with Jina 2.0): https://github.com/jina-ai/executors/tree/main/jinahub/indexers/searcher/AnnoySearcher