arangodb / arangodb

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
https://www.arangodb.com
Other
13.43k stars 832 forks

Feature Request: Vector Query Capabilities #8258

Open ixxie opened 5 years ago

ixxie commented 5 years ago

One of the best-known use cases for graph databases is Natural Language Processing. For example, in one project I am involved in, we store text in ArangoDB nodes and link it by edges to nodes containing the parsed text, which are in turn linked to normalized constituent tokens. This allows us to query for similar chunks of text.

A cornerstone of contemporary NLP is a class of algorithms known as word embedding models. These are neural networks which train on text and project it into a high-dimensional vector space, normally with hundreds or thousands of dimensions. This space preserves semantics: nearby vectors are close in meaning, and parallel offsets between pairs of vectors reflect analogous relationships between the words.
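To illustrate what "nearer" and "parallel" mean here, below is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration (real embeddings come from a trained model and have far more dimensions), and the helper names `cosine_similarity` and `subtract` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def subtract(a, b):
    """Element-wise difference a - b, i.e. the offset from b to a."""
    return [x - y for x, y in zip(a, b)]

# Made-up toy "embeddings"; these values are assumptions, not model output.
toy = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
}

# Nearness: in this toy data "king" is closer to "man" than to "woman".
print(cosine_similarity(toy["king"], toy["man"]))
print(cosine_similarity(toy["king"], toy["woman"]))

# Parallelism: the offset king - man points the same way as queen - woman.
print(cosine_similarity(subtract(toy["king"], toy["man"]),
                        subtract(toy["queen"], toy["woman"])))
```

Cosine similarity is only one possible measure; Euclidean distance or a dot product could be substituted depending on how the embeddings were trained.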

ArangoSearch brings features to help non-specialists search text. Developing features for specialized NLP use-cases with ArangoDB could be a complementary strategy. In particular, support for basic KNN queries on high-dimensional vectors could be a spectacular place to start. It would mean that I could save my word embeddings and query them in ArangoDB, making it easy for me to deploy powerful custom NLP search capabilities in AQL.

Some basic vector operations would also be helpful.
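To make the request concrete, here is a rough sketch of what a basic KNN query computes, written as a brute-force scan in plain Python. This is not an ArangoDB API: the function names `cosine_distance` and `knn`, the document keys, and the vectors are all hypothetical, and a database implementation would presumably use an index rather than scanning every stored vector.

```python
import heapq
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means the vectors point in more similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn(query, vectors, k):
    """Return the k (distance, key) pairs nearest to `query`, nearest first.

    `vectors` maps a document key to its embedding (a list of floats).
    This is an exact linear scan over every stored vector.
    """
    scored = ((cosine_distance(query, vec), key) for key, vec in vectors.items())
    return heapq.nsmallest(k, scored)

# Hypothetical stored embeddings keyed by made-up document ids.
stored = {
    "doc/1": [1.0, 0.0, 0.0],
    "doc/2": [0.9, 0.1, 0.0],
    "doc/3": [0.0, 1.0, 0.0],
    "doc/4": [0.0, 0.0, 1.0],
}

nearest = knn([1.0, 0.05, 0.0], stored, k=2)
for distance, key in nearest:
    print(key, round(distance, 5))
```

The scan costs time proportional to the number of stored vectors times their dimension, which is why approximate nearest-neighbour indexes are usually preferred at scale.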

ixxie commented 5 years ago

I found a library that might be helpful if this would ever be implemented:

https://github.com/facebookresearch/faiss

ixxie commented 4 years ago

Another library that may help (still holding out hope for this!):

https://github.com/spotify/annoy

P.S. It seems like there are no databases supporting vector search at all at the moment... this could lure a lot of people to ArangoDB if it's added ;)

cris-almodovar commented 3 years ago

@ixxie An alternative to annoy and faiss is nmslib, which powers the KNN module in OpenDistro for Elasticsearch.

jmferrete commented 3 years ago

+1 For this feature request.

KeepingItClassy commented 3 years ago

Another +1

coreation commented 3 years ago

+1 for this. Neo4j supports it, and Elasticsearch does the same for its documents.

koji98 commented 2 years ago

Another +1

AlexMRuch commented 1 year ago

Another +1

pavelnemirovsky commented 1 year ago

+1

mysticaltech commented 1 year ago

@joerg84 This is all too relevant now! Mongo just announced vector search support, Redis has it, and Postgres has an extension for it. We need this for ArangoDB, especially since you view AI and ML as an important use case. Now that LLMs are here, we need vector/embeddings support!

Imagine the traction you would get if ArangoDB were added as a supported vector store in a tool like LangChain or Auto-GPT. That beats all marketing campaigns hands down!

mysticaltech commented 1 year ago

@ixxie Search is also doable with SVMs, see https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb

ArtyomZemlyak commented 9 months ago

+1

litsegaard commented 5 months ago

Is this considered/mentioned in a roadmap for a coming release or has it not been acknowledged?

mysticaltech commented 5 months ago

@litsegaard Given the above history, it's been 5 years with no sign, so it seems unlikely. However, SurrealDB has added support in beta: https://surrealdb.com/features