Open ixxie opened 5 years ago
I found a library that might be helpful if this would ever be implemented:
Another library that may help (still holding out hope for this!):
https://github.com/spotify/annoy
P.S. It seems like there are no databases supporting vector search at all at the moment. This could lure a lot of people to ArangoDB if it's added ;)
@ixxie An alternative to annoy and faiss is nmslib, which powers the KNN module in OpenDistro for Elasticsearch.
+1 For this feature request.
Another +1
+1 for this. Neo4j supports it, and Elasticsearch does the same for its documents.
Another +1
Another +1
+1
@joerg84 This is all too relevant now! Mongo just announced vector search support, Redis has it, and Postgres has an extension for it. We need this for ArangoDB, especially since you view AI and ML as an important use case. Now that LLMs are here, we need vector/embeddings support!
Imagine the amount of traction you will have if added as a supported vector store to a tool like Langchain or Auto-GPT, that beats all marketing campaigns hands down!
@ixxie Search is also doable with SVMs, see https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.ipynb
+1
Is this considered/mentioned in a roadmap for a coming release or has it not been acknowledged?
@litsegaard Given the above history, it's been 5 years with no sign of it, so it seems unlikely. However, SurrealDB has added support in beta: https://surrealdb.com/features
One of the best-known use cases for graph databases is Natural Language Processing. For example, in one project I am involved in, we store text in ArangoDB nodes and link it with edges to nodes containing the parsed text, which are in turn linked to normalized constituent tokens. This allows us to query for similar chunks of text.
A cornerstone of contemporary NLP is a class of algorithms known as word embedding models. These are typically neural networks which train on text and project it into a high-dimensional vector space, normally with hundreds or thousands of dimensions. This space preserves semantics: vectors that lie near each other are close in meaning, and parallel offsets between vectors correspond to analogous relationships in meaning.
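To make the "nearer means closer in meaning" and "parallel offsets" ideas concrete, here is a minimal sketch in plain Python. The 3-dimensional vectors are hand-picked, hypothetical values for illustration only; real embeddings come from a trained model and have far more dimensions. The helper names (`dot`, `norm`, `cosine_similarity`, `subtract`) are my own and not part of any library.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (norm(a) * norm(b))

def subtract(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

# Hypothetical toy embeddings, NOT output of a real model.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
}

# Nearness: "king" is more similar to "queen" than to "woman".
sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_woman = cosine_similarity(embeddings["king"], embeddings["woman"])

# Parallel offsets: king - man points the same way as queen - woman.
offset_a = subtract(embeddings["king"], embeddings["man"])
offset_b = subtract(embeddings["queen"], embeddings["woman"])
offset_alignment = cosine_similarity(offset_a, offset_b)

print(sim_king_queen > sim_king_woman)
print(round(offset_alignment, 3))
```

With these toy values the two offsets are parallel up to floating-point rounding, which is the geometric property that analogy-style queries rely on.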
ArangoSearch brings features to help non-specialists search text. Developing features for specialized NLP use-cases with ArangoDB could be a complementary strategy. In particular, support for basic KNN queries on high-dimensional vectors could be a spectacular place to start. It would mean that I could save my word embeddings and query them in ArangoDB, making it easy for me to deploy powerful custom NLP search capabilities in AQL.
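To illustrate what a basic KNN query would compute, here is a minimal brute-force sketch in plain Python. It is not ArangoDB or AQL code and does not reflect any existing ArangoDB API; the document layout (`_key` plus a hypothetical `embedding` field), the toy vectors, and the function names are all assumptions for illustration. It scans every document, which is exact but linear in collection size; approximate indexes like the ones in the libraries mentioned in this thread exist to avoid that scan.

```python
import heapq
import math

def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn(query, documents, k):
    """Return the k documents whose embedding is closest to the query.

    Brute force: computes the distance to every document once.
    """
    return heapq.nsmallest(
        k, documents, key=lambda doc: cosine_distance(query, doc["embedding"])
    )

# Hypothetical documents with hand-picked 3-dimensional embeddings.
documents = [
    {"_key": "cat",   "embedding": [0.9, 0.1, 0.0]},
    {"_key": "dog",   "embedding": [0.8, 0.2, 0.1]},
    {"_key": "car",   "embedding": [0.0, 0.1, 0.9]},
    {"_key": "truck", "embedding": [0.1, 0.0, 0.8]},
]

nearest = knn([1.0, 0.0, 0.0], documents, k=2)
nearest_keys = [doc["_key"] for doc in nearest]
print(nearest_keys)
```

A database-side version would ideally push this ranking into the query engine so that only the top `k` documents leave the server.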
Some basic vector operations would also be helpful.