InAnYan / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

[DISCUSSION] RAG framework #86

Open InAnYan opened 1 month ago

InAnYan commented 1 month ago

In my PR for JabRef we implemented RAG manually, which has several advantages:

  1. It's local-first
  2. No special setup is needed
  3. No external application is needed
  4. It's a fully implemented RAG architecture (though perhaps not as sophisticated as it could be)
  5. It's easy to extend

Initially, I didn't think there were special, separate RAG frameworks; I thought langchain provides everything we need (and that's true).

What I found while googling RAG frameworks:

  1. Some of them are too raw and in early development (https://github.com/AI-Commandos/RAGMeUp)
  2. Langchain already provides everything that the library offers (https://github.com/RAG4J/rag4j)
  3. Some are standalone applications that require a server (https://github.com/truefoundry/cognita ; https://github.com/microsoft/kernel-memory)
  4. They are hard to extend. It's possible to extend them in code, but then we would need to maintain a separate application in JabRef, etc.

InAnYan commented 1 month ago

Comments on Kernel Memory and/or Semantic Kernel:

  1. I didn't like that it runs as a standalone app. (And I remember finding the Java API crappy.)
  2. It's poorly documented (most of the time it's undocumented), though they have an active Discord community
  3. It does everything we did:
     1) stores documents with metadata (names, citation key, library) (MVStore :heavy_check_mark: )
     2) generates embeddings (langchain4j :heavy_check_mark: )
     3) converts files from different formats to plain text (we support only PDF, but I think this is easily solved, so we have already addressed it :heavy_check_mark: )
     4) connects to an LLM (langchain4j :heavy_check_mark: )
     5) retrieves relevant information from vector storage (langchain4j + MVStore :heavy_check_mark: )

However, they still offer interesting features, such as functions and planning.
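To make the last pipeline step above concrete: the retrieval step boils down to ranking stored embedding vectors by similarity to a query embedding. Here is a minimal, self-contained sketch of that idea in plain Java (cosine similarity, hypothetical class and method names) — a conceptual illustration, not JabRef's actual MVStore/langchain4j implementation:

```java
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of the RAG retrieval step: given a query embedding,
// return the indices of the most similar stored chunk embeddings.
public class VectorRetrieval {

    // Cosine similarity between two embedding vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Indices of the top-k stored vectors, most similar to the query first.
    static List<Integer> topK(double[] query, double[][] store, int k) {
        Integer[] idx = new Integer[store.length];
        for (int i = 0; i < store.length; i++) idx[i] = i;
        Arrays.sort(idx, (x, y) ->
                Double.compare(cosine(query, store[y]), cosine(query, store[x])));
        return Arrays.asList(idx).subList(0, Math.min(k, idx.length));
    }

    public static void main(String[] args) {
        double[][] store = { {1, 0}, {0, 1}, {0.9, 0.1} };
        double[] query = {1, 0.05};
        System.out.println(topK(query, store, 2)); // nearest stored chunks first
    }
}
```

In a real implementation a library like langchain4j handles this ranking (and the embedding generation) internally; the sketch only shows why no external server is strictly required for this step.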

InAnYan commented 1 month ago

About LlamaIndex: it's nearly the same as langchain. Langchain provides all the tools that LlamaIndex provides (though LlamaIndex may have better support for RAG).

So, in conclusion about RAG frameworks:

  1. They are either standalone applications,
  2. or they are just collections of various useful tools (LLM connectors, file readers, vector storage connectors, etc.) — which is the same purpose langchain serves.

koppor commented 1 month ago
> 1. I didn't like that it runs as a standalone app. (And I remember finding the Java API crappy.)

When thinking in microservices (see https://12factor.net/ for a short introduction to a variant of the idea), it is good that there is no monolith. Think of a research group of 10 researchers sharing their library and working in an open and collaborative way. Then it makes sense to run a server. Semantic Kernel "just" needs a docker command to be run. -- This is similar to our GROBID service... (which still lacks a how-to: https://github.com/JabRef/user-documentation/issues/495)
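As a rough sketch of what "just a docker command" could look like for the Kernel Memory service (image name, port, and flags are assumptions — check the Kernel Memory documentation for the current image and configuration):

```shell
# Hypothetical deployment fragment: run Kernel Memory as a local service.
# Image name and port are assumptions; consult the official docs.
docker run --rm -it -p 9001:9001 kernelmemory/service
```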

ThiloteE commented 1 month ago

While the current implementation with langchain4j works, the limited number of available embedding models and the slow inference (CPU only, via Microsoft's ONNX runtime) raise doubts about whether the existing implementation should be changed in the future. If so, we want frameworks that fulfill as many key criteria as possible:

I am still of the opinion that we should make do with llama.cpp.
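For context on the llama.cpp suggestion: llama.cpp ships an HTTP server that exposes an OpenAI-compatible API, which a client library could then talk to. A hedged sketch (the binary name and flags vary between llama.cpp versions, and "model.gguf" is a placeholder path):

```shell
# Hypothetical invocation: start llama.cpp's bundled HTTP server.
# "model.gguf" is a placeholder; flag names may differ by version.
llama-server -m model.gguf --host 127.0.0.1 --port 8080
```

This would keep inference local while allowing faster backends (e.g. GPU builds of llama.cpp) than the current CPU-only ONNX path.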