dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
https://dusty-nv.github.io/NanoLLM/
MIT License

[HELP] How to implement a document-based RAG with VectorDB using NanoLLM? #31

Open jais001 opened 2 months ago

jais001 commented 2 months ago

Hey,

I'm new to NanoLLM, so please forgive me if my question seems basic. I couldn't find a discussion group for NanoLLM, so I'm posting my question here. The NanoLLM documentation says it supports RAG, but I could only find multimodal RAG for chatting over images and videos. I'm interested in using NanoLLM for context-aware document chat backed by a vector DB. Could anyone point me to the relevant documentation or resources for this?

Thank you!

dusty-nv commented 2 months ago

Hi @jais001, you are correct that the NanoDB I created/integrated with NanoLLM is mostly geared toward low-latency multimodal retrieval/RAG. I have been meaning to find the time to swap out the CLIP/SigLIP model it uses for cosine-similarity embeddings with a text-based embedding model (like those from SentenceTransformers), but today you can integrate the vector DB / RAG solution of your choice. NanoLLM exposes a HuggingFace-like API for the LLM/VLA models, so you can connect it with whatever other code and applications you want.
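
As a rough illustration of that wiring, here is a minimal text-RAG sketch that uses SentenceTransformers for the embeddings, a plain in-memory cosine-similarity search in place of a vector DB, and NanoLLM for generation. The embedding model, chat model, chunking, and prompt format below are placeholder choices for the example, not anything NanoLLM prescribes:

```python
# Minimal text RAG sketch: SentenceTransformers embeddings + in-memory
# cosine-similarity retrieval + NanoLLM generation. Model names, chunking,
# and the prompt template are illustrative assumptions, not NanoLLM defaults.
import numpy as np
from sentence_transformers import SentenceTransformer
from nano_llm import NanoLLM

# 1) Embed the pre-split document chunks with a text embedding model
embedder = SentenceTransformer('all-MiniLM-L6-v2')        # assumed embedding model
chunks = ["first chunk of your document...",              # your own document chunks
          "second chunk of your document..."]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 2) Load a chat model through NanoLLM's HuggingFace-like API
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",   # any chat model supported by NanoLLM
    api='mlc',                         # quantized MLC backend
    quantization='q4f16_ft',
)

def ask(question, top_k=2):
    # 3) Retrieve the most similar chunks (dot product == cosine, vectors are normalized)
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q
    context = "\n".join(chunks[i] for i in np.argsort(scores)[::-1][:top_k])

    # 4) Inject the retrieved context into the prompt and stream the reply
    prompt = (f"Use the context to answer the question.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    for token in model.generate(prompt, max_new_tokens=256):
        print(token, end='', flush=True)
    print()

ask("What does the document say about X?")
```

In a real application you would replace the in-memory list with your vector DB of choice (FAISS, Chroma, Milvus, etc.) and do proper document loading/chunking, but the shape of the loop stays the same: embed, retrieve, stuff the context into the prompt, generate.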

dusty-nv commented 2 months ago

P.S. Here are some other text-based RAG examples from Jetson AI Lab that use llama-index:

LangChain is also supported in jetson-containers, should you want to use that instead.
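
For orientation, the llama-index side of those tutorials follows a pattern roughly like the sketch below (retrieval only, with a local HuggingFace embedding model). The package and model names here are my assumptions rather than taken from the tutorials, so check the linked examples for the exact setup; the retrieved text can then be passed to NanoLLM as in the snippet above:

```python
# Rough llama-index retrieval sketch (assumes llama-index-core and the
# llama-index-embeddings-huggingface extra are installed; model name is illustrative).
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use a local text embedding model instead of the default remote embeddings
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = None  # retrieval only; hand the retrieved text to your own LLM

# Index the documents in ./docs and retrieve the chunks most relevant to a query
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=3)

for node in retriever.retrieve("What does the report conclude?"):
    print(f"{node.score:.3f}  {node.node.get_content()[:120]}")
```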