AllAboutAI-YT / easy-local-rag

SuperEasy 100% Local RAG with Ollama + Email RAG
MIT License
765 stars, 208 forks

Full mailbox in Vector DB #17

Open stevenbaert opened 5 months ago

stevenbaert commented 5 months ago

Why would you embed every search separately? That doesn't make sense to me. Since you use free embeddings and a free Llama 3 model, you could just as well vectorize your full mailbox and run a scheduled (e.g. daily) job that embeds only the diff.
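
A minimal sketch of what "embed only the diff" could look like, assuming each mail is available as raw bytes and that already-processed mails are tracked by content hash in a small JSON state file. The names `run_delta_update`, `select_new_messages`, and the `embed_and_store` callback are hypothetical and not part of easy-local-rag; the callback stands in for whatever embedding and vector DB calls you actually use.

```python
import hashlib
import json
from pathlib import Path


def message_key(raw: bytes) -> str:
    """Stable identifier for a mail: the SHA-256 of its raw bytes."""
    return hashlib.sha256(raw).hexdigest()


def select_new_messages(messages, seen):
    """Return (key, raw) pairs for mails whose key is not in `seen`.

    Duplicates inside the same batch are returned only once.
    """
    fresh = []
    batch_keys = set()
    for raw in messages:
        key = message_key(raw)
        if key in seen or key in batch_keys:
            continue
        batch_keys.add(key)
        fresh.append((key, raw))
    return fresh


def run_delta_update(messages, state_path, embed_and_store):
    """Embed only mails not seen in earlier runs; return how many were new.

    `embed_and_store(key, raw)` is a hypothetical callback supplied by the
    caller (for example: compute an embedding, then upsert it into the DB).
    """
    state_file = Path(state_path)
    seen = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    fresh = select_new_messages(messages, seen)
    for key, raw in fresh:
        embed_and_store(key, raw)
        seen.add(key)
    # Persist the updated set of keys so the next scheduled run skips them.
    state_file.write_text(json.dumps(sorted(seen)))
    return len(fresh)
```

A scheduler such as cron could call `run_delta_update` once a day with the current mailbox contents; only mails that were not present in earlier runs would reach the embedding step.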

AllAboutAI-YT commented 5 months ago

Of course you can do that! For me, there is just too much low-value data in my emails, and it contaminates the vector search.

stevenbaert commented 5 months ago

An LLM should be able to help you clean up and preprocess the data. There are probably already GitHub projects that do this. Also, the "noise" is the same for one mail as for a million mails, right?

The only relevant approach, it seems to me, is to put all the data into the vector DB, then optimize that process and do delta updates. Especially since with open source there is no cost involved in the attempt(s). Querying the full mailbox could potentially give much more valuable insights.

stevenbaert commented 5 months ago

Just thinking out loud: for each mail you could build a JSON object with type:mail, timestamp, subject, from, to, att, bodytoplaintext, etc. and feed that into the vector DB. That way you can ask for recent mails from person x. It will take some experimentation, but it is of much more value, and once fine-tuned, only delta updates are needed and way more interesting interactions become possible. The same approach would work for calendar items.
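
A rough sketch of the per-mail JSON object described above, using only Python's standard `email` package. The function name `mail_to_record` is hypothetical, the field names simply follow the list in the comment, and storing `att` as a list of attachment filenames is an assumption. How the record is then embedded and stored is left open.

```python
import json
from email import message_from_string, policy
from email.utils import parsedate_to_datetime


def _text(value):
    """Convert a header object to a plain str, keeping None as None."""
    return str(value) if value is not None else None


def mail_to_record(raw: str) -> dict:
    """Turn one raw RFC 5322 message into a flat, JSON-serializable dict."""
    msg = message_from_string(raw, policy=policy.default)
    date_header = msg["Date"]
    # Prefer the text/plain part; falls back to "" if there is none.
    body_part = msg.get_body(preferencelist=("plain",))
    return {
        "type": "mail",
        "timestamp": (
            parsedate_to_datetime(str(date_header)).isoformat()
            if date_header is not None
            else None
        ),
        "subject": _text(msg["Subject"]),
        "from": _text(msg["From"]),
        "to": _text(msg["To"]),
        # Assumption: attachments are represented by filename only.
        "att": [part.get_filename() for part in msg.iter_attachments()],
        "bodytoplaintext": body_part.get_content().strip() if body_part else "",
    }


if __name__ == "__main__":
    sample = (
        "From: alice@example.com\n"
        "To: bob@example.com\n"
        "Subject: Lunch\n"
        "Date: Mon, 01 Jan 2024 12:00:00 +0000\n"
        "\n"
        "See you at noon.\n"
    )
    # The serialized record is the text you would hand to the embedder.
    print(json.dumps(mail_to_record(sample), indent=2))
```

Keeping `from` and `timestamp` as separate fields also means they could be stored as metadata next to the vector, so a question like "recent mails from person x" can be answered with a filter rather than relying on similarity search alone.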