SciPhi-AI / R2R

The all-in-one solution for RAG. Build, scale, and deploy state-of-the-art Retrieval-Augmented Generation applications
https://r2r-docs.sciphi.ai/
MIT License

[Questions] Is it possible to use HuggingFace? 100k PDFs?! #87

Closed MatteoRiva95 closed 6 months ago

MatteoRiva95 commented 7 months ago

Hello everyone,

I have just a few questions:

Is it possible to use R2R with HuggingFace? Also, does R2R work with almost 100k PDFs? I tested RAG with several tutorials and it took 10 minutes to reply to a single question :(

Thank you so much in advance!

emrgnt-cmplxty commented 7 months ago

HuggingFace inference or HuggingFace datasets? For the latter, there is an old and now deleted demo here.

As for the slow reply, can you share your logs / environment? Happy to help you debug; we typically see response times under 5-10s.

MatteoRiva95 commented 7 months ago

Hello @emrgnt-cmplxty!

First of all, thank you so much for your fast reply. I really appreciate it! Secondly, yes, sorry, I should have given more details.

I am currently working on an AI project where the idea is to give a large language model thousands of English PDFs (around 100k, all on the same topic) and then be able to chat with it.

I followed several tutorials about RAG (e.g. https://levelup.gitconnected.com/building-a-private-ai-chatbot-2c071f6715ad and https://www.diariodiunanalista.it/posts/chatbot-python-langchain-rag/). They both use HuggingFace to download the LLM. Unfortunately, when I asked the model (Zephyr-7B) a question, it took almost 10 minutes to reply :( Moreover, it sometimes produced a sort of "hallucination" (for example, the title of the PDF is correct, but the years or URLs are wrong). Is it too much information for the model (for testing, I am only using 500 PDFs for now)? Is the chunking wrong (I am using chunk_size=1000, chunk_overlap=0)?
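
For reference, this is roughly my current splitting setup (a minimal sketch assuming LangChain's PyPDFLoader and RecursiveCharacterTextSplitter; the tutorials' exact code may differ, and the file path is just a placeholder):

```python
# Rough sketch of my chunking setup (assumed: LangChain's PyPDFLoader and
# RecursiveCharacterTextSplitter; the path below is a placeholder).
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

pages = PyPDFLoader("docs/example.pdf").load()  # one of the ~500 test PDFs

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,   # characters per chunk
    chunk_overlap=0,   # no overlap between consecutive chunks
)
chunks = splitter.split_documents(pages)
```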

I also tried adding a prompt template, without any luck.

Finally, I found that deploying RAG on my own is too difficult and expensive. I searched for a solution and found R2R, hoping it could help me!

I am using Python 3.10 and a cluster with one GPU (32 GB) and 200 GB of CPU RAM available.

Let me know if you need more info. Thank you so much for your help and time! :)

emrgnt-cmplxty commented 7 months ago

Hey Matteo,

Thanks for sharing more details! What kind of GPU are you running with? The slow response time sounds like it could be coming from your model being run on the CPU.

It would probably be easier to use OpenAI for testing if you can quickly obtain one of their API keys; they are already supported in the R2R framework. If you must use your own local model, then I recommend vLLM (https://docs.vllm.ai/en/latest/getting_started/quickstart.html). You can host an OpenAI-compatible endpoint with it, which gives a lot of nice metrics around the LLM and makes it very easy to connect to our framework. I recommend testing this setup separately before diving into RAG.
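
Once the vLLM server is up, a quick sanity check against the endpoint looks roughly like this (a sketch only; the launch command, port, and model name are assumptions based on the vLLM quickstart, not anything R2R-specific):

```python
# Sketch: query a locally hosted vLLM OpenAI-compatible server.
# Assumes the server was started with something like:
#   python -m vllm.entrypoints.openai.api_server --model HuggingFaceH4/zephyr-7b-beta
# The base_url, port, and model name are assumptions -- match whatever you launched.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="HuggingFaceH4/zephyr-7b-beta",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

If that call comes back in a few seconds, the model is actually running on the GPU and any remaining slowness is elsewhere in the pipeline.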

Once you have an LLM provider up and running, it will be easier to start tackling these other challenges. I'm happy to answer here, or you can connect to Discord to get live feedback from me & the community.

pablospe commented 7 months ago

Would it be possible to use Ollama in this case? And how?

Maybe if the code supported LiteLLM, it would be easier to use a local model: https://docs.litellm.ai/docs/

MatteoRiva95 commented 7 months ago

@emrgnt-cmplxty Thank you so much for your kind reply! I am using a Tesla V100-SXM2-32GB GPU.

Yes, OpenAI could be the easiest and most efficient solution, but I wanted to use something open-source with HuggingFace + LangChain or something similar :(

Sorry for the late response, I was convinced that I had already replied!

emrgnt-cmplxty commented 7 months ago

> Would it be possible to use Ollama in this case? And how?
>
> Maybe if the code supported LiteLLM, it would be easier to use a local model: https://docs.litellm.ai/docs/

Adding LiteLLM as the default provider today, which should make this use case trivial. Let me know how it works for you =).
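
To give an idea of what that unlocks, plain LiteLLM can already route a chat call to a local Ollama model like this (a sketch of LiteLLM usage per their docs, not the exact R2R wiring; the model name is just an example):

```python
# Sketch of plain LiteLLM calling a local Ollama model -- not the exact R2R
# integration, just the kind of call LiteLLM standardizes.
# Assumes `ollama serve` is running and the model has already been pulled.
import litellm

response = litellm.completion(
    model="ollama/llama2",              # example local model
    messages=[{"role": "user", "content": "Hello from a local model!"}],
    api_base="http://localhost:11434",  # default Ollama endpoint
)
print(response.choices[0].message.content)
```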

pablospe commented 7 months ago

> Adding LiteLLM as the default provider today, which should make this use case trivial. Let me know how it works for you =).

That would be great! Is there a branch for this?

emrgnt-cmplxty commented 6 months ago

It's in as the default now, closing this issue.

pablospe commented 6 months ago

For documentation purposes, could you share a small example of how to use it? People will probably find this GitHub issue but won't know how to use it.