RUC-NLPIR / FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research
https://arxiv.org/abs/2405.13576
MIT License
891 stars 69 forks

Question about retrieval time #4

Closed BUAADreamer closed 1 month ago

BUAADreamer commented 1 month ago

Thanks for such a great job!! I noticed that your default implementation uses faiss-cpu. I want to know the typical retrieval speed during inference on Wikipedia 2018 (e.g., how many seconds per sample) so I can plan my upcoming experiments. Thanks very much!!

ignorejjj commented 1 month ago

@BUAADreamer Thanks for your attention to our project! I tested the retrieval speed using e5 as the retriever with a batch size of 256. This batch size may not be optimal; it is only meant as a reference.

On a single A40, the average time cost for one batch (256 samples) is 6.68s, of which 6.63s is spent converting text to embeddings and 0.05s on the faiss search.
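The breakdown above (encoding dominates, search is negligible) can be reproduced with a small profiling sketch. Note that the encoder and index here are stand-ins, a random-vector "encoder" and a NumPy brute-force search, not FlashRAG's actual e5/faiss setup:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, batch_size = 768, 10_000, 256

# Stand-in "corpus index": random document embeddings
# (an assumption for illustration, not a real faiss index).
doc_emb = rng.standard_normal((n_docs, dim)).astype(np.float32)

def encode(texts):
    # Stand-in encoder: random vectors instead of real e5 embeddings.
    return rng.standard_normal((len(texts), dim)).astype(np.float32)

queries = ["example query"] * batch_size

t0 = time.perf_counter()
q_emb = encode(queries)           # dominant cost in the real setup
t1 = time.perf_counter()
scores = q_emb @ doc_emb.T        # brute-force inner-product search
topk = np.argsort(-scores, axis=1)[:, :5]
t2 = time.perf_counter()

print(f"encode: {t1 - t0:.4f}s, search: {t2 - t1:.4f}s")
```

With a real encoder on GPU, the encode step would dwarf the search step, matching the 6.63s vs. 0.05s split reported above.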

Hope this can help you.

BUAADreamer commented 1 month ago

Thanks for the quick reply! So do you recommend using faiss-gpu? Are there any caveats to using faiss-gpu with FlashRAG instead of the default faiss-cpu (as in requirements.txt)? Is there any special config or command?

ignorejjj commented 1 month ago

Sorry for not providing a detailed explanation in the documentation.

Currently, our retriever implementation supports GPU search (just set faiss_gpu to True in the yaml config).
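Per the comment above, enabling GPU search is a one-line change in the yaml config (the surrounding keys here are illustrative placeholders, not an exact FlashRAG config):

```yaml
# Illustrative config fragment; only faiss_gpu is confirmed above.
retrieval_method: e5
faiss_gpu: true
```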

However, our current implementation places the entire index on a single GPU, which can occupy a very large amount of GPU memory (approximately 70GB for the Wikipedia index). So for now we do not recommend using the GPU, since the CPU looks fast enough.

We plan to support index sharding across GPUs, which will reduce memory consumption on each GPU. This feature is already supported by faiss, but it may need several weeks of testing on our side.
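Index sharding, as described above, amounts to splitting the corpus across devices, searching each shard independently, and merging the per-shard top-k results. A minimal CPU sketch with NumPy brute-force shards (illustrative only, not FlashRAG's or faiss's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs, k = 64, 1000, 5
docs = rng.standard_normal((n_docs, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

def search_shard(shard, base_id, q, k):
    # Brute-force inner-product search within one shard;
    # base_id maps shard-local indices back to global doc ids.
    scores = shard @ q
    idx = np.argsort(-scores)[:k]
    return [(float(scores[i]), base_id + int(i)) for i in idx]

# Split the corpus into two shards (each would live on its own GPU).
half = n_docs // 2
hits = (search_shard(docs[:half], 0, query, k)
        + search_shard(docs[half:], half, query, k))
merged = [doc_id for _, doc_id in sorted(hits, reverse=True)[:k]]

# Sanity check: sharded search matches searching the full index.
full = np.argsort(-(docs @ query))[:k].tolist()
assert merged == full
```

In faiss itself, this pattern corresponds to cloning a CPU index across GPUs with sharding enabled (e.g., via `GpuMultipleClonerOptions` with `shard = True`), so each GPU holds only a fraction of the index.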

BUAADreamer commented 1 month ago

Thanks for your quick reply! Looking forward to your update!

BUAADreamer commented 1 month ago

Here is a data point: using the CPU I got 1.42 samples/s on the nq test, about 30x slower than the CPU speed mentioned by @ignorejjj. But it is quick enough for many tasks.

BUAADreamer commented 1 month ago

But I also found a weird problem: my retrieval seems to execute twice when running replug. Is this by design, or might I be doing something wrong?

ignorejjj commented 1 month ago

Maybe I didn't make it clear: the results above were also run using faiss-cpu.

Based on my experience, your slow speed may be caused by an installation issue with faiss. You can try uninstalling the existing faiss and reinstalling it with the following conda command:

conda install -c pytorch faiss-cpu=1.8.0

And there seemed to be no issue with replug on my machine; only one retrieval was performed. Can you show me your console output or other information so I can determine whether there is a problem?

BUAADreamer commented 1 month ago

[screenshot] This is what I got

BUAADreamer commented 1 month ago

Here is an updated case:

  1. With the pip install, using the CPU I got 1.42 samples/s on the nq test, about 30x slower than the CPU speed mentioned by @ignorejjj.
  2. After switching to conda install -c pytorch faiss-cpu=1.8.0 as suggested by @ignorejjj, the speed is about 2x slower than the reported CPU speed. Considering system fluctuations, this should be within normal limits.

ignorejjj commented 1 month ago

We pushed a fix for the repeated retrieval calls (dfa81abc2701adb31722f13ca38214f377bfa4c6) and optimized faiss GPU usage (b61d07aa5730b92de7aacc971776f4f71824dc83).