RUC-NLPIR / FlashRAG

⚡FlashRAG: A Python Toolkit for Efficient RAG Research
https://arxiv.org/abs/2405.13576
MIT License

[New Features] Chunking method available? #3

Open Kunlun-Zhu opened 6 months ago

Kunlun-Zhu commented 6 months ago

To the best of my understanding, the retriever only returns the document ID, and there is no method for chunking each document.

I would also suggest supporting API-based models such as ChatGPT, Gemini, and Claude in the generator.

DaoD commented 6 months ago

@ignorejjj check this

ignorejjj commented 6 months ago

The retriever returns the most similar items (both ID and text) from the document corpus. As I understand it, document chunking is applied when the corpus is constructed, so the retriever does not need to return it.
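Chunking at corpus-construction time can be sketched roughly as follows. This is a minimal illustration, not FlashRAG's own chunker. It assumes a JSONL corpus where each line holds an `id` and a `contents` field (check the FlashRAG README for the exact format your version expects). The fixed word-window splitting, the `chunk_size`/`overlap` defaults, and the helper names `chunk_words`, `corpus_records`, and `write_jsonl` are all hypothetical.

```python
import json
from typing import Iterator


def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into windows of `chunk_size` words; neighbouring windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # consecutive windows start `step` words apart
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end, so no trailing window is a subset of this one.
        if start + chunk_size >= len(words):
            break
    return chunks


def corpus_records(docs: dict[str, str], chunk_size: int = 100, overlap: int = 20) -> Iterator[dict]:
    """Yield one corpus record per chunk; the `<doc_id>-<index>` ID scheme is illustrative."""
    for doc_id, text in docs.items():
        for i, chunk in enumerate(chunk_words(text, chunk_size, overlap)):
            yield {"id": f"{doc_id}-{i}", "contents": chunk}


def write_jsonl(records, path: str) -> None:
    """Write records as JSON Lines (one JSON object per line)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because every chunk gets its own ID, the retriever can return chunk IDs and chunk text directly, without knowing how the original documents were split.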

For the generator, we did not initially implement API-based models because of the limitations of black-box models (they cannot return logits, and every call incurs API costs). For completeness, we plan to add support for mainstream API-based models, such as ChatGPT, within the next few weeks.
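The limitation described above can be illustrated with a hypothetical wrapper around a hosted model. `APIGenerator`, `GenerationResult`, and `complete_fn` are illustrative names, not FlashRAG APIs. `complete_fn` stands in for whatever vendor SDK call returns a completion, and the flat per-call price is a simplification, since real APIs usually bill per token.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    text: str
    # Black-box APIs typically return only text, so token-level scores stay empty.
    logits: Optional[list] = None


class APIGenerator:
    """Hypothetical generator backed by a hosted model reached through `complete_fn`."""

    def __init__(self, complete_fn: Callable[[str], str], price_per_call: float = 0.0):
        self.complete_fn = complete_fn        # e.g. a thin wrapper around a vendor SDK
        self.price_per_call = price_per_call  # assumed flat price, for illustration only
        self.calls = 0

    def generate(self, prompts: list[str]) -> list[GenerationResult]:
        results = []
        for prompt in prompts:
            self.calls += 1
            # Only the completion text comes back; no logits are available.
            results.append(GenerationResult(text=self.complete_fn(prompt)))
        return results

    @property
    def estimated_cost(self) -> float:
        return self.calls * self.price_per_call
```

Any RAG method that relies on token probabilities would have to handle `logits is None` for a backend like this, which is one reason API models need separate treatment.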

If I have misunderstood anything, please feel free to make suggestions!

Kunlun-Zhu commented 6 months ago

Thanks for the reply; looking forward to the updates.

linchen111 commented 6 months ago

Hope I can use this to chunk my HTML, haha.
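For HTML, one option is to extract the text of each block-level element before building the corpus. Below is a minimal sketch using only Python's standard `html.parser`. It is not part of FlashRAG. The `BLOCK_TAGS`/`SKIP_TAGS` sets and the one-chunk-per-block rule are assumptions, and long blocks could still be split further with a word-window chunker.

```python
from html.parser import HTMLParser

# Assumed tag sets: each block element becomes one chunk; script/style text is dropped.
BLOCK_TAGS = {"title", "p", "li", "h1", "h2", "h3", "h4", "h5", "h6", "td", "pre", "blockquote"}
SKIP_TAGS = {"script", "style"}


class BlockTextExtractor(HTMLParser):
    """Collect the whitespace-normalised text of each block-level element as a chunk."""

    def __init__(self):
        super().__init__()
        self.chunks: list[str] = []
        self._buf: list[str] = []
        self._skip_depth = 0

    def _flush(self):
        # Join buffered text, collapse whitespace, and keep it if non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.chunks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text outside a block element


def html_to_chunks(html: str) -> list[str]:
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Inline tags such as `<b>` are merged into their enclosing block, so `<p>Hello <b>world</b>!</p>` yields the single chunk `Hello world!`.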