Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0

[REQUEST] - Optimal configuration for indexing and querying a 5,000-file codebase #319

Closed huytl-remi closed 1 month ago

huytl-remi commented 2 months ago

Summary

Hello! My company wants to index and query our entire codebase (about 5,000 files, mostly Ruby). Could you share your recommendations on the best configuration for this? Thanks!

P.S. We're using the Qwen2.5 models (7b-coder and 32b-instruct).

taprosoft commented 2 months ago

Hi @huytl-remi, sorry for the late reply. I believe your use case could be addressed with something like https://wiki.mutable.ai/ (if you are fine with SaaS). Otherwise, you can use additional file loaders such as these:

https://github.com/yamadashy/repopack
https://github.com/artkulak/repo2file

Then use Kotaemon RAG to answer your queries. I believe code-based RAG also needs some additional tuning, especially in the indexing process. Let us know if you have more ideas that you would like to implement in Kotaemon.
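To illustrate the "pack the repository, then index it" idea above, here is a minimal Python sketch of what such a loader does conceptually. It is not the implementation of repopack or repo2file; the function name `pack_repo`, the extension list, the skipped directories, and the header format are all assumptions chosen for a Ruby project.

```python
from pathlib import Path

# Hypothetical defaults for a Ruby project: adjust to your repository.
SOURCE_EXTENSIONS = {".rb", ".rake", ".erb"}
SKIP_DIRS = {".git", "node_modules", "vendor", "tmp", "log"}


def pack_repo(root, extensions=SOURCE_EXTENSIONS, skip_dirs=SKIP_DIRS):
    """Concatenate matching source files under `root` into one string.

    Each file is preceded by a header line with its relative path, so a
    retrieved chunk can later be traced back to the file it came from.
    """
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        rel = path.relative_to(root)
        # Skip anything located inside an excluded directory.
        if any(part in skip_dirs for part in rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"==== FILE: {rel.as_posix()} ====\n{text.rstrip()}\n")
    return "\n".join(parts)
```

The resulting text can be written to a single file and uploaded for indexing like any other document. For 5,000 files, it may be more practical to pack one file per top-level directory so that individual uploads stay small.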

huytl-remi commented 1 month ago

@taprosoft Hi there. Excellent suggestions. We're all new to RAG. Could you recommend the ideal settings? For example, should we use RAG or GraphRAG, and which reasoning setting is most suitable? Thanks!

taprosoft commented 1 month ago

@huytl-remi Unfortunately, we don't have enough insight to make a specific recommendation for your use case. A combination of hybrid RAG and a dynamic "repo-map" lookup would probably be helpful. You can check this out for reference: https://aider.chat/docs/faq.html#how-can-i-add-all-the-files-to-the-chat
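For readers new to the term, "hybrid" retrieval usually means combining a keyword ranking with an embedding-based ranking. The sketch below shows one common way to merge the two, reciprocal rank fusion. It is a generic illustration, not Kotaemon's API or its internal method; the function name, the constant `k=60`, and the example file names are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists in which it
    appears, with rank starting at 1. Ties are broken by document id.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["a.rb", "b.rb", "c.rb"]
vector_hits = ["b.rb", "d.rb", "a.rb"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['b.rb', 'a.rb', 'd.rb', 'c.rb']
```

Files that rank well in both lists rise to the top, which tends to help with code, where exact identifier matches and semantic similarity each catch different things.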

Since this is a new use case that we haven't explored in detail before, feel free to share your experience and customizations so others can learn from them. Thanks.
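As a starting point for the "repo-map" idea mentioned above, the following Python sketch builds a compact outline of class, module, and method definitions per Ruby file. It is a rough regex-based stand-in and not how aider builds its map; the function names are hypothetical, and the pattern will miss operator methods, metaprogrammed definitions, and `class << self` blocks.

```python
import re

# Matches a Ruby class, module, or method definition at the start of a line.
DEFINITION = re.compile(r"^\s*(class|module|def)\s+([A-Za-z_][\w:.?!=]*)")


def ruby_outline(source):
    """Return (kind, name, line_number) for each definition found in `source`."""
    outline = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = DEFINITION.match(line)
        if match:
            outline.append((match.group(1), match.group(2), number))
    return outline


def repo_map(files):
    """Build a text map from a {path: source} dict, listing definitions per file."""
    lines = []
    for path in sorted(files):
        lines.append(path)
        for kind, name, number in ruby_outline(files[path]):
            lines.append(f"  {kind} {name} (line {number})")
    return "\n".join(lines)
```

A map like this is small enough to place in the prompt, so the model can name the files it needs before the full contents of those files are retrieved.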