Open kaislar opened 5 months ago
Hello @kaislar, I want to understand your use case for indexing the codebase. 🤔
Thanks
Hello. You can ask questions about your codebase, or have it explain some parts of the code... It's like Microsoft GitHub Copilot.
I second the request for a GitHub connector that indexes the source code. This is incredibly useful, especially if the code has good docstrings. The GitHub document loaders for langchain and llama-index already do this, and it is the connector that I value the most in those frameworks. We have a number of repositories consisting of code examples that produce fantastic responses when queried with a RAG populated by these document loaders.
I haven't played enough with danswer yet, but if it exposes an API, it could also be connected to a VS Code plugin (Continue) to enable completely local (air-gapped) Copilot-like capabilities in the IDE.
I also like this idea! @dmikulin-dwave does simple RAG work well for you, or do you do something a little bit fancier? We also use it for code repositories, but it seems like our embedding model sometimes struggles to find the relevant parts for a given question. I thought that standard syntax parsing and retrieving context using the identifiers (class names, function names, etc.) might work better. Has anyone tried something similar?
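For illustration, the identifier-based idea above could be sketched roughly like this for Python sources, using only the standard-library `ast` module. The function names `index_identifiers` and `retrieve_by_identifier` are hypothetical, and matching on exact identifier names in the question is an assumption made for the sketch. It is not something danswer or any of the loaders mentioned in this thread is known to do.

```python
import ast
import re
from collections import defaultdict


def index_identifiers(source: str, filename: str = "<memory>") -> dict:
    """Map each class/function name to the source snippets that define it."""
    tree = ast.parse(source, filename=filename)
    index = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment returns the exact text of the definition
            snippet = ast.get_source_segment(source, node)
            index[node.name].append(snippet)
    return dict(index)


def retrieve_by_identifier(index: dict, question: str) -> list:
    """Return snippets whose identifier appears verbatim in the question."""
    words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", question))
    matches = sorted(words & set(index))
    return [snippet for name in matches for snippet in index[name]]
```

Snippets retrieved this way could be passed to the LLM alongside (or instead of) the chunks returned by vector search. A real question rarely names the identifier exactly, so some fuzzy or case-insensitive matching would probably be needed in practice.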
@mattifrind: Our open source code and documentation are already easily accessible and easily indexed by one's IDE of choice. However, quantum computing is not an easy new area to pick up; it's a very different paradigm for programming and just knowing the classes and functions, even if you understand what they do well, isn't going to get you very far.
I've created a POC using langchain, document loaders for source code and source code documentation, ChromaDB, and Ollama directly, and the preliminary results have been quite good. I'm optimistic that there are a number of additional techniques that can be layered on top of simple vector retrieval, or used in addition to vector retrieval, to provide better context to the LLM to get even better responses.
Hello,
Is there any reason why loading the codebase was not implemented for the GitHub connector? Only issues and PRs are loaded.
I worked on an extension of the connector that fetches the code files on a given branch using the GitHub API by looping over all the commits. Is this something that could interest you?
If so, I can submit a PR and get your feedback.
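As a rough illustration of listing the code files on a branch, here is a minimal sketch that uses GitHub's Git Trees REST endpoint with `recursive=1`. This is a different technique from the commit loop described above, not the proposed extension itself. The helper names, the extension filter, and the optional `token` parameter are assumptions made for the example, and branch names containing `/` may need extra handling.

```python
import json
import urllib.request

API = "https://api.github.com"


def tree_url(owner: str, repo: str, branch: str) -> str:
    """Build the Git Trees URL that lists every entry on a branch."""
    return f"{API}/repos/{owner}/{repo}/git/trees/{branch}?recursive=1"


def code_paths(tree_response: dict, extensions=(".py", ".md")) -> list:
    """Keep only file entries ("blob") whose path has a wanted extension."""
    return [
        entry["path"]
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob" and entry["path"].endswith(tuple(extensions))
    ]


def fetch_code_paths(owner: str, repo: str, branch: str, token=None) -> list:
    """Call the API and return the matching file paths (needs network access)."""
    request = urllib.request.Request(
        tree_url(owner, repo, branch),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:  # hypothetical: a personal access token for private repositories
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return code_paths(json.load(response))
```

The response also carries a `truncated` flag for very large repositories, in which case the tree would have to be walked one directory at a time.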