danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Git repo connector #1602

Closed LukeCarrier closed 3 weeks ago

LukeCarrier commented 3 months ago

The existing GitHub and GitLab connectors seem to index only issues and pull requests. It would be nice for us to additionally index Git repo content.

This probably has a bunch of rough edges and needs some testing.

danielnaber commented 3 months ago

The GitLab connector can at least index code, but I had to enable it in the source to make it work, changing include_code_files: bool = GITLAB_CONNECTOR_INCLUDE_CODE_FILES to include_code_files: bool = True. There are other bugs; see e.g. PR1767.

yuhongsun96 commented 3 weeks ago

Hi @LukeCarrier, so sorry that I'm closing so many of your PRs. Do you want to connect with us? Happy to give a bit of insight on the roadmap and areas to contribute if you're interested! You can always reach me at yuhong@danswer.ai

Code search can't be handled by just ingesting and chunking code files. It requires building a graph of the code base, plus other approaches that go beyond keyword/vector similarity between chunks. It's a completely separate flow, which is why we haven't added the ability to ingest code via the GitHub Connector. It's DEFINITELY on the roadmap though.
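To make "building a graph of the code base" concrete, here is a rough sketch of one possible form such a graph could take: a direct call graph extracted with Python's standard `ast` module. This is an illustration only, not Danswer's planned design; `build_call_graph` and the `EXAMPLE` source are made up for this sketch, and it only handles top-level functions and plain-name calls.

```python
import ast
from collections import defaultdict


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name]  # register the function even if it calls nothing
            for inner in ast.walk(node):
                # Only calls like f(x) are recorded; method calls such as
                # obj.f(x) are attribute accesses and are skipped here.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return dict(graph)


EXAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

graph = build_call_graph(EXAMPLE)
for caller, callees in graph.items():
    print(caller, "->", sorted(callees))
```

The edge from `main` to `load` and `parse` is exactly the kind of relationship that chunk-level similarity cannot see: nothing in the text of `main` resembles the bodies of the functions it depends on.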