This PR adds indexing of GitHub code files, taking inspiration from the GitLab PR https://github.com/danswer-ai/danswer/pull/1586. One notable difference is that GitHub code files are indexed only on initial load or complete re-indexing, because there is currently no obvious way to determine when a repository file was last updated on GitHub. (The last commit date cannot be used, because an arbitrary amount of time can pass between a commit being made and that commit being pushed to GitHub.)
This functionality is off by default; to enable it, set the `GITHUB_CONNECTOR_INCLUDE_CODE_FILES` env var to `true`.
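For reviewers, the gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `include_code_files_enabled` and `should_index_code_files`, the `is_full_load` parameter, and the case-insensitive parsing of `true` are all assumptions made for the example. Only the env var name comes from this description.

```python
import os

# Env var named in this PR description; everything else below is hypothetical.
ENV_VAR = "GITHUB_CONNECTOR_INCLUDE_CODE_FILES"


def include_code_files_enabled(environ=None) -> bool:
    """Return True only when the env var is explicitly set to "true".

    Assumption: the value is compared case-insensitively after trimming
    whitespace. An unset variable means the feature stays off.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() == "true"


def should_index_code_files(is_full_load: bool, environ=None) -> bool:
    """Decide whether code files are indexed during this connector run.

    Code files are indexed only on an initial load or a complete re-index
    (is_full_load=True), since there is no reliable "last updated on GitHub"
    timestamp to drive incremental updates.
    """
    return is_full_load and include_code_files_enabled(environ)
```

With this shape, an incremental poll never touches code files even when the flag is on, which matches the behavior described above.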