context-labs / autodoc

Experimental toolkit for auto-generating codebase documentation using LLMs
MIT License

Incremental re-indexing #7

Open samheutmaker opened 1 year ago

samheutmaker commented 1 year ago

Autodoc should support indexing only the files and folders that have changed since the last index. At a high level, I think it looks something like this:

  1. Track the git sha at time of index.
  2. When indexing, compare files at last sha to current repository state.
  3. Calculate which branches have changes.
  4. Re-index changed branches.
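
The four steps above can be sketched roughly as follows. This is a minimal sketch, not Autodoc's implementation: it assumes that "branches" means folders of the file tree, that a changed file makes its folder and every ancestor folder stale, and that the last indexed sha is stored somewhere by the caller. The function names `current_sha`, `changed_files`, and `folders_to_reindex` are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def current_sha(repo_dir: str) -> str:
    """Step 1: the commit sha to record at index time."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def changed_files(repo_dir: str, last_sha: str) -> list[str]:
    """Step 2: tracked paths that differ between last_sha and the working tree.

    Untracked files are not reported by `git diff`, so brand-new files
    only show up once they have been added to the index or committed.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", last_sha],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def folders_to_reindex(paths: list[str]) -> set[str]:
    """Steps 3 and 4: each changed file marks its folder and all ancestors.

    Assumption: a folder's documentation depends on everything below it,
    so a change deep in the tree invalidates every folder up to the root.
    The repository root is represented as ".".
    """
    stale: set[str] = set()
    for path in paths:
        for parent in PurePosixPath(path).parents:
            stale.add(str(parent))
    return stale
```

A caller would re-index every path returned by `changed_files`, then every folder returned by `folders_to_reindex`, and finally store `current_sha(repo_dir)` for the next run.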

If you're interested in working on this, please reach out.

slavakurilyak commented 1 year ago

Great progress!

andrewhong5297 commented 1 year ago

this should be close now @samheutmaker

diegofornalha commented 1 year ago

I'm reading the README and asked GPT-4 to help me with improvements, and it returned these adjustments:

Optimize change detection: In addition to using the "git sha," you can explore other ways to track changes in files and folders to make the change detection process more efficient.

Improve the granularity of reindexing: Instead of reindexing all branches with changes, you can identify and reindex only the specific files that have been altered.

Cache storage and reuse of indexing information: To reduce the time and resources required for reindexing, you can cache previous indexing information and reuse it when appropriate.

Integrate with CI/CD systems: Selective indexing can be integrated into CI/CD pipelines so that reindexing occurs automatically whenever there is a change in the source code.
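
One way to read the first three suggestions (change tracking beyond the git sha, per-file granularity, and a reusable cache) is a content-hash snapshot. The sketch below is only an illustration under that assumption, not something taken from Autodoc; `snapshot`, `stale_files`, and `removed_files` are hypothetical names, and a real version would likely skip directories such as `.git`.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def stale_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Files that are new or whose content changed since the previous snapshot."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )


def removed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Files present in the previous snapshot but gone now."""
    return sorted(previous.keys() - current.keys())
```

The previous snapshot would be persisted next to the generated docs (for example as JSON) and replaced only after re-indexing succeeds, so an interrupted run does not mark files as up to date.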

I plan to study a bit more to contribute in a more assertive way.

dahifi commented 1 year ago

Regarding CI/CD, I've been using a gpt-cli tool to pipe git diff output and get summaries, and I'm hoping to build it into a pre-commit hook or CI/CD job as part of the PR process. Using the SHA is a good way to detect changes. We might be able to save tokens by diffing against the last known commit hash, falling back to a full reindex if the diff is too much context.
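
The diff-or-full-reindex decision described here could look something like the sketch below. Everything in it is an assumption made for illustration: the token budget, the four-characters-per-token estimate (a rough heuristic, not a real tokenizer), and the names `diff_since`, `estimate_tokens`, and `choose_strategy`.

```python
import subprocess

# Hypothetical budget; the right value depends on the model's context window.
MAX_DIFF_TOKENS = 3000


def diff_since(repo_dir: str, last_sha: str) -> str:
    """Full textual diff between last_sha and the working tree."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", last_sha],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly four characters per token (an assumption)."""
    return len(text) // 4


def choose_strategy(diff_text: str, max_tokens: int = MAX_DIFF_TOKENS) -> str:
    """Pick what to do with the changes since the last indexed commit."""
    if not diff_text.strip():
        return "skip"  # nothing changed since the last index
    if estimate_tokens(diff_text) <= max_tokens:
        return "summarize-diff"  # small enough to send the diff itself
    return "full-reindex"  # diff would not fit, so start over
```

In a pre-commit hook or CI job, the result of `choose_strategy(diff_since(repo_dir, last_sha))` would decide which path to run.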

andrewhong5297 commented 1 year ago

(Just noting this has already been implemented and the issue should be closed)