If the codebase is too large, retrieve a subset of the codebase

lpietrobon commented 1 year ago

If we include the entire codebase in the prompt sent to the LLM, we may exceed the LLM's maximum context length. So if the codebase is large, we might want to add a retrieval step that identifies the pieces of code most relevant to the user's instructions and adds only those to the LLM input.
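To make the idea concrete, here is a minimal sketch of such a retrieval step. It is not mentat's implementation: the function names are hypothetical, relevance is plain word overlap with the instructions, and token cost is approximated by word count (a real tokenizer or an embedding-based ranker would behave differently).

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase identifier-like words; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z_][a-z0-9_]*", text.lower())


def select_relevant_files(files: dict[str, str], instructions: str, max_tokens: int) -> list[str]:
    """Rank files by word overlap with the instructions and keep those that fit.

    `files` maps path -> contents. Cost per file is its word count, which is
    only an approximation of what an LLM tokenizer would report.
    """
    query = set(tokenize(instructions))
    scored = []
    for path, text in files.items():
        words = tokenize(path + " " + text)
        counts = Counter(words)
        score = sum(counts[w] for w in query)
        if score > 0:
            scored.append((score, path, len(words)))

    # Highest score first; ties broken by path so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1]))

    selected, used = [], 0
    for _score, path, cost in scored:
        if used + cost <= max_tokens:
            selected.append(path)
            used += cost
    return selected
```

Files that share no words with the instructions are dropped entirely, and a file that would push the total past `max_tokens` is skipped rather than truncated.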

alexut commented 1 year ago

I would add to that and propose creating something like .filename_extension for large files, containing a short description of the file or what it does, which would take precedence over reading the actual file. Also, maybe we can create a .ignore file listing things that are not worth having the LLM read.
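A rough sketch of how that could look, under loudly stated assumptions: the sidecar for `big.py` is taken to be a dot-prefixed sibling `.big.py`, and the ignore file is called `.mentatignore`. Both names are hypothetical and only one possible reading of the proposal; the patterns are simple globs, not full gitignore syntax.

```python
import fnmatch
from pathlib import Path
from typing import Optional


def load_ignore_patterns(root: Path, ignore_name: str = ".mentatignore") -> list[str]:
    """Read glob patterns from a hypothetical ignore file, skipping blanks and comments."""
    ignore_file = root / ignore_name
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def read_for_prompt(root: Path, rel_path: str, patterns: list[str]) -> Optional[str]:
    """Return the text to show the LLM for one file, or None if it is ignored."""
    if any(fnmatch.fnmatchcase(rel_path, p) for p in patterns):
        return None
    path = root / rel_path
    # Assumed naming: the description of `big.py` lives in a sibling `.big.py`.
    summary = path.with_name("." + path.name)
    if summary.is_file():
        return summary.read_text(encoding="utf-8")
    return path.read_text(encoding="utf-8")
```

With this layout, a large file costs only as many tokens as its description, and the description can be reviewed and versioned alongside the code.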

yoDon commented 1 year ago

I'm not sure whether to include this here or split it out into another issue... large numbers of files in the tree also cause problems separate from whether the LLM can hold their contents:

tl;dr:

OSError: [Errno 7] Argument list too long: 'git'

Full call stack:

Traceback (most recent call last):
  File ".../envs/Mentat/bin/mentat", line 8, in <module>
    sys.exit(run_cli())
             ^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 24, in run_cli
    run(paths)
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 35, in run
    loop(paths, cost_tracker)
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 46, in loop
    code_file_manager = CodeFileManager(paths, user_input_manager, config)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/code_file_manager.py", line 181, in __init__
    self._set_file_paths(paths)
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/code_file_manager.py", line 221, in _set_file_paths
    repo.ignored(*all_files)
  File ".../envs/Mentat/lib/python3.11/site-packages/git/repo/base.py", line 877, in ignored
    proc: str = self.git.check_ignore(*paths)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/git/cmd.py", line 741, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/git/cmd.py", line 1315, in _call_process
    return self.execute(call, **exec_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/git/cmd.py", line 985, in execute
    proc = Popen(
           ^^^^^^
  File ".../envs/Don-Mentat/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File ".../envs/Don-Mentat/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'git'
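
The traceback shows every path being passed to `git check-ignore` in a single command line, which is what trips the OS limit on argument length. One possible way around it, sketched here as an illustration and not as the fix that was actually merged, is to split the paths into batches. The helper names and the 100,000-byte budget are assumptions; the real limit varies by platform.

```python
import subprocess
from typing import Iterator, Sequence


def batched(paths: Sequence[str], max_bytes: int = 100_000) -> Iterator[list[str]]:
    """Yield groups of paths whose combined encoded length stays within max_bytes."""
    batch: list[str] = []
    size = 0
    for p in paths:
        cost = len(p.encode()) + 1  # +1 for the terminator after each argument
        if batch and size + cost > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(p)
        size += cost
    if batch:
        yield batch


def ignored_paths(repo_dir: str, paths: Sequence[str]) -> set[str]:
    """Ask `git check-ignore` which paths are ignored, one batch at a time."""
    ignored: set[str] = set()
    for batch in batched(paths):
        proc = subprocess.run(
            ["git", "check-ignore", "-z", "--", *batch],
            cwd=repo_dir,
            capture_output=True,
            text=True,
        )
        # Exit code 0: at least one path is ignored; 1: none are; anything else is an error.
        if proc.returncode not in (0, 1):
            raise subprocess.CalledProcessError(proc.returncode, proc.args, proc.stdout, proc.stderr)
        ignored.update(p for p in proc.stdout.split("\0") if p)
    return ignored
```

`git check-ignore` also accepts `--stdin`, which avoids the argument-length limit altogether by reading paths from standard input; batching is shown here only because it stays closest to the existing call.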

yoDon commented 1 year ago

Good news: If you point mentat on launch at a folder within a larger project as a way of reducing the token and file count, it's able to understand the set of files in the folder and propose diffs to implement your requests.

Bad news: After proposing the changes, mentat is not able to apply the diffs, possibly because mentat thinks the root of the git project is the folder it was launched in but git treats the ancestor containing the .git folder as the project root?

% mentat path/to/folder/deep/inside/repo

>>> make some changes

Apply these changes? 'Y/n/i' or provide feedback.
Y

Traceback (most recent call last):
  File ".../envs/Mentat/bin/mentat", line 8, in <module>
    sys.exit(run_cli())
             ^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 24, in run_cli
    run(paths)
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 35, in run
    loop(paths, cost_tracker)
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 61, in loop
    need_user_request = get_user_feedback_on_changes(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/app.py", line 144, in get_user_feedback_on_changes
    code_file_manager.write_changes_to_files(code_changes_to_apply)
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/code_file_manager.py", line 353, in write_changes_to_files
    new_code_lines = self._get_new_code_lines(changes)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/code_file_manager.py", line 312, in _get_new_code_lines
    if new_code_lines != self._read_file(rel_path):
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../envs/Mentat/lib/python3.11/site-packages/mentat/code_file_manager.py", line 249, in _read_file
    with open(abs_path, "r") as f:
         ^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'local/path/from/root/of/repo/to/subdir/modified-file.foo'

Workaround: Manually applying the diffs in a text editor seems to work for now. Not ideal, but I'm still super excited to be using mentat.
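
If the guess above is right (paths are reported relative to the git root but joined onto the launch folder), then resolving them against the directory that actually contains `.git` would avoid the `FileNotFoundError`. A minimal sketch of that idea, with hypothetical function names and no claim about how mentat handles it internally:

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def resolve_repo_path(launch_dir: Path, repo_relative: str) -> Path:
    """Turn a path relative to the git root into an absolute path.

    Falls back to the launch directory when no enclosing repository is found.
    """
    root = find_git_root(launch_dir) or launch_dir.resolve()
    return root / repo_relative
```

For example, launching in `repo/a/b` and resolving `a/b/file.txt` yields `repo/a/b/file.txt` and not `repo/a/b/a/b/file.txt`.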

biobootloader commented 1 year ago

@Luke-in-the-sky, @alexut those are both great ideas. We are thinking about the best ways to work with bigger codebases. Make sure you join the Discord, where most of that discussion will take place, as this is a big feature.

biobootloader commented 1 year ago

OSError: [Errno 7] Argument list too long: 'git'

@yoDon ah yes, we'll fix this bug asap

biobootloader commented 1 year ago

After proposing the changes, mentat is not able to apply the diffs, possibly because mentat thinks the root of the git project is the folder it was launched in but git treats the ancestor containing the .git folder as the project root?

@yoDon ah, this is probably another bug. This is what your PR addresses, I guess?

yoDon commented 1 year ago

Hi @biobootloader, https://github.com/biobootloader/mentat/pull/8 should help with the large project issues @Luke-in-the-sky and others were having, and https://github.com/biobootloader/mentat/pull/7 fixed the pathing issue I was seeing.

AbanteAI / mentat

If the codebase is too large, retrieve a subset of the codebase #3