Pythagora-io / gpt-pilot

The first real AI developer

[Bug]: GPTP includes all project files when making queries to LLM #469

Open Michal-Mikolas opened 8 months ago

Michal-Mikolas commented 8 months ago

Version

Command-line (Python) version

Operating System

Windows 10

What happened?

After GPTP executes basic commands (like cd projectfolder) or when it wants to debug something, it tries to "save" every file in the project. That includes thousands of files from the frameworks and libraries in use. Printing all of them to the console takes minutes to finish, which is annoying.

IMHO it doesn't make any sense to "save" all these files somewhere after a simple cd command.
(Anyway, where is it trying to save the files? These files are already created and saved on my HDD.)

--------- EXECUTE COMMAND ----------
Can I execute the command: `php artisan serve` with 30000ms timeout?

? If yes, just press ENTER. Otherwise, type "no" but it will be processed as successfully executed.

answer:
CLI OUTPUT:Could not open input file: artisan
NEEDS_DEBUGGING

Saving file C:\Users\mikolas\projects\gpt-pilot\workspace\helloworld\helloworld\.editorconfig
Saving file C:\Users\mikolas\projects\gpt-pilot\workspace\helloworld\helloworld\.env
Saving file C:\Users\mikolas\projects\gpt-pilot\workspace\helloworld\helloworld\.env.example
...(a few thousand more 'Saving file' lines)
Saving file C:\Users\mikolas\projects\gpt-pilot\workspace\helloworld\helloworld\vendor\swiftmailer\swiftmailer\doc\japanese.rst
Saving file C:\Users\mikolas\projects\gpt-pilot\workspace\helloworld\helloworld\vendor\swiftmailer\swiftmailer\doc\messages.rst
Saving file C:\Users\mikolas\projects\gpt-pilot\workspace\helloworld\helloworld\vendor\swiftmailer\swiftmailer\doc\plugins.rst

Michal-Mikolas commented 8 months ago

Example with cd projectname command:

[Screenshot: WindowsTerminal_qxlQPbEks6]

Because of this, a simple cd projectname command takes several minutes to finish.

senko commented 8 months ago

This is a known bug: we currently send all the project files to the LLM to get the next steps. This doesn't work well when there are a lot of files, e.g. when you generate a project using a framework.

The fix is to only pick and send relevant files for the task at hand. This is something we're already working on.

Michal-Mikolas commented 8 months ago

Idea: Let GPT generate a .gitignore file for the project (so all the framework files will be ignored by Git) and then have GPTP respect that .gitignore file as well.

IMHO in most cases the files you want to hide from Git will be the same ones you want to hide from GPT (framework files, config files with secret keys, cache folders, etc.).
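A rough sketch of what that filtering could look like, written in Python since GPTP is a Python tool. This is not GPTP's actual code: the function names `load_ignore_patterns`, `is_ignored` and `list_project_files` are made up for illustration, and the matching is deliberately simplified (it compares each pattern against every path component and against the whole relative path, with no support for negation with `!`, anchoring, or `**`). A dedicated library such as pathspec would likely be a more faithful way to follow real .gitignore rules.

```python
import fnmatch
import os
from pathlib import PurePosixPath


def load_ignore_patterns(text):
    """Parse .gitignore-style text into patterns, skipping blanks and comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path, patterns):
    """Simplified check (an assumption, NOT full gitignore semantics).

    A path is ignored if a pattern matches the whole relative path
    or any single component of it.
    """
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        pattern = pattern.strip("/")
        if fnmatch.fnmatchcase(rel_path, pattern):
            return True
        if any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            return True
    return False


def list_project_files(root, patterns):
    """Walk root and return relative POSIX-style paths that are not ignored."""
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root).replace(os.sep, "/")
        rel_dir = "" if rel_dir == "." else rel_dir + "/"
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_ignored(rel_dir + d, patterns)]
        for name in filenames:
            rel = rel_dir + name
            if not is_ignored(rel, patterns):
                kept.append(rel)
    return sorted(kept)
```

Pruning `dirnames` in place matters here: a folder like vendor is skipped as a whole instead of being walked file by file, which is where the minutes of "Saving file" output would otherwise come from.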

senko commented 8 months ago

I think that's a great idea (to use .gitignore to ignore certain files), but many of the files being created are ones you do want to commit to the repository, so this only fixes part of the problem. But it's an easy fix and we should definitely do it!

irvingzamora commented 8 months ago

I agree, ignoring the files from .gitignore is a great start. I also had the issue and I can relate to this specific case because I was creating a Laravel app. Like senko mentioned, this happens when using a framework.

I ran this prompt in ChatGPT:

When initiating a new project using [Laravel], developers often encounter numerous pre-generated files. To orient a new developer to the project efficiently, which specific files or directories are considered essential for them to review first in order to understand the key components of the application? (Provide a json with the name of most relevant files, exclude any other detail, just provide the json with the filenames)

It returned this:

{ "essential_files": [ "routes/web.php", "app/Http/Controllers", "app/Models", "resources/views", ".env", "config/app.php", "database/migrations" ] }

It includes ".env", so of course .gitignore will still have to filter the results after getting the essential_files.

So maybe something like this could also help, although frameworks like Laravel have already established which files are important, so making a request just to find the essential files may not make sense. Either way, I am eager to see the solution, since this is something you're already working on.
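To make the ".env" point concrete, here is a small Python sketch of filtering the returned essential_files through ignore patterns before anything is sent on. The JSON is the reply quoted above; the `ignore_patterns` list and the helper names `matches_ignore` and `filter_essential_files` are assumptions for illustration only, and the matching is a simplification that compares each pattern against individual path components rather than applying full .gitignore rules.

```python
import fnmatch
import json
from pathlib import PurePosixPath

# Reply copied from the comment above.
llm_reply = (
    '{ "essential_files": [ "routes/web.php", "app/Http/Controllers", '
    '"app/Models", "resources/views", ".env", "config/app.php", '
    '"database/migrations" ] }'
)

# Hypothetical patterns, standing in for the project's real .gitignore.
ignore_patterns = [".env", "/vendor/", "node_modules"]


def matches_ignore(path, patterns):
    """Simplified check: does any pattern match any component of the path?"""
    parts = PurePosixPath(path).parts
    return any(
        fnmatch.fnmatchcase(part, pattern.strip("/"))
        for pattern in patterns
        for part in parts
    )


def filter_essential_files(reply_text, patterns):
    """Parse the reply and drop entries that the ignore patterns exclude."""
    data = json.loads(reply_text)
    return [p for p in data.get("essential_files", []) if not matches_ignore(p, patterns)]


safe_files = filter_essential_files(llm_reply, ignore_patterns)
print(safe_files)
```

With these assumed patterns, ".env" is dropped and the other six entries are kept in their original order.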