cmooredev / RepoReader

Explore and ask questions about a GitHub code repository using OpenAI's GPT.
158 stars, 77 forks

Prompt issue? #1

Open Mihaitafox11 opened 1 year ago

Mihaitafox11 commented 1 year ago

Hey, I've fed the script some documentation, but whenever I try to ask it something about it, I get an error saying the request uses too many tokens.

My questions are simple, though, e.g. "What is this repo?"

An error occurred: This model's maximum context length is 4097 tokens, however you requested 24116 tokens (23860 in your prompt; 256 for the completion). Please reduce your prompt; or completion length.

cmooredev commented 1 year ago

This is probably due to the number of documents being passed in as context. Was it a large repo that you cloned? I will have to work on a solution to this.
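A minimal sketch of one possible workaround, not what RepoReader currently does: keep adding documents to the prompt only while they fit within the model's context window. The 4097 and 256 figures come from the error message above; `estimate_tokens` and `fit_documents` are hypothetical helpers, the 4-characters-per-token ratio is a rough assumption rather than a real tokenizer, and the documents are assumed to be ordered most relevant first.

```python
MAX_CONTEXT_TOKENS = 4097  # model limit quoted in the error message
COMPLETION_TOKENS = 256    # tokens reserved for the completion, per the error message


def estimate_tokens(text: str) -> int:
    """Rough estimate only: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


def fit_documents(question, documents,
                  max_context=MAX_CONTEXT_TOKENS,
                  completion=COMPLETION_TOKENS):
    """Return the leading documents that fit in the remaining token budget.

    Assumes `documents` is already sorted from most to least relevant.
    """
    budget = max_context - completion - estimate_tokens(question)
    selected = []
    for doc in documents:
        cost = estimate_tokens(doc)
        if cost > budget:
            break  # stop at the first document that would overflow the prompt
        selected.append(doc)
        budget -= cost
    return selected


if __name__ == "__main__":
    docs = ["a" * 4000] * 10  # ten documents of roughly 1000 tokens each
    kept = fit_documents("what is this repo", docs)
    print(f"kept {len(kept)} of {len(docs)} documents")
```

A real tokenizer would give more accurate counts than the character heuristic, so the budget here should be treated as approximate.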

Mihaitafox11 commented 1 year ago

Yeah, it was a pretty large repo. I appreciate the reply.

Mihaitafox11 commented 1 year ago

But I also tried this repo, https://github.com/Mihaitafox11/test, where I scraped the documentation of a blockchain (docs.sui.io). I tried asking questions and got the same error.

cmooredev commented 1 year ago

Looks like the test repo you posted has some large text files.

In file_processing.py line 50, I split the text into 3000-character chunks, so large text files will be split into many documents even if the repo itself only has a few files. To reduce tokens, we could increase the chunk size. I'll have to test this out, because a larger chunk size would give us less precise summaries for documents.
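To illustrate how one large file turns into many documents, here is a rough sketch of fixed-size character chunking. It is not the actual code from file_processing.py; `split_into_chunks` is a hypothetical helper and the 100,000-character input is made up for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    large_file = "x" * 100_000  # stand-in for one big scraped text file
    for size in (1000, 3000, 10000):
        chunks = split_into_chunks(large_file, size)
        print(f"chunk_size={size}: {len(chunks)} documents")
```

The total amount of text is the same for every chunk size. What changes is the number of documents, so how many tokens end up in the prompt depends on how many of those documents are passed in as context.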

I was initially just intending this to work with code, but realize it would be useful for documentation as well.

Mihaitafox11 commented 1 year ago

Thanks for the explanation. I was under the impression that smaller chunks (3000, 2000, 1000) would require fewer tokens, but it seems it's the other way around :D Newbie here.

But yeah, my idea was to combine documentation that includes code and somehow feed that info to ChatGPT so it can write code by itself based on that documentation. But I don't even know if I scraped the documentation in a way that lets it understand the code in it.