askthecode / documentation

Unable to give answer due to file size too large #22

Open s1awwhy opened 11 months ago

s1awwhy commented 11 months ago

AskTheCode is a very useful plugin, and I have subscribed for additional monthly requests. However, I ran into a problem while using it. When I used AskTheCode to analyze a Git repository and asked a question in the prompt, it was able to identify the file related to the question. However, it could not give me the answer I wanted because the file was too large. Even when I asked it to analyze the branch, it still replied that the file was too large. Is this because the file size exceeds the token limit set by AskTheCode? Here is the reply AskTheCode gave me:

I have successfully retrieved and analyzed the contents of the ike_sa.c file from the StrongSwan repository. However, due to the size and complexity of the file, I need to perform an additional query to fully extract the relevant information regarding .......

Would you like me to proceed with this additional analysis to provide detailed information on ......?

dsomok commented 11 months ago

Hi @s1awwhy,

Thank you for reaching out and sharing your experience with the AskTheCode plugin. Yes, I acknowledge that the plugin is currently constrained by a hard limit of 50,000 characters for file sizes it can process. I've also noticed that even files around 25,000 characters can lead to a lower quality response. For optimal performance, it's best to work with files that are around 17,000 characters or less.
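As a rough illustration of the thresholds above, here is a minimal Python sketch that classifies a file by its character count before you ask the plugin about it. The function name `classify_for_plugin` and the tier labels are hypothetical and not part of AskTheCode. The 25,000 and 17,000 cut-offs are approximate ("around") figures, so treat the boundaries as soft.

```python
# Character thresholds quoted in the comment above (characters, not tokens).
# Only the 50,000 figure is a hard limit; the other two are approximate.
PLUGIN_HARD_LIMIT = 50_000
PLUGIN_DEGRADED_FROM = 25_000
PLUGIN_RECOMMENDED_MAX = 17_000


def classify_for_plugin(char_count: int) -> str:
    """Return a hypothetical quality tier for a file of the given size."""
    if char_count > PLUGIN_HARD_LIMIT:
        return "too large"    # above the plugin's hard limit
    if char_count >= PLUGIN_DEGRADED_FROM:
        return "degraded"     # processed, but answers may be lower quality
    if char_count <= PLUGIN_RECOMMENDED_MAX:
        return "optimal"      # within the recommended size
    return "borderline"       # between the recommended and degraded sizes


def classify_text(source: str) -> str:
    """Classify file content that has already been read into a string."""
    return classify_for_plugin(len(source))
```

For example, `classify_text(open("ike_sa.c", encoding="utf-8").read())` would report which tier a local copy of the file falls into, assuming the file is UTF-8 text.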

This limitation is due to the underlying use of the GPT-4 model, which has a context window of 8k tokens. As a result, ChatGPT compresses the file content before analyzing it with the model, which is the main reason for the issues you've encountered.

For more insights into the challenges of handling long files, please refer to these discussions: Issue #19 and Issue #9. In these threads, I have provided a broader explanation of why these limitations occur.

I recommend using the AskTheCode GPT, which employs the GPT-4-Turbo model with a 128k token context window. This significantly increases the file size limit to approximately 95,000 characters, aligning with OpenAI's current hard limit for action responses of 100,000 characters.
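To make the comparison concrete, the following Python sketch picks which AskTheCode client could process a file of a given size, using only the limits quoted in this thread. The function name `clients_for` and the returned labels are hypothetical. The GPT limit is an approximate figure, so a file close to 95,000 characters may still fail.

```python
# Limits quoted in this thread, in characters.
PLUGIN_LIMIT = 50_000   # hard limit for the AskTheCode plugin
GPT_LIMIT = 95_000      # approximate limit for the AskTheCode GPT


def clients_for(char_count: int) -> list[str]:
    """Return the clients expected to accept a file of this size."""
    clients = []
    if char_count <= PLUGIN_LIMIT:
        clients.append("plugin")
    if char_count <= GPT_LIMIT:
        clients.append("gpt")
    return clients  # an empty list means neither client can take the file
```

A file of 60,000 characters would be accepted only by the GPT, while anything above roughly 95,000 characters fits neither client.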

You can access the AskTheCode GPT here.

It's important to note that your subscription is shared across both the AskTheCode plugin and AskTheCode GPT. As long as you authorize with the same email or GitHub account, your quota will be applied to the AskTheCode GPT as well.

s1awwhy commented 10 months ago

Thank you for your reply! Thanks to your explanation, I now have a better understanding of AskTheCode. I tried the AskTheCode GPT, and it was indeed able to handle larger files than the plugin. May I ask whether AskTheCode uses text embeddings or a vector database?

dsomok commented 10 months ago

@s1awwhy, sorry for the late response.

As of now, the AskTheCode plugin/GPT does not use text embeddings or vector databases. AskTheCode currently does not store user code in any form on its end (be it raw, chunked, or indexed).

s1awwhy commented 10 months ago

OK, thanks for your reply and help.