chhoumann / quickadd

QuickAdd for Obsidian
https://quickadd.obsidian.guide
MIT License

[BUG] Quickadd can't process large context for gpt-4-1106-preview #602

Closed · Triquetra closed this issue 9 months ago

Triquetra commented 9 months ago

Describe the bug When selecting a large portion of text (~12K words), QuickAdd produces the following error:

Uncaught (in promise) Error: Error while making request to OpenAI API: Request failed, status 429 at makeRequest (plugin:quickadd:11192:13)

This should still be well below the 128K token context limit for the model.

To Reproduce Steps to reproduce the behavior:

  1. Go to Obsidian
  2. Select large text area
  3. Try to run macro using GPT-4-1106-preview model
  4. See error

Additional context Using the same template on smaller text selections works as expected.

chhoumann commented 9 months ago

Hey @Triquetra - an HTTP 429 Too Many Requests response status code indicates that too many requests have been sent in a given amount of time.

In other words, it seems you reached the rate limit. I was able to fire a good few requests with e.g. 21k, 9k, and 12k tokens, and they all succeeded. But after a bit, I too got rate-limited and got the 429 error.
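
For illustration, here is a minimal sketch of how a caller could back off and retry on a 429. The endpoint, model name, helper name, and retry parameters are assumptions for the example, not QuickAdd's actual request code:

```typescript
// Sketch: retry an OpenAI chat completion with exponential backoff on HTTP 429.
async function completeWithRetry(
  apiKey: string,
  prompt: string,
  maxRetries = 3
): Promise<string> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "gpt-4-1106-preview",
        messages: [{ role: "user", content: prompt }],
      }),
    });

    if (res.status === 429 && attempt < maxRetries) {
      // Honor Retry-After if present, otherwise back off exponentially.
      const retryAfter = Number(res.headers.get("retry-after")) || 2 ** attempt;
      await new Promise((r) => setTimeout(r, retryAfter * 1000));
      continue;
    }

    if (!res.ok) {
      throw new Error(`Request failed, status ${res.status}`);
    }

    const data = await res.json();
    return data.choices[0].message.content;
  }
  throw new Error("Rate limited: retries exhausted");
}
```

Backoff only helps when the limit is per-minute bursts; a single prompt larger than the per-minute token budget will keep failing.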

chhoumann commented 9 months ago

OpenAI has set a rate limit of 40k tokens per minute, so even though the model supports much more than that, it isn't really feasible to send more than 40k tokens in a single prompt.

https://platform.openai.com/account/limits

This is for my usage tier (3), at least. You can check whether this is the same for you.
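
As a rough illustration of what that budget means in practice, here is a sketch that estimates token counts before sending. The 4-characters-per-token heuristic and the 40k budget are approximations; exact counts need a real tokenizer such as tiktoken:

```typescript
// Sketch: rough check that a prompt stays under a tokens-per-minute budget.
const TPM_BUDGET = 40_000;

function estimateTokens(text: string): number {
  // Very rough heuristic: ~4 characters per token for English prose.
  return Math.ceil(text.length / 4);
}

function splitUnderBudget(text: string, budget = TPM_BUDGET): string[] {
  const maxChars = budget * 4;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// A 12k-word selection is roughly 16k tokens, so one request fits,
// but several such requests within a minute can still exceed the budget.
```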

Triquetra commented 9 months ago

Have you considered implementing vector store embeddings and completions to overcome these context limits?

chhoumann commented 9 months ago

I've actually built Chunked Prompts to overcome context size limits.

I found that I more often needed large amounts of text processed, rather than looking up specific chunks of text. E.g. a vector store wouldn't solve the problem of summarizing an entire book, only the problem of answering a question based on the text in a book.
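
To show the general chunk-then-summarize idea (not QuickAdd's Chunked Prompts implementation), a minimal sketch where `complete` is a hypothetical one-prompt-in, one-completion-out helper:

```typescript
// Sketch: summarize each chunk of a long text, then summarize the summaries.
async function summarizeLongText(
  text: string,
  complete: (prompt: string) => Promise<string>,
  chunkSize = 30_000 // characters per chunk, kept well under the token budget
): Promise<string> {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }

  const partials: string[] = [];
  for (const chunk of chunks) {
    partials.push(await complete(`Summarize the following text:\n\n${chunk}`));
  }

  // Collapse the per-chunk summaries into a single summary.
  return complete(`Combine these summaries into one:\n\n${partials.join("\n\n")}`);
}
```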

And since many vector stores can be used through simple HTTP APIs, it would be trivial to create a user script that sends one's data to the store and then does a lookup every time one wants to prompt the LLM with some text from it.
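
For example, such a user script could look roughly like the sketch below. The store URL, request shape, and response shape are all hypothetical; they would need to be adapted to whichever vector store is actually used:

```typescript
// Sketch: query a vector store over HTTP and prepend the retrieved passages to a prompt.
const VECTOR_STORE_URL = "http://localhost:6333/search"; // hypothetical endpoint

async function lookupContext(query: string, topK = 5): Promise<string[]> {
  const res = await fetch(VECTOR_STORE_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, top_k: topK }),
  });
  if (!res.ok) throw new Error(`Vector store lookup failed: ${res.status}`);
  const data = await res.json();
  return data.results.map((r: { text: string }) => r.text);
}

async function buildPrompt(question: string): Promise<string> {
  const passages = await lookupContext(question);
  return `Answer using only the context below.\n\nContext:\n${passages.join(
    "\n---\n"
  )}\n\nQuestion: ${question}`;
}
```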