Open Anteemony opened 4 months ago
As the output sent to the LLM depends on (1) the prompt template, (2) user queries, and (3) chat histories, we can control it with a few strategies:
@Anteemony @OscarArroyoVega, is this one still open? I haven't seen the current implementation using max_token_limit.
Hello. No, it's not implemented yet.
This feature exists for the documents retrieved by the retriever.
If you take a look, you'll see its slider UI is commented out (TODO) in tabs/play.py.
The value collected there is meant to be passed to the format_docs function, where the set of retrieved documents is reduced until the total tokens fall under the user's preferred limit.
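A minimal sketch of that reduction step, assuming a format_docs signature and a whitespace-based token count that are illustrative only (a real implementation would use the model's tokenizer):

```python
# Hypothetical sketch: drop retrieved documents from the end until the
# combined text fits under the user's max_tokens_limit.

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: one token per whitespace-separated word.
    # Swap in the model's real tokenizer for accurate counts.
    return len(text.split())

def format_docs(docs: list[str], max_tokens_limit: int) -> str:
    kept = list(docs)
    # Remove the last (typically least relevant) document until we fit the budget.
    while kept and count_tokens("\n\n".join(kept)) > max_tokens_limit:
        kept.pop()
    return "\n\n".join(kept)

docs = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
print(format_docs(docs, 5))  # keeps only the documents that fit in 5 tokens
```

Dropping whole documents from the tail, rather than cutting text mid-document, avoids handing the LLM a truncated fragment.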
Apologies, the initial issue description does not do justice to the objective. The entire input shouldn't simply be truncated from the front or the back, because that would lose important information.
Truncating the retrieved documents seems like the most effective method.
Also, the model's max token limit can come into play here as the maximum value of the slider.
That maximum should account for the tokens consumed by the system prompt, something along the lines of (max = model max tokens - prompt tokens).
This way the user can have a more accurate max token option that won’t lead to errors.
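The bound described above could be computed as follows; the function name, the whitespace token count, and the default minimum of 100 are illustrative assumptions, not code from the repo:

```python
# Hypothetical sketch of the slider bounds: the maximum selectable value is
# the model's context limit minus the tokens already used by the system prompt.

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: one token per whitespace-separated word.
    return len(text.split())

def slider_bounds(model_max_tokens: int, system_prompt: str, minimum: int = 100):
    max_value = model_max_tokens - count_tokens(system_prompt)
    # Never let the maximum fall below the minimum the UI accepts.
    return minimum, max(max_value, minimum)

print(slider_bounds(4096, "You are a helpful assistant."))  # (100, 4091)
```

Clamping the maximum to the minimum keeps the slider valid even for short-context models or very long system prompts.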
This feature allows the user to adjust the max_tokens_limit sent to the LLM. This can be done with a slider or text input. It should have a minimum value it can accept, e.g. 100, and the maximum value should be the accounted LLM input limit. This will allow users to use powerful models while saving on input costs.