Vali-98 / ChatterUI

Simple frontend for LLMs built in react-native.
GNU Affero General Public License v3.0

Allow longer prompt #84

Closed · vYLQs6 closed 2 days ago

vYLQs6 commented 3 days ago

Despite enabling an 8K context window in ChatterUI, longer prompts are not being forwarded to the local API.

This points to a limitation in ChatterUI's prompt handling that prevents it from using the full 8K capacity. Adjusting the internal prompt length limit within ChatterUI would resolve it: since my prompts fall well within the configured 8K context, they should be passed to the API in full.

Vali-98 commented 3 days ago

Could you provide an example of a broken prompt? As far as I know, the internal length setup should work.

vYLQs6 commented 3 days ago

I tested this on Ollama (PC) using Gemma 2 with an 8K context, and it works fine there.

You should consider letting the user adjust this internal length setting; since this is a local app, it's not like you are an online API provider.

Summarize this:

How does OpenAI train the Strawberry🍓 (o1) model to spend more time thinking?

I read the report. The report is mostly about 𝘸𝘩𝘢𝘵 impressive benchmark results they got. But in terms of the 𝘩𝘰𝘸, the report only offers one sentence:

"Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses."

I did my best to understand this sentence. I drew this animation to share my best understanding with you.

The two key phrases in this sentence are: Reinforcement Learning (RL) and Chain of Thought (CoT).

Among the contributors listed in the report, two individuals stood out to me:

Ilya Sutskever, the inventor of RL with Human Feedback (RLHF). He left OpenAI and just started a new company, Safe Superintelligence. Listing Ilya tells me that RLHF still plays a role in training the Strawberry model.

Jason Wei, the author of the famous Chain of Thought paper. He left Google Brain to join OpenAI last year. Listing Jason tells me that CoT is now a big part of the RLHF alignment process.

Here are the points I hope to get across in my animation:

💡In RLHF+CoT, the CoT tokens are also fed to the reward model to get a score to update the LLM for better alignment, whereas in the traditional RLHF, only the prompt and response are fed to the reward model to align the LLM.

💡At inference time, the model has learned to always start by generating CoT tokens, which can take up to 30 seconds, before starting to generate the final response. That's how the model spends more time thinking!

There are other important technical details missing, like how the reward model was trained, how human preferences for the "thinking process" were elicited, etc.

Finally, as a disclaimer, this animation represents my best educated guess; I can't verify its accuracy. I do wish someone from OpenAI would jump in to correct me, because if they do, we will all learn something useful! 🙌

Vali-98 commented 2 days ago

Note that for local models, the context length comes from a separate setting that is applied when you first load the model in the API menu, not from the sampler settings.
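
For illustration only, here is a rough sketch of that distinction; the names below (loadLocalModel, contextLength, SamplerSettings) are made up and are not ChatterUI's real API:

```ts
// Hypothetical sketch, not ChatterUI's actual code: a local model's context
// window is fixed when the model is loaded (the API menu), while sampler
// settings only shape generation afterwards.

interface LocalModelLoadOptions {
    modelPath: string
    contextLength: number // picked in the API menu at load time
}

interface SamplerSettings {
    temperature: number
    topP: number
    genLength: number // n_predict; does not enlarge the context window
}

interface LoadedModel {
    path: string
    contextLength: number
}

// The context window is allocated once, here.
function loadLocalModel(options: LocalModelLoadOptions): LoadedModel {
    return { path: options.modelPath, contextLength: options.contextLength }
}

// Changing samplers later affects how tokens are generated, not how many
// tokens of prompt + response can fit.
function describeGeneration(model: LoadedModel, samplers: SamplerSettings): string {
    return `window: ${model.contextLength} tokens, up to ${samplers.genLength} generated`
}

const model = loadLocalModel({ modelPath: 'gemma-2.gguf', contextLength: 8192 })
console.log(describeGeneration(model, { temperature: 0.7, topP: 0.9, genLength: 512 }))
```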

vYLQs6 commented 2 days ago

> Note that for local models, the context length comes from a separate setting that is applied when you first load the model in the API menu, not from the sampler settings.

So you're saying you actually tested the prompt and it did work in your app?

And I know the context length setting is in the API menu; there aren't many options there, so I can see where it is.

Vali-98 commented 2 days ago

Yep, I tested with a 2K context length and it seems to work just fine.

However, I do have a possible reason why your generations are far below the limit: https://github.com/Vali-98/ChatterUI/issues/60#issuecomment-2273651360

It's possible that your Generated Length is too high and is eating up all of the context space, as the context is built as:

```ts
this.buildTextCompletionContext(localPreset.context_length - n_predict),
```

This is necessary because the prompt context and the generated tokens share the same max context length.
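
To make the arithmetic concrete, here is a minimal sketch (not the actual ChatterUI implementation; the function name is made up) of how the prompt budget shrinks as the Generated Length grows:

```ts
// Illustrative only: prompt tokens and generated tokens share one context window,
// so the space left for the prompt is the window minus the reserved output length.
function promptTokenBudget(contextLength: number, nPredict: number): number {
    return Math.max(contextLength - nPredict, 0)
}

console.log(promptTokenBudget(8192, 512))  // 7680 tokens left for the prompt
console.log(promptTokenBudget(8192, 8192)) // 0 -- setting Generated Length to 8K leaves no room
```

So if the Generated Length is set to the same value as the context length, there is effectively no space left for the prompt itself.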

vYLQs6 commented 2 days ago

Oh yes, I was using an 8K Generated Length, the same as the context length, since that's how I configured my models on Ollama. Thank you for explaining this, have a nice day.