Closed vYLQs6 closed 2 days ago
Could you provide an example of a broken prompt? As far as I know, the internal length setup should work.
I tested this on ollama (PC) using Gemma 2 + 8K context, works fine
You should consider letting the user adjust this internal length setting, since this is a local app; it's not like you are an online API provider.
Summarize this:
How does OpenAI train the Strawberry🍓 (o1) model to spend more time thinking?
I read the report. The report is mostly about 𝘸𝘩𝘢𝘵 impressive benchmark results they got. But in terms of the 𝘩𝘰𝘸, the report only offers one sentence:
"Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses."
I did my best to understand this sentence. I drew this animation to share my best understanding with you.
The two key phrases in this sentence are: Reinforcement Learning (RL) and Chain of Thought (CoT).
Among the contributors listed in the report, two individuals stood out to me:
Ilya Sutskever, the inventor of RL with Human Feedback (RLHF). He left OpenAI and just started a new company, Safe Superintelligence. Listing Ilya tells me that RLHF still plays a role in training the Strawberry model.
Jason Wei, the author of the famous Chain of Thought paper. He left Google Brain to join OpenAI last year. Listing Jason tells me that CoT is now a big part of the RLHF alignment process.
Here are the points I hope to get across in my animation:
💡In RLHF+CoT, the CoT tokens are also fed to the reward model to get a score to update the LLM for better alignment, whereas in the traditional RLHF, only the prompt and response are fed to the reward model to align the LLM.
💡At inference time, the model has learned to always start by generating CoT tokens, which can take up to 30 seconds, before starting to generate the final response. That's how the model spends more time thinking!
There are other important technical details missing, like how the reward model was trained, how human preferences for the "thinking process" were elicited, etc.
Finally, as a disclaimer, this animation represents my best educated guess. I can't verify its accuracy. I do wish someone from OpenAI would jump in to correct me, because if they do, we will all learn something useful! 🙌
Note that for local models, it uses a separate context length setting when you first load the model in the API menu, not in the sampler settings.
So you are saying you actually tested the prompt and it did work on your app???
And I know the context length is in the API menu; there aren't many options there, I can see where it is.
Yep, I tested with 2k context length and it seems to work just fine.
However, I do have a possible reason for why your gens are far below the limit: https://github.com/Vali-98/ChatterUI/issues/60#issuecomment-2273651360
It's possible that your Generated Length is too high and it's eating up all the context space, as context is built as:
```
this.buildTextCompletionContext(localPreset.context_length - n_predict),
```
This is necessary as both context and generated tokens share that same max context length.
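In other words, the prompt only gets whatever part of the context window is not reserved for generation. Here is a minimal sketch of that budget split; `promptTokenBudget` and its parameter names are illustrative, not ChatterUI's actual code:

```typescript
// Sketch of the context budget split described above (hypothetical names).
// The total window is shared between the prompt and the tokens to be
// generated, so the prompt budget is what remains after reserving n_predict.
function promptTokenBudget(contextLength: number, nPredict: number): number {
    return Math.max(0, contextLength - nPredict);
}

// With an 8K window and an 8K Generated Length, nothing is left for the prompt:
console.log(promptTokenBudget(8192, 8192)); // 0

// Lowering Generated Length frees the rest of the window for the prompt:
console.log(promptTokenBudget(8192, 512)); // 7680
```

This matches the symptom in this issue: setting Generated Length equal to the context length leaves a prompt budget of zero, so long prompts get truncated before reaching the API.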
Oh yes, I was using an 8K Generated Length, the same as the context length, since that's how I configured my models on ollama. Thank you for explaining this, have a nice day!
Despite enabling an 8K context window in ChatterUI, longer prompts are not being forwarded to the local API.
This suggests a limitation in ChatterUI's prompt handling that prevents it from using the full 8K capacity. To resolve this, the internal prompt length limit within ChatterUI should be adjustable. Since my prompts fall well within the configured 8K context, they should be transmitted to the API in full.