Open zubus opened 2 months ago
Yeah, o1 is quite weird currently, so here's a quick and dirty patch for it (partially Sonnet-made). Also beware that the model itself currently does not support images or streaming.
As a file: https://paste.debian.net/plainh/a4a9f800, so you can just do:
curl -O https://paste.debian.net/plainh/a4a9f800
git apply a4a9f800
npm run build
Regarding the max_tokens change, if you're curious: OpenAI has made max_completion_tokens the new argument that's supposed to be used instead, but I didn't completely remove max_tokens since older proxies and services might still only accept it. OpenAI themselves now recommend that developers use only max_completion_tokens for all models from now on, not just o1.
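If it helps, here's a minimal standalone sketch of that payload choice (not big-AGI code, all names here are illustrative), preferring the new field and falling back to max_tokens only for proxies that haven't caught up:

// Illustrative only - not from the big-AGI codebase.
// Prefer max_completion_tokens; keep max_tokens only for older proxies.
interface TokenLimitOptions {
  maxTokens?: number;
  legacyProxy?: boolean; // hypothetical flag: upstream only understands max_tokens
}

function tokenLimitFields(opts: TokenLimitOptions) {
  if (opts.maxTokens === undefined) return {};
  return opts.legacyProxy
    ? { max_tokens: opts.maxTokens }
    : { max_completion_tokens: opts.maxTokens };
}

// e.g. spread into the request body: { model, messages, ...tokenLimitFields(opts) }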
As plain text:
diff --git a/src/modules/aix/client/aix.client.ts b/src/modules/aix/client/aix.client.ts
index efd3f843..a19d81ea 100644
--- a/src/modules/aix/client/aix.client.ts
+++ b/src/modules/aix/client/aix.client.ts
@@ -256,14 +256,14 @@ async function _aix_LL_ChatGenerateContent(
const contentReassembler = new ContentReassembler(llAccumulator);
try {
-
+ const disableStreaming = getLabsDevNoStreaming() || aixModel.id.startsWith("o1")
// tRPC Aix Chat Generation (streaming) API - inside the try block for deployment path errors
const particles = await apiStream.aix.chatGenerateContent.mutate({
access: aixAccess,
model: aixModel,
chatGenerate: aixChatGenerate,
context: aixContext,
- streaming: getLabsDevNoStreaming() ? false : aixStreaming, // [DEV] disable streaming if set in the UX (testing)
+ streaming: disableStreaming ? false : aixStreaming, // [DEV] disable streaming if set in the UX (testing)
connectionOptions: getLabsDevMode() ? { debugDispatchRequestbody: true } : undefined,
}, {
signal: abortSignal,
diff --git a/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts b/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts
index b8de11ba..ec1f9aa0 100644
--- a/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts
+++ b/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts
@@ -35,6 +35,8 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
const hotFixSquashMultiPartText = openAIDialect === 'deepseek';
const hotFixThrowCannotFC = openAIDialect === 'deepseek' || openAIDialect === 'openrouter' /* OpenRouter FC support is not good (as of 2024-07-15) */ || openAIDialect === 'perplexity';
+ // Check if the model ID starts with "o1"
+ const isO1Model = model.id.startsWith('o1');
// Throw if function support is needed but missing
if (chatGenerate.tools?.length && hotFixThrowCannotFC)
@@ -53,6 +55,12 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
if (hotFixAlternateUserAssistantRoles)
chatMessages = _fixAlternateUserAssistantRoles(chatMessages);
+ // New hotfix: Replace system messages with user messages if model.id starts with "o1"
+ if (isO1Model) {
+ chatMessages = chatMessages.map(message =>
+ message.role === 'system' ? { ...message, role: 'user' } : message
+ );
+ }
// Construct the request payload
let payload: TRequest = {
@@ -61,8 +69,9 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
tools: chatGenerate.tools && _toOpenAITools(chatGenerate.tools),
tool_choice: chatGenerate.toolsPolicy && _toOpenAIToolChoice(openAIDialect, chatGenerate.toolsPolicy),
parallel_tool_calls: undefined,
- max_tokens: model.maxTokens !== undefined ? model.maxTokens : undefined,
- temperature: model.temperature !== undefined ? model.temperature : undefined,
+ max_tokens: isO1Model ? undefined : (model.maxTokens !== undefined ? model.maxTokens : undefined),
+ max_completion_tokens: isO1Model ? model.maxTokens : undefined,
+ temperature: isO1Model ? 1 : (model.temperature !== undefined ? model.temperature : undefined),
top_p: undefined,
n: hotFixOnlySupportN1 ? undefined : 0, // NOTE: we choose to not support this at the API level - most downstram ecosystem supports 1 only, which is the default
stream: streaming,
diff --git a/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts b/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts
index 290afd5e..10b60260 100644
--- a/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts
+++ b/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts
@@ -199,6 +199,7 @@ export namespace OpenAIWire_API_Chat_Completions {
// common model configuration
max_tokens: z.number().optional(),
+ max_completion_tokens: z.number().optional(),
temperature: z.number().min(0).max(2).optional(),
top_p: z.number().min(0).max(1).optional(),
Thanks guys, @hourianto @zubus. Full support landed in the big-agi-2 branch. It includes the workarounds, cost accounting, and a breakdown of the reasoning tokens.
Could you try it out and let me know how your experience is?
Thanks, yeah, the new patch works, although I had to go into model settings for OpenAI and press the Model check button again so it updated the model entries.
I had to follow the same steps. Now both models work well. Thanks for the fix
Should I add a flag to force-reload the models for users when an update requires it?
In this case I added a special flag to the model so the UI is more aware of it (and I still want to message this to users better, so they're informed about the tradeoffs). I don't know whether people would enjoy an auto-reload or not, but for large changes like the o1 series it's important.
Not related to this ticket, but a setting to auto-reload models would be fantastic. The people I share big-AGI with aren't all tech-savvy, and going to preferences -> models -> click model reload -> select the next provider -> etc. is hard to get them to do. A notification like 'provider X has new models available, would you like to reload?' would be great as well.
That's a good experience suggestion, thank you @SamKr. For the sake of time I can't do that notification yet, because it would require continuously re-scanning the models and showing a notification whenever there are deltas. It sounds easy, but it isn't quite, because the user may have modified or deleted models in the meantime.
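Just to sketch the idea (hypothetical names, not actual big-AGI code): the delta check itself is small, the awkward part is respecting models the user has edited or deleted.

// Hypothetical sketch of a model-delta check - not big-AGI code.
interface StoredModel { id: string; userEdited?: boolean }

function findNewModelIds(
  fetchedIds: string[],
  stored: StoredModel[],
  userDeletedIds: Set<string>, // models the user removed on purpose
): string[] {
  const known = new Set(stored.map(m => m.id));
  // A model is "new" only if it's neither already stored nor one the user
  // deliberately deleted - otherwise the notification would nag forever.
  return fetchedIds.filter(id => !known.has(id) && !userDeletedIds.has(id));
}

If that returns a non-empty list, that's the point where a "provider X has new models available" notification would fire.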
How would you rank the following options:
In my experience new models only appear when I sync after an update, is that correct? But that's the only time I sync, so perhaps big-AGI can actually fetch new models even without an update?
Either way, I'd say 2 is a superb option, no user interaction at all. That would probably cover all my use cases. It also sounds like it wouldn't cost you too much effort. If there is a new model released between updates, and a user really requires it for some reason, they'd just have to refresh the list manually or wait for an update.
3 is a good option as well, if big-AGI can fetch new models even between updates. Not sure if starting a new conversation counts as a new load, but I'd say exclude that, so it would just auto-refresh when a user browses to big-AGI in the morning (I and most other users I know of keep it open all day).
Option 1 rarely applies I think, since that's not something you'd do on a regular basis.
Thanks for listening :)
Thanks for the suggestion. Option 2 is probably the best compromise, a great way to go.
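A rough sketch of how option 2 could work (purely illustrative, not how big-AGI actually persists things): remember which build last synced the models and re-sync once per release.

// Illustrative sketch of "refresh models once per app update" (option 2).
const BUILD_VERSION = '2.0.0'; // hypothetical: would come from the app's build info

function shouldResyncModels(lastSyncedVersion: string | null): boolean {
  return lastSyncedVersion !== BUILD_VERSION;
}

// On startup, something like:
//   if (shouldResyncModels(localStorage.getItem('models-synced-version'))) {
//     refresh each configured provider's model list in the background, then
//     localStorage.setItem('models-synced-version', BUILD_VERSION);
//   }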
Description
I encountered an error while using the o1-preview model in the big-agi-2 branch. The error message states that messages.role does not support system with this model. This appears to be a limitation of the o1-preview and o1-mini models, which only accept user and assistant roles in the messages array.
Device and browser
Firefox
Screenshots and more
No response
Willingness to Contribute