enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License

[BUG] o1-preview Model Error: Unsupported system Role in Messages in big-agi-2 branch #639

Open zubus opened 2 months ago

zubus commented 2 months ago

Description

I encountered an error while using the o1-preview model in the big-agi-2 branch. The error message states that messages.role does not support system with this model. This appears to be a limitation of the o1-preview and o1-mini models, which only accept the user and assistant roles in the messages array.
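Roughly, this is the request shape that triggers the error, and the remapping that gets accepted (illustrative payload only, not big-AGI's actual request-building code):

// Illustrative only: o1-preview / o1-mini reject any message with role 'system'.
const rejected = {
  model: 'o1-preview',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' }, // -> 400: unsupported value for messages[0].role
    { role: 'user', content: 'Hello' },
  ],
};

// Remapping the system message to a user message is accepted:
const accepted = {
  model: 'o1-preview',
  messages: rejected.messages.map(m => (m.role === 'system' ? { ...m, role: 'user' } : m)),
};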


Device and browser

Firefox

Screenshots and more

No response

Willingness to Contribute

hourianto commented 2 months ago

Yeah, o1 is quite weird at the moment. Here's a quick-and-dirty patch for it (partially Sonnet-made); also beware that the model itself currently does not support images or streaming.

As a file: https://paste.debian.net/plainh/a4a9f800, so you can just do:

curl -O https://paste.debian.net/plainh/a4a9f800
git apply a4a9f800
npm run build

Regarding the max_tokens change, if you're curious: OpenAI has made max_completion_tokens the new argument that's supposed to be used instead. I didn't remove max_tokens completely, as older proxies and services might still only accept it, but OpenAI themselves recommend that developers use only max_completion_tokens for all models from now on, not just o1.

As plain text:

diff --git a/src/modules/aix/client/aix.client.ts b/src/modules/aix/client/aix.client.ts
index efd3f843..a19d81ea 100644
--- a/src/modules/aix/client/aix.client.ts
+++ b/src/modules/aix/client/aix.client.ts
@@ -256,14 +256,14 @@ async function _aix_LL_ChatGenerateContent(
   const contentReassembler = new ContentReassembler(llAccumulator);

   try {
-
+    const disableStreaming = getLabsDevNoStreaming() || aixModel.id.startsWith("o1")
     // tRPC Aix Chat Generation (streaming) API - inside the try block for deployment path errors
     const particles = await apiStream.aix.chatGenerateContent.mutate({
       access: aixAccess,
       model: aixModel,
       chatGenerate: aixChatGenerate,
       context: aixContext,
-      streaming: getLabsDevNoStreaming() ? false : aixStreaming, // [DEV] disable streaming if set in the UX (testing)
+      streaming: disableStreaming ? false : aixStreaming, // [DEV] disable streaming if set in the UX (testing)
       connectionOptions: getLabsDevMode() ? { debugDispatchRequestbody: true } : undefined,
     }, {
       signal: abortSignal,
diff --git a/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts b/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts
index b8de11ba..ec1f9aa0 100644
--- a/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts
+++ b/src/modules/aix/server/dispatch/chatGenerate/adapters/openai.chatCompletions.ts
@@ -35,6 +35,8 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
   const hotFixSquashMultiPartText = openAIDialect === 'deepseek';
   const hotFixThrowCannotFC = openAIDialect === 'deepseek' || openAIDialect === 'openrouter' /* OpenRouter FC support is not good (as of 2024-07-15) */ || openAIDialect === 'perplexity';

+  // Check if the model ID starts with "o1"
+  const isO1Model = model.id.startsWith('o1');

   // Throw if function support is needed but missing
   if (chatGenerate.tools?.length && hotFixThrowCannotFC)
@@ -53,6 +55,12 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
   if (hotFixAlternateUserAssistantRoles)
     chatMessages = _fixAlternateUserAssistantRoles(chatMessages);

+  // New hotfix: Replace system messages with user messages if model.id starts with "o1"
+  if (isO1Model) {
+    chatMessages = chatMessages.map(message => 
+      message.role === 'system' ? { ...message, role: 'user' } : message
+    );
+  }

   // Construct the request payload
   let payload: TRequest = {
@@ -61,8 +69,9 @@ export function aixToOpenAIChatCompletions(openAIDialect: OpenAIDialects, model:
     tools: chatGenerate.tools && _toOpenAITools(chatGenerate.tools),
     tool_choice: chatGenerate.toolsPolicy && _toOpenAIToolChoice(openAIDialect, chatGenerate.toolsPolicy),
     parallel_tool_calls: undefined,
-    max_tokens: model.maxTokens !== undefined ? model.maxTokens : undefined,
-    temperature: model.temperature !== undefined ? model.temperature : undefined,
+    max_tokens: isO1Model ? undefined : (model.maxTokens !== undefined ? model.maxTokens : undefined),
+    max_completion_tokens: isO1Model ? model.maxTokens : undefined,
+    temperature: isO1Model ? 1 : (model.temperature !== undefined ? model.temperature : undefined),
     top_p: undefined,
     n: hotFixOnlySupportN1 ? undefined : 0, // NOTE: we choose to not support this at the API level - most downstram ecosystem supports 1 only, which is the default
     stream: streaming,
diff --git a/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts b/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts
index 290afd5e..10b60260 100644
--- a/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts
+++ b/src/modules/aix/server/dispatch/wiretypes/openai.wiretypes.ts
@@ -199,6 +199,7 @@ export namespace OpenAIWire_API_Chat_Completions {

     // common model configuration
     max_tokens: z.number().optional(),
+    max_completion_tokens: z.number().optional(),
     temperature: z.number().min(0).max(2).optional(),
     top_p: z.number().min(0).max(1).optional(),

enricoros commented 2 months ago

Thanks guys, @hourianto @zubus. Full support landed in the big-agi-2 branch. It includes the workarounds, cost accounting, and a breakdown of the reasoning tokens.
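For reference, a rough sketch of where the reasoning-token breakdown comes from (field names follow OpenAI's chat completions usage object for o1 models as I read it; this is a sketch, not the exact code that landed):

// Sketch: split the completion tokens into reasoning vs. visible tokens.
interface O1Usage {
  prompt_tokens: number;
  completion_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

function reasoningBreakdown(usage: O1Usage) {
  const reasoning = usage.completion_tokens_details?.reasoning_tokens ?? 0;
  return { reasoning, visible: usage.completion_tokens - reasoning };
}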

Could you try it out and let me know how your experience is?


hourianto commented 2 months ago

Thanks, yeah, the new patch works, although I had to go into model settings for OpenAI and press the Model check button again so it updated the model entries.

zubus commented 2 months ago

> Thanks, yeah, the new patch works, although I had to go into model settings for OpenAI and press the Model check button again so it updated the model entries.

I had to follow the same steps. Now both models work well. Thanks for the fix!

enricoros commented 2 months ago

Should I add a flag to force-reload the models for the users, once updates are needed?

In this case I added a special flag to the model to make the UI more aware of it (I still want to communicate this to users better, so they're informed about the trade-offs). I don't know whether people would enjoy the auto-reload or not, but for large changes like the o1 series, it's important.

SamKr commented 2 months ago

Not related to this ticket, but a setting to auto-reload models would be fantastic. The people I share big-AGI with aren't all tech-savvy, and it's hard to get them to go to Preferences -> Models -> click the model reload -> select the next provider -> etc. A notification like 'provider x has new models available, would you like to reload?' would be great as well.

enricoros commented 2 months ago

That's a good experience suggestion, thank you @SamKr. For the sake of time, I can't implement that notification, because it would require continuously re-scanning the models and showing a notification when there are deltas. That's easy and yet not, because the user may have modified or deleted models in the meantime.
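For illustration, a rough sketch of the delta check (hypothetical shapes, not big-AGI's actual stores):

// Sketch: a model counts as 'new' only if it has never been seen before;
// entries the user edited or deliberately removed should not resurface or be overwritten.
interface KnownModel { id: string; userEdited?: boolean; }

function findNewModelIds(fetched: { id: string }[], known: KnownModel[]): string[] {
  const knownIds = new Set(known.map(m => m.id));
  return fetched.map(m => m.id).filter(id => !knownIds.has(id));
}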

How would you rank the following options:

  1. (current) manual only with auto-refresh when the server changes environment variables
  2. also auto-refresh when a new version of big-AGI is released (1.15, 1.16, 2.0, etc.)
  3. auto-refresh upon every new load in every user's browser, notifying the user of new models
  4. ...?
SamKr commented 2 months ago

In my experience, new models only appear when I sync after an update; is that correct? But that's the only time I sync, so perhaps big-AGI could actually fetch new models even without an update?

Either way, I'd say 2 is a superb option: no user interaction at all. That would probably cover all my use cases, and it sounds like it wouldn't cost you too much effort. If a new model is released between updates and a user really needs it, they'd just have to refresh the list manually or wait for an update.

3 is a good option as well, if big-AGI can fetch new models even between updates. Not sure if starting a new conversation counts as a new load, but I'd say exclude that, so it would just auto-refresh when a user browses to big-AGI in the morning (I and most other users I know of keep it open all day).

Option 1 rarely applies, I think, since that's not something you'd do on a regular basis.

Thanks for listening :)

enricoros commented 2 months ago

Thanks for the suggestion. Option 2 is probably the best compromise; great way to go.
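For reference, a rough sketch of what option 2 could look like (hypothetical storage key and helper names, not actual big-AGI code):

// Sketch: re-run the per-provider model refresh once after each big-AGI release.
const REFRESH_KEY = 'models-refreshed-for-version';

async function maybeAutoRefreshModels(appVersion: string, refreshAllProviders: () => Promise<void>) {
  if (typeof localStorage === 'undefined') return;               // client-side only
  if (localStorage.getItem(REFRESH_KEY) === appVersion) return;  // already refreshed for this release
  await refreshAllProviders();
  localStorage.setItem(REFRESH_KEY, appVersion);
}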