dvcrn / ex_openai

Auto-generated Elixir SDK for OpenAI APIs with full typespecs, docs and streaming support
MIT License

Problem using `gpt-4-vision-preview` with `ExOpenAI.Completions.create_completion` #13

Closed · matthewsinclair closed this issue 8 months ago

matthewsinclair commented 8 months ago

Hi everyone. I am not quite sure where or whom to ask about this, but I’ll start here. I am trying to use OpenAI’s new “gpt-4-vision-preview” model. I have two bits of code: one using a raw POST via HTTPoison and the other using ex_openai. The raw call works, but the ex_openai one doesn’t. As far as I can tell, I am sending exactly the same params, so I’m obviously doing something dumb, because one call works and the other does not. Is there anything obvious here that would lead to the error in the second case?

This call (using a raw POST via HTTPoison) works:

IO.puts("----------------------")
IO.puts("Image Description: RAW")
IO.puts("----------------------")

openai_api_key = System.get_env("OPENAI_API_KEY")
openai_organisation_key = System.get_env("OPENAI_ORGANIZATION_KEY")

# Path to your image
image_path = home_dir <> "image.png"

# Getting the base64 string
base64_image = image_path |> File.read!() |> Base.encode64()

headers = [
  {"Content-Type",  "application/json"},
  {"Authorization", "Bearer " <> openai_api_key},
  {"Organisation",  "Bearer " <> openai_organisation_key}
]

payload = %{
  "model" => "gpt-4-vision-preview",
  "messages" => [
    %{
      "role" => "user",
      "content" => [
        %{"type" => "text", "text" => "Describe this image"},
        %{
          "type" => "image_url",
          "image_url" => %{
            "url" => "data:image/jpeg;base64," <> base64_image
          }
        }
      ]
    }
  ],
  "max_tokens" => 300
}
dbg(payload)

# Perform the HTTP POST request
response = HTTPoison.post!("https://api.openai.com/v1/chat/completions", Jason.encode!(payload), headers, [recv_timeout: 10000])

# Print the JSON response
IO.inspect(Jason.decode!(response.body))

This one (using ex_openai) does not:

IO.puts("-------------------------")
IO.puts("Image Description: OpenAI")
IO.puts("-------------------------")

model = "gpt-4-vision-preview"
prompt = "Describe this image"
image = File.read!(home_dir <> "image.png")
image64 = Base.encode64(image)
msgs = [ 
  %{ 
    content: [ 
      %{type: "text", text: prompt },  
      %{
        type: "image", 
        image_url: %{
          "url" => "data:image/jpeg;base64," <> image64
        }
      }
    ],
    role: "user"
  } 
]
{:ok, msgs_json} = msgs |> Jason.encode 
dbg(msgs)
dbg(msgs_json)

case ExOpenAI.Completions.create_completion(model, msgs_json, max_tokens: 300) do 
  {:ok, result_json} -> 
    dbg result_json

  {:error, reason_json} -> 
    dbg reason_json
end

The error in the second case is:

%{
  "error" => %{
    "code" => nil,
    "message" => "2 validation errors for Request\nbody -> prompt\n  value is not a valid list (type=type_error.list)\nbody -> prompt\n  value is not a valid list (type=type_error.list)",
    "param" => nil,
    "type" => "invalid_request_error"
  }
}

There is obviously something a bit weird (or different) in how ExOpenAI.Completions.create_completion wraps up the params (the error complains about body -> prompt) that I am not understanding or am misconfiguring somehow.

Any ideas would be very helpful! Thanks.

dvcrn commented 8 months ago

Thanks for reporting, let me look into that.

Just checking: have you tried the latest main branch? Over the weekend I updated the docs to match the latest OpenAI API state.

matthewsinclair commented 8 months ago

No probs, happy to help! Thanks for putting the effort into this awesome lib! I have not checked the main branch, but I'll give it a go today and see if that works. Thanks.

dvcrn commented 8 months ago

Here you go: I did a quick test in iex and it works on the latest head. The response is just a bit weirdly formatted :)

Code:

msgs = [
  %ExOpenAI.Components.ChatCompletionRequestUserMessage{
    role: :user,
    content: [
      %ExOpenAI.Components.ChatCompletionRequestMessageContentPartImage{
        type: :image_url,
        image_url: "https://lh3.googleusercontent.com/wAPeTvxh_EwOisF8kMR2L2eOrIOzjfA5AjE28W5asyfGeH85glwrO6zyqL71dCC26R63chADTO7DLOjnqRoXXOAB8t2f4C3QnU6o0BA"
      }
    ]
  }
]

ExOpenAI.Chat.create_chat_completion(msgs, "gpt-4-vision-preview")
{:ok,
 %ExOpenAI.Components.CreateChatCompletionResponse{
   id: "chatcmpl-8KNhE1XbdgD1TqdXOugxUk494QkZA",
   created: 1699868340,
   object: "chat.completion",
   model: "gpt-4-1106-vision-preview",
   choices: [
     %{
       index: 0,
       message: %{
         role: "assistant",
         content: "You've shared an image of the Google Cloud Platform (GCP) logo."
       },
       finish_details: %{type: "max_tokens"}
     }
   ],
   usage: %{prompt_tokens: 262, total_tokens: 278, completion_tokens: 16},
   system_fingerprint: nil
 }}

You don't need to encode the messages through Jason; just following the typespec should be sufficient.
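
Concretely, your original snippet would translate to something like this (an untested sketch; msgs is a plain list of the component structs from the iex example above, not a Jason-encoded string):

# msgs is a list of ExOpenAI.Components.* structs as in the iex example above,
# not a JSON string
model = "gpt-4-vision-preview"

case ExOpenAI.Chat.create_chat_completion(msgs, model, max_tokens: 300) do
  {:ok, result} -> dbg(result)
  {:error, reason} -> dbg(reason)
end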

matthewsinclair commented 8 months ago

What if the image is local (i.e. just available via File.read!) and not a URL?

dvcrn commented 8 months ago

I’m on my phone and can’t try it, but you should be able to load it like you did, then pass it into the image_url param as a URI with the code I sent above.

https://github.com/dvcrn/ex_openai/blob/main/lib/ex_openai/docs/docs.yaml#L5580

matthewsinclair commented 8 months ago

Hmm, it wasn't working at first, then I made one small change to the image_url param, adding "data:image/jpeg;base64," in front of the base64-encoded image data:

...
msgs = [
  %ExOpenAI.Components.ChatCompletionRequestUserMessage{
    role: :user,
    content: [
      %ExOpenAI.Components.ChatCompletionRequestMessageContentPartImage{
        type: :image_url,
        image_url: "data:image/jpeg;base64," <> image64
      }
    ]
  }
]
dbg(msgs)

And now it works! Thanks!

PS: I also needed to add the max_tokens: 1_000 option at the end of the call to create_chat_completion, like this:

... ExOpenAI.Chat.create_chat_completion(msgs, "gpt-4-vision-preview", max_tokens: 1_000)
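
Putting it all together, the working version ends up looking roughly like this (home_dir is just wherever the image lives):

# Read the local image and turn it into a base64 data URI
image_path = home_dir <> "image.png"
image64 = image_path |> File.read!() |> Base.encode64()

msgs = [
  %ExOpenAI.Components.ChatCompletionRequestUserMessage{
    role: :user,
    content: [
      %ExOpenAI.Components.ChatCompletionRequestMessageContentPartImage{
        type: :image_url,
        image_url: "data:image/jpeg;base64," <> image64
      }
    ]
  }
]

case ExOpenAI.Chat.create_chat_completion(msgs, "gpt-4-vision-preview", max_tokens: 1_000) do
  {:ok, result} -> dbg(result)
  {:error, reason} -> dbg(reason)
end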

dvcrn commented 8 months ago

Glad to hear it worked! 👍