brainlid / langchain

Elixir implementation of a LangChain style framework.
https://hexdocs.pm/langchain/

Is it possible to use this library to send an image for use with the "gpt-4-vision-preview" model? #38

Closed: matthewsinclair closed this issue 7 months ago

matthewsinclair commented 7 months ago

I am experimenting with different libraries for working with the OpenAI GPT APIs. I am trying to work out how to send an image with LangChain, but nothing I do seems to work. I had a similar issue with the ExOpenAI library (now solved), which you can see here: https://github.com/dvcrn/ex_openai/issues/13

I am trying to work out how to do something equivalent to this (which uses a raw HTTP POST request via HTTPoison):

  defp describe_image_using_httpoison(data, prompt) do
    payload = %{
      "model" => get_openai_description_model(),
      "messages" => [
        %{
          "role" => "user",
          "content" => [
            %{"type" => "text", "text" => prompt},
            %{
              "type" => "image_url",
              "image_url" => %{
                "url" => "data:image/jpeg;base64," <> data.image.data64
              }
            }
          ]
        }
      ],
      "max_tokens" => 1_000
    }

    case HTTPoison.post!(
           "https://api.openai.com/v1/chat/completions",
           Jason.encode!(payload),
           get_headers(),
           recv_timeout: 20_000
         ) do
      %HTTPoison.Response{status_code: 200, body: body} ->
        case Jason.decode(body) do
          {:ok, content} ->
            [result | _] = content["choices"]
            description = result["message"]["content"]
            description

          error ->
            dbg(error)
        end

      error ->
        dbg(error)
    end
  end

That is just a snippet, but the "type" => "image_url" part is what I am trying to replicate with LangChain.

I have tried this:

  def describe(data, user_prompt \\ @desc_user_prompt) do
    {:ok, _updated_chain, response} =
      %{llm: ChatOpenAI.new!(%{model: @llm_model})}
      |> LLMChain.new!()
      |> LLMChain.add_messages([
        Message.new_system!(@desc_system_prompt),
        Message.new_user!(user_prompt),
        Message.new_user!(get_prompt_attrs_for_image_from_data(data.image))
      ])
      |> LLMChain.run()

    dbg(response)
    # Map.put does not mutate `data`; bind the result if the description should be kept
    _data = Map.put(data, :description, response)
    response.content
  end

  defp get_prompt_attrs_for_image_from_data(%STL.ML.ImageData{src: _, data64: imgdata64, error: _}) do
    {:ok, content} =
      %{
        type: :image_url,
        image_url: %{url: "data:image/jpeg;base64," <> imgdata64}
      }
      |> Jason.encode()

    content
    # %{
    #   role: :user,
    #   content: %{
    #     type: :image_url,
    #     image_url: %{url: "data:image/jpeg;base64," <> imgdata64}
    #   }
    # }
  end

But no matter what I do in get_prompt_attrs_for_image_from_data, nothing works. If I encode the content as a JSON string, the OpenAI API rejects the request with a "too many tokens" error because the base64 image data is enormous when tokenized as text. Anything other than a string for content fails LangChain's message validation.

Is there any way to send arbitrary post params in a LangChain call?

PS: For reference, this is how OpenAI describes the type: :image_url params: https://platform.openai.com/docs/guides/vision
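Concretely, the image content part that guide describes looks like this as an Elixir map (per the guide, "detail" is an optional fidelity hint; base64_data here stands in for the encoded image bytes):

  %{
    "type" => "image_url",
    "image_url" => %{
      # either a regular https URL or a base64 data URL is accepted
      "url" => "data:image/jpeg;base64," <> base64_data,
      # optional: "low", "high", or "auto"
      "detail" => "auto"
    }
  }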

brainlid commented 7 months ago

Hi @matthewsinclair! No, nothing has been done yet to support sending images like that.

Is there any way to send arbitrary post params in a LangChain call?

No. Currently the features are centered around text-based messages and functions.

That's a cool API that I haven't played with yet.

matthewsinclair commented 7 months ago

Hi @brainlid. Thanks so much for the quick reply.

I understand this has not been implemented yet, which explains why I couldn't get it to work!

I will close this now and wait for an update. Thanks again for all the hard work here.

All the best, M@

wakeless commented 6 months ago

I've implemented this here: https://github.com/brainlid/langchain/pull/62. It's pretty incomplete on the OpenAI side, but it might help you.
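For what it's worth, here is a rough sketch of what multi-part user content could look like once support along those lines lands. The ContentPart module and the text!/image! constructors are assumptions based on where the library later ended up, not necessarily what the PR implements:

  alias LangChain.Message
  alias LangChain.Message.ContentPart

  # base64_data is a placeholder for the base64-encoded image bytes
  message =
    Message.new_user!([
      ContentPart.text!("Describe this image."),
      # media: tells the adapter how to build the data URL (assumed option)
      ContentPart.image!(base64_data, media: :jpg)
    ])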