AllYourBot / hostedgpt

An open version of ChatGPT you can host anywhere or run locally.
MIT License
269 stars 111 forks source link

Image generation with Dall-e #398

Open ceicke opened 4 weeks ago

ceicke commented 4 weeks ago

@krschacht this is wild... and so easy to implement.

On a more serious note, is this the direction you are looking for?

I would need help to get the current user in the Toolbox::Dalle class. Not sure how I can access it to get the openai_api_key.

krschacht commented 4 weeks ago

@ceicke I know, right? I'm really excited about tools. I'm adding a bunch of them. I want to turn this project into my own personal Jarvis / Samantha / Computer (star trek) — pick your favorite sci fi computer reference. :)

On the API key, I already solved this in a branch so I just plucked it into main. Rebase and you can now access Current.user within your tool. commit: 00be174d391d4407a5d6721574f3d0bcc99aebcc

krschacht commented 3 weeks ago

Can you share all 4 messages? The one you wrote, the 2 tool-related ones, and then this was the 4th one, I believe:

On Thu, Jun 13, 2024 at 8:15 AM Christoph Eicke @.***> wrote:

@.**** commented on this pull request.

In app/services/toolbox/dalle.rb https://github.com/AllYourBot/hostedgpt/pull/398#discussion_r1638197861:

@@ -0,0 +1,19 @@ +class Toolbox::Dalle < Toolbox +

  • describe :generate_an_image, <<~S
  • Generate an image based on what the user asks you to generate. You will pass the user's prompt and will get back a URL to an image.

Ok, I understand what you mean. Here is the Message that I got as a reply:

=> #<Message:0x0000ffff9b0cc6c8 id: 1048909790, conversation_id: 962582977, role: "assistant", content_text: "Here is an image of a bottle:\n\nBottle", created_at: Thu, 13 Jun 2024 12:58:49.176219000 UTC +00:00, updated_at: Thu, 13 Jun 2024 12:58:56.704132000 UTC +00:00, content_document_id: nil, run_id: nil, assistant_id: 1013349885, processed_at: Thu, 13 Jun 2024 12:58:49.259744000 UTC +00:00, index: 3, version: 2, cancelled_at: nil, branched: false, branched_from_version: nil, content_tool_calls: {}, tool_call_id: nil, versions: "">

As you can see, the .content_text contains the Markdown including an image link. I think this part needs to be extracted, image downloaded and attached to the Message object like you described and then this part of the .content_text markup needs to be removed.

Is it like this, or am I missing something?

— Reply to this email directly, view it on GitHub https://github.com/AllYourBot/hostedgpt/pull/398#discussion_r1638197861, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAIR5PJIMY6LWPPI3K6PMTZHGLQBAVCNFSM6AAAAABJCQTRQKVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCMJVG43DMOJTG4 . You are receiving this because you were mentioned.Message ID: @.***>

ceicke commented 3 weeks ago

I just pushed my changes... I can't quite get the document attached to the message and I don't understand why... maybe you can spot it. Code is not really nice to look at.

krschacht commented 3 weeks ago

I'm taking a look. But the key thing is that the reason GPT-4o is including the image as markdown is just because it's doing it's best to be helpful. We don't want to parse that markdown and grab the image, instead we want to tell GPT not to include the image in markdown because we're going to pre-attach it.

I'll over-share a little bit here just in case the additional context helps you.

First, I had a simple conversation — 1 message and 1 response:

image

And in console I see there are 4 messages, as I expected. There are 2 hidden tool messages preceding the assistants reply:

> Conversation.find(7).messages.ordered
  Conversation Load (1.5ms)  SELECT "conversations".* FROM "conversations" WHERE "conversations"."id" = $1 LIMIT $2  [["id", 7], ["LIMIT", 1]]
  Message Load (0.9ms)  SELECT "messages"."index", "messages"."version" FROM "messages" WHERE "messages"."conversation_id" = $1 ORDER BY "messages"."version" DESC, "messages"."i
=> [#<Message:0x0000000109bab558
  id: 43,
  conversation_id: 7,
  role: "user",
  content_text: "can you generate a cartoon image of a cat",
  created_at: Thu, 13 Jun 2024 14:44:23.945717000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:23.945717000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: nil,
  index: 0,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls: {},
  tool_call_id: nil,
  versions: nil>,
 #<Message:0x0000000109baa3d8
  id: 44,
  conversation_id: 7,
  role: "assistant",
  content_text: "",
  created_at: Thu, 13 Jun 2024 14:44:23.958958000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:40.943010000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: Thu, 13 Jun 2024 14:44:24.394137000 UTC +00:00,
  index: 1,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls:
   [{:index=>0,
     :id=>"call_s3cYMxsn11IfPxEMsxMp9Ycb",
     :type=>"function",
     :function=>{:name=>"dalle_generate_an_image", :arguments=>"{\"image_generation_prompt\":\"cartoon image of a cat\"}"}}],
  tool_call_id: nil,
  versions: nil>,
 #<Message:0x0000000109baa298
  id: 45,
  conversation_id: 7,
  role: "tool",
  content_text:
   "{\"url\":\"https://oaidalleapiprodscus.blob.core.windows.net/private/org-a24zvqoU0AA6ZH1jX1jiJmzF/user-PfPIZghymt4g6e00oFxN9yvL/img-vi2fAv0uQ9EQTtyntuu0Qd1L.png?st=2024-06-13T13%3A44%3A40Z\\u0026se=2024-06-13T15%3A44%3A40Z\\u0026sp=r\\u0026sv=2023-11-03\\u0026sr=b\\u0026rscd=inline\\u0026rsct=image/png\\u0026skoid=6aaadede-4fb3-4698-a8f6-684d7786b067\\u0026sktid=a48cca56-e6da-484e-a814-9c849652bcb3\\u0026skt=2024-06-12T18%3A49%3A15Z\\u0026ske=2024-06-13T18%3A49%3A15Z\\u0026sks=b\\u0026skv=2023-11-03\\u0026sig=kSyBRJW4R%2BvRWNO0xQt8FJ38VBWAirUkx8T2CzKm/0g%3D\"}",
  created_at: Thu, 13 Jun 2024 14:44:40.861781000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:40.861781000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: Thu, 13 Jun 2024 14:44:40.838686000 UTC +00:00,
  index: 2,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls: {},
  tool_call_id: "call_s3cYMxsn11IfPxEMsxMp9Ycb",
  versions: nil>,
 #<Message:0x0000000109baa018
  id: 46,
  conversation_id: 7,
  role: "assistant",
  content_text:
   "Here's the cartoon image of a cat you requested:\n\n[Cartoon Cat](https://oaidalleapiprodscus.blob.core.windows.net/private/org-a24zvqoU0AA6ZH1jX1jiJmzF/user-PfPIZghymt4g6e00oFxN9yvL/img-vi2fAv0uQ9EQTtyntuu0Qd1L.png?st=2024-06-13T13%3A44%3A40Z&se=2024-06-13T15%3A44%3A40Z&sp=r&sv=2023-11-03&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-06-12T18%3A49%3A15Z&ske=2024-06-13T18%3A49%3A15Z&sks=b&skv=2023-11-03&sig=kSyBRJW4R%2BvRWNO0xQt8FJ38VBWAirUkx8T2CzKm/0g%3D)",
  created_at: Thu, 13 Jun 2024 14:44:40.892866000 UTC +00:00,
  updated_at: Thu, 13 Jun 2024 14:44:48.047174000 UTC +00:00,
  content_document_id: nil,
  run_id: nil,
  assistant_id: 1,
  cancelled_at: nil,
  processed_at: Thu, 13 Jun 2024 14:44:41.049892000 UTC +00:00,
  index: 3,
  version: 1,
  branched: false,
  branched_from_version: nil,
  content_tool_calls: {},
  tool_call_id: nil,
  versions: nil>]
[5] pry(main)> 

Message 2 of 4 is the API deciding to call a tool, which our system does. And then message 3 of 4 is our system creating the tool's response. This is the key one. This is the serialized hash that your function is returning as json, stored in the content text. Here is the key part:

> JSON.parse(Conversation.find(7).messages.ordered.third.content_text)
=> {"url"=>
  "https://oaidalleapiprodscus.blob.core.windows.net/private/org-a24zvqoU0AA6ZH1jX1jiJmzF/user-PfPIZghymt4g6e00oFxN9yvL/img-vi2fAv0uQ9EQTtyntuu0Qd1L.png?st=2024-06-13T13%3A44%3A40Z&se=2024-06-13T15%3A44%3A40Z&sp=r&sv=2023-11-03&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-06-12T18%3A49%3A15Z&ske=2024-06-13T18%3A49%3A15Z&sks=b&skv=2023-11-03&sig=kSyBRJW4R%2BvRWNO0xQt8FJ38VBWAirUkx8T2CzKm/0g%3D"}

First, I'll manually attach this image in console just to work through the code:

url = JSON.parse(Conversation.find(7).messages.ordered.third.content_text)["url"]
d = Message.last.documents.new
d.file.attach(io: URI.open(url), filename: "image.png")
d.save!

I just refreshed my conversation and now I see it:

image

Now, how do we do this automatically and get it to skip the markdown. It's all about handling the 3rd message differently. The place this third message got created was this line of code: https://github.com/AllYourBot/hostedgpt/blob/main/app/jobs/get_next_ai_message_job.rb#L160

So here's my suggestion for how to get the image cleanly attached and then tell the AI not to do it as markdown:

  1. In your dalle toolbox class, the response only includes the "url" key. Let's change that key to "url_of_dalle_generated_image" (verbose on purpose, using words that match the function description and function call). Also add another key such as prompt_given and one more key: `note_to_assistant: "The image at the URL is already being shown on screen so reply with a nice message confirming the image has been generated, maybe re-describing it, but don't include the link to it."
  2. Inside this msgs.each is where we created message 3, which has the URL. So add an extra conditional check right after creation which looks for the key url_of_dalle_generated_image and if it's present it creates a NEW document with the attachment but doesn't save yet (and it does NOT attach the document to the tool message)
  3. Right outside this loop is where it stubs out the assistants' response (the 4th message in the sequence). Here is where it attaches the document that was created above and executes save! on the whole thing.
ceicke commented 3 weeks ago

@krschacht again, thanks for your patience and guiding me through it. It works fine. Almost. There is an error happening and here is the important part of it:

base-1      | worker   | [SolidQueue] Claimed 1 jobs
base-1      | worker   | 
base-1      | worker   | ### Finished GetNextAIMessageJob attempt #1 with ERROR: #<ActionView::Template::Error: no implicit conversion of nil into Hash>
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:106:in `block in generate_url'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `block in instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:165:in `instrument'
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:93:in `generate_url'
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:63:in `private_url'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:125:in `block in url'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `block in instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/notifications.rb:206:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:165:in `instrument'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/lib/active_storage/service.rb:120:in `url'
base-1      | worker   | /rails/lib/active_storage/service/postgresql_service.rb:71:in `url'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/app/models/active_storage/blob.rb:216:in `url'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `public_send'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `method_missing'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `public_send'
base-1      | worker   | /usr/local/bundle/gems/activesupport-7.1.3.2/lib/active_support/core_ext/module/delegation.rb:333:in `method_missing'
base-1      | worker   | /usr/local/bundle/gems/activestorage-7.1.3.2/app/models/active_storage/variant_with_record.rb:36:in `url'
base-1      | worker   | /rails/app/models/document.rb:46:in `fully_processed_url'
base-1      | worker   | /rails/app/models/message/document_image.rb:23:in `document_image_path'
base-1      | worker   | /rails/app/views/messages/_message.html.erb:76:in `_app_views_messages__message_html_erb___4486497489240695771_23380'
...

I redacted a bunch of stuff.

I think what is happening is that it's trying to display the small version of the image, but that is not yet processed. I am unsure how to handle this, I saw that you can pass some sort of a fallback into the URL generation of the version in document_image.rb#document_image_path. But actually it's checking whether that version has been processed already. Weird. Maybe I am also interpreting it wrong.

krschacht commented 3 weeks ago

Hmm, this is tricky. I'm scratching my head on this one. But when I initially wrote this document / file / active storage stuff I never fully wrapped my mind around file variants and how processing of that worked. The root thing that I got stuck on was:

Some methods of accessing a file variant (I think the typical methods) are blocking: if the variant doesn't exist it processes it right then and hangs until it's done. I didn't want this. It took me a bunch of trial and error to figure out how to check if a variant is processed in a non-blocking way so that I could show a fallback image / spinner and then just check again on some interval.

I got it working, but it was definitely a case where I wrote some code that I didn't fully understand what it was doing and when I finally got it working I was done staring at that problem. :) I've been wanting to come back to it. I even have a tiny bit of work I did on it in another branch:

There is a small diff on this file (linking straight to document.rb) and the one right below it: https://github.com/AllYourBot/hostedgpt/pull/362/files#diff-893a2b609f3fc4aade494f624a20993eb6c32598512b5e70d9ad39866b8eb51d

But that diff is solving a different problem I ran into with images. I don't think it solves the problem you're bumping into.... But I'm not sure what problem you're bumping into... :)

I think you shoudl put a breakpoint in document.rb right here:

  def fully_processed_url(variant)
    binding.pry
    file.attached? && variant.present? && file.representation(variant.to_sym).processed.url
  end

I assume file.attached? is returning true and variant.present? are returning true. But they're somehow lying and the variant isn't fully done? It may take some trial and error on this command file.representation(variant.to_sym).processed.url to figure out what's accessible and what isn't. Maybe a & is needed after one of those periods? That doesn't make sense though. Maybe variant.present? becomes true when a variant has started processing but before it's finished processing? Ugh....

The problem with this breakpoint is that there may be a bug at the moment the breakpoint is hit, but by the time you get in console the image may have finished processing and when you execute file.representation(variant.to_sym).processed.url it may work fine.

I'm really not sure how to debug this...

Ohoh, did you try uploading an image yourself into a conversation, through the app's UI? Does that fully work? You should be 100% certain that the existing image upload flow works. Ideally try uploading a really big image (10-20mb so that it takes a few seconds to process) and confirm for yourself that you initially see a spinner on the image and then later it pops in. Note that there are two places to check: the image appearing in your conversation directly and, click the image, there is a different variant that opens int he modal. We want to make sure there isn't some deeper configuration issue in your environment which is breaking all image processing. Like maybe a corrupted libmagick install or something (I forget what library the app uses).

The last thing that occurs to me is: active job is pushing a _message to the client. Until now the only time the image checks of _message are executing is when the partial is embedded in a view. But within the context of a job, _message is being rendered — and hitting those image checks — within the context of a job so it's outside of the controller. Maybe there is some context about being outside of a controller that messes up these view checks?

We could try to re-write the logic of message.document_image_path(...) so that when this is rendering within a job (we pass in some extra param in_job: true) then when it hits document_image_path() we tell it to skip any variant processed checks and always give us the redirect URL. It would be something like:

  GetNextAIMessageJob.broadcast_updated_message(@message, thinking: false, outside_controller: true)
  message.document_image_path(..., force_redirect: outside_controller)
  def document_image_path(variant, fallback: nil, force_redirect: false)
    return nil unless has_document_image?

    if !force_redirect && documents.first.has_file_variant_processed?(variant)
      documents.first.fully_processed_url(variant)
    elsif force_redirect || fallback.nil?
      documents.first.redirect_to_processed_path(variant)
    else
      fallback
    end

If image uploads fully work for you when done manually, and if poking around that binding.pry does not yield any clues, then I'd try this solution. I don't love it because it's not actually fixing the problem it's just side stepping it. It would be forcing actioncable pushed messages to never render images and then letting the front-end refresh the image. But I still can't imagine why an image variant check would fail in a queue context but work in a controller context.

krschacht commented 1 week ago

I just converted the PR to a draft to help me stay organized in knowing which ones are waiting on me to review. But feel free to switch it back whenever you're ready.