breadboard-ai / breadboard

A library for prototyping generative AI applications.
Apache License 2.0

Improve `llm-content` editor for Visual Editor Easy Mode #1471

Closed dglazkov closed 3 weeks ago

dglazkov commented 1 month ago

Currently, our llm-content editor looks like this:

image

It's good, but probably not super-friendly for immersive prompt editing. Let's figure out what it needs to look like for the Visual Editor Easy Mode MVP.

paullewis commented 4 weeks ago

Something in this area that's worth exploring. At some level llm-content is really generic, inasmuch as it's some base64-encoded data with a mime type, whether that be for images, text, or audio files. In the same vein we also have drawables, webcam access, and microphone access, which are more direct for the user than a file, but which we still emit in the same format: base64-encoded data with a mime type.

So I wonder, is there a version of this UI where we support not just files but device access too? From an API perspective there's little to stop me mixing webcam with some audio and a text file, because they can all come through as base64 + mime. What we can then do from a UI perspective is let someone go from that "anything goes" version of events to a more limited palette by specifying, say, a filter: allowed=['webcam', 'microphone', 'image'].
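A minimal sketch of how such a filter might narrow the palette. Every name here (`InputKind`, `EncodedPart`, `EditorOptions`, `availableInputs`) is hypothetical and chosen for illustration; none of it is taken from the Breadboard codebase.

```typescript
// Hypothetical sketch: these names are assumptions, not Breadboard APIs.

// Every kind of input the editor could offer in "anything goes" mode.
type InputKind =
  | "text"
  | "file"
  | "image"
  | "audio"
  | "webcam"
  | "microphone"
  | "drawable";

const ALL_INPUT_KINDS: readonly InputKind[] = [
  "text",
  "file",
  "image",
  "audio",
  "webcam",
  "microphone",
  "drawable",
];

// Whatever the source, the emitted value has the same shape:
// base64-encoded data plus a MIME type.
interface EncodedPart {
  mimeType: string;
  data: string; // base64-encoded payload
}

interface EditorOptions {
  // When omitted, every input kind is offered.
  allowed?: readonly InputKind[];
}

// Returns the input kinds the UI should offer, in palette order.
function availableInputs(options: EditorOptions = {}): InputKind[] {
  const { allowed } = options;
  if (!allowed) {
    return [...ALL_INPUT_KINDS];
  }
  return ALL_INPUT_KINDS.filter((kind) => allowed.includes(kind));
}

// A text part looks exactly like a webcam frame or an audio clip would,
// apart from the MIME type ("aGVsbG8=" is base64 for "hello").
const textPart: EncodedPart = { mimeType: "text/plain", data: "aGVsbG8=" };

const limited = availableInputs({
  allowed: ["webcam", "microphone", "image"],
});
```

Because the filter only affects which inputs the UI offers, the emitted `EncodedPart` shape stays the same in both the open and the limited palette.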

wdyt?

dglazkov commented 4 weeks ago

Some notes on https://breadboard-ai.web.app/?board=%2Fgraphs%2Fsuper-worker-test.json so far:

There seem to be four UI surfaces for llm-content that I can see:

  1. The configuration editor in the Selected Node panel:

    image
  2. The unsubmitted input in the Board panel during a run:

    image
  3. The submitted input in the Board panel during a run:

    image
  4. The output in the Board panel during/after a run:

    image

My intuition is that 1 and 2 should look and act roughly the same, and 3 and 4 should be similarly paired. Number 4 is a little weird, because it's actually an array of llm-content. Maybe for the MVP we just show the last element?

In more general, framework-y terms, it looks like maybe there's an Input/Output provider for a given behavior, and the provider vends two web components with pre-defined properties/events?

One of the things they could take as args is additional behavior hints, to specify things like the allowed bits from the previous comment.
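A rough sketch of that provider idea, under stated assumptions: the `Surface` names, the `IOProvider` interface, the `example-*` tag names, and `componentFor` are all hypothetical, and the components are described as plain specs (tag plus properties) rather than real custom elements, so the sketch runs without a DOM.

```typescript
// Hypothetical sketch: none of these names come from the Breadboard codebase.

// The four UI surfaces listed above.
type Surface = "config" | "pending-input" | "submitted-input" | "output";

// A description of the web component to instantiate.
interface ComponentSpec {
  tag: string; // custom element tag name
  properties: Record<string, unknown>;
}

// One provider per behavior; it vends an editor and a renderer.
interface IOProvider {
  behavior: string;
  editor(value: unknown, hints: string[]): ComponentSpec;
  renderer(value: unknown): ComponentSpec;
}

const llmContentProvider: IOProvider = {
  behavior: "llm-content",
  editor: (value, hints) => ({
    tag: "example-llm-content-editor",
    properties: { value, allowed: hints },
  }),
  renderer: (value) => ({
    tag: "example-llm-content-view",
    properties: { value },
  }),
};

// Surfaces 1 and 2 get the editor; surfaces 3 and 4 get the renderer.
function componentFor(
  provider: IOProvider,
  surface: Surface,
  value: unknown,
  hints: string[] = []
): ComponentSpec {
  if (surface === "config" || surface === "pending-input") {
    return provider.editor(value, hints);
  }
  // The output is an array of llm-content; as floated for the MVP,
  // show only its last element.
  const shown =
    surface === "output" && Array.isArray(value)
      ? value[value.length - 1]
      : value;
  return provider.renderer(shown);
}
```

Keeping the surface-to-component decision in one function means a new surface only needs a new `Surface` member and one branch here, not changes in each provider.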

WDYT?

dglazkov commented 3 weeks ago

Found one more place where the "llm-content" editor could go: in the schema editor. Very turducken.

image

paullewis commented 3 weeks ago

I think anywhere with the behavior of 'llm-content' should get either the editor or the renderer, depending on the context, right?

paullewis commented 3 weeks ago

I think we can probably declare victory here!

paullewis commented 3 weeks ago

I have a follow-up bug for the Activity Log so I'm going to close this one :shipit: