Kardbord / Kard-bot

A Discord bot destined for greatness
GNU Affero General Public License v3.0

Add human-in-the-loop to /dalle-flow #73

Open Kardbord opened 1 year ago

Kardbord commented 1 year ago

Overview

jina-ai/dalle-flow provides a relatively simple client interface for human-in-the-loop generation of images from user prompts. When /dalle-flow was initially implemented, all human interaction beyond the initial prompt was removed for simplicity. For a better user experience, human-in-the-loop functionality should now be added back in. The interaction should go roughly as follows:

sequenceDiagram
actor U as User
participant C as Discord Client
participant K as Kard-bot
participant F as Disk
participant D as Dalle-Flow<br>Client Script
participant S as Jina-AI<br>Dalle-Flow Server

U->>C: /dalle-flow [prompt]
C->>K: Interaction Data
K->>D: Prompt
D->>S: gRPC Request<br>"Generate X images<br>from prompt"
S->>D: gRPC Response<br>"Image Sprites"
D->>F: Plotted Sprites Image
F->>K: Image Data
K->>C: Interaction Response<br>"Image Data"
Note left of K: Interaction response<br>includes buttons for<br>variations (diffusion)<br>and upscaling,<br>similar to Midjourney<br>interface.
C->>U: Render Plotted Sprites
opt Diffuse
U->>C: Diffuse Sprite X
C->>K: Interaction Data
K->>D: Image Data (Sprites)<br>and Diffusion Selection
D->>S: gRPC Request<br>"Diffuse sprite X"
S->>D: gRPC Response<br>"Diffused Image Sprites"
D->>F: Plotted Sprites Image
F->>K: Image Data
K->>C: Interaction Response<br>"Image Data"
Note left of K: Interaction response<br>includes buttons for<br>variations (diffusion)<br>and upscaling,<br>similar to Midjourney<br>interface.
C->>U: Render Plotted Sprites
end
opt Upscale
U->>C: Upscale Sprite X
C->>K: Interaction Data
K->>D: Image Data (Sprites)<br>and Upscale Selection
D->>S: gRPC Request<br>"Upscale sprite X"
S->>D: gRPC Response<br>"Upscaled Image"
D->>F: Upscaled<br>Image Data
F->>K: Upscaled<br>Image Data
K->>C: Interaction Response<br>"Upscaled Image Data"
Note left of K: Final response,<br>no buttons.
C->>U: Render Upscaled Image
end
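
One detail the diagram implies is that each button press has to tell Kard-bot which action was chosen and which sprite it applies to, and that only upscale responses are final. Below is a minimal sketch of that bookkeeping, shown in Python for consistency with dalle-flow.py. The `dalle-flow:<action>:<index>` ID format and every name in the block are assumptions made for illustration, not something specified in this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names: the issue only describes buttons for
# "variations (diffusion)" and "upscaling".
DIFFUSE = "diffuse"
UPSCALE = "upscale"
PREFIX = "dalle-flow"  # assumed namespace for this command's button IDs


@dataclass(frozen=True)
class Selection:
    action: str        # DIFFUSE or UPSCALE
    sprite_index: int  # which sprite in the plot the user picked


def button_id(action: str, sprite_index: int) -> str:
    """Encode a button as a string such as 'dalle-flow:diffuse:3'."""
    if action not in (DIFFUSE, UPSCALE):
        raise ValueError(f"unknown action: {action!r}")
    if sprite_index < 0:
        raise ValueError("sprite_index must be non-negative")
    return f"{PREFIX}:{action}:{sprite_index}"


def parse_button_id(custom_id: str) -> Selection:
    """Decode a string produced by button_id back into a Selection."""
    parts = custom_id.split(":")
    if len(parts) != 3 or parts[0] != PREFIX:
        raise ValueError(f"not a {PREFIX} button: {custom_id!r}")
    _, action, index = parts
    if action not in (DIFFUSE, UPSCALE):
        raise ValueError(f"unknown action: {action!r}")
    sprite_index = int(index)  # raises ValueError on non-numeric input
    if sprite_index < 0:
        raise ValueError("sprite_index must be non-negative")
    return Selection(action, sprite_index)


def response_has_buttons(selection: Optional[Selection]) -> bool:
    """Initial (None) and diffusion responses carry buttons; upscale is final."""
    return selection is None or selection.action == DIFFUSE
```

With this scheme, a handler would decode the pressed button, pass the selection to the client script, and use `response_has_buttons` to decide whether the follow-up message gets another row of buttons.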

Subtasks

Edit: These subtasks are OBE (overtaken by events). See below.

- [x] Update dalle-flow.py to optionally take a prompt, generate images, and write the sprite plot to disk.
- [ ] Update dalle-flow.py to optionally take a path to a sprite plot, diffuse a selected sprite, and write the result to disk.
- [ ] Update dalle-flow.py to optionally take a path to a sprite plot, upscale a selected sprite, and write the result to disk.
- [ ] Update /dalle-flow interaction to return a sprite plot with buttons for upscaling or diffusion.
- [ ] Implement upscale button handlers.
- [ ] Implement diffusion button handlers.
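
The three dalle-flow.py subtasks describe three modes of the same script. One possible command-line shape is sketched below. The subcommand names, the `--sprite` and `--out` flags, and the `build_parser` helper are all hypothetical, and the gRPC calls to the Dalle-Flow server are deliberately left out.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per mode (all names are assumptions)."""
    parser = argparse.ArgumentParser(prog="dalle-flow.py")
    sub = parser.add_subparsers(dest="mode", required=True)
    modes = []

    # Mode 1: take a prompt and produce a sprite plot.
    gen = sub.add_parser("generate", help="generate a sprite plot from a prompt")
    gen.add_argument("prompt")
    modes.append(gen)

    # Modes 2 and 3: take an existing sprite plot and a selected sprite.
    for name, help_text in (
        ("diffuse", "diffuse a selected sprite into a new sprite plot"),
        ("upscale", "upscale a selected sprite into a single image"),
    ):
        mode = sub.add_parser(name, help=help_text)
        mode.add_argument("sprite_plot", type=Path, help="path to a sprite plot")
        mode.add_argument("--sprite", type=int, required=True,
                          help="index of the selected sprite")
        modes.append(mode)

    # Every mode writes its result to disk for Kard-bot to read back.
    for mode in modes:
        mode.add_argument("--out", type=Path, required=True,
                          help="where to write the resulting image")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"mode={args.mode} out={args.out}")
```

Under these assumptions, Kard-bot would invoke something like `dalle-flow.py upscale plot.png --sprite 2 --out upscaled.png` and then read the output file.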

Kardbord commented 1 year ago

The implementation of this issue is going to change significantly with the move from Jina AI's dalle-flow to OpenAI's Dalle-2 service. See #74.

Kardbord commented 1 year ago

The introduction of /render in #101 will affect the implementation details of this issue, but the overall scope remains the same.