More sophisticated functionality for generating text from images and apply prompts to images

C-Loftus commented 6 months ago

Describe images using custom prompts, with more sophisticated TTS integration, to help users who are blind or who do a lot of UI design

[ ] ~~Have to figure out how to convert files on the clipboard to squares, since the OpenAI API only works on square images~~
[ ] ~~Have to figure out file uploading and deleting, since edits can only be applied to stored images. Want to make sure that storing images isn't going to cause weird issues with billing~~
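
For reference, the squaring step mentioned in the first item could be sketched roughly as below. This is only the geometry, assuming the image is centered on a square canvas whose side is the longer dimension; `square_padding` is a hypothetical helper name, not something from this PR, and the actual padding or cropping would still need an image library.

```python
def square_padding(width: int, height: int) -> tuple[int, tuple[int, int]]:
    """Return (side, (x_offset, y_offset)) for centering an image on a square canvas.

    Hypothetical helper: `side` is the edge length of the square canvas and the
    offsets are where the top-left corner of the original image would be pasted.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The square must be at least as large as the longer edge.
    side = max(width, height)
    # Split the leftover space evenly on both sides (integer division).
    x_offset = (side - width) // 2
    y_offset = (side - height) // 2
    return side, (x_offset, y_offset)
```

With something like Pillow, the result could be used to create a `side` x `side` canvas and paste the clipboard image at the returned offset.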

C-Loftus commented 6 months ago

@jaresty When you get a chance, maybe you could test this out. I want to improve the CSS styling of the image description output. If you have any bandwidth, the CSS for the HTML builder is the main thing I would like help with. I have moved it to a new file.

Essentially, image description works pretty well, and we have two new settings: whether to open the description in a new webpage, and how much content to describe back.
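
As a rough illustration of how those two settings might interact, here is a minimal sketch using only the Python standard library. The names `build_description_page`, `show_description`, `open_in_browser`, and `max_chars` are hypothetical, and modeling the length setting as a simple character cap is an assumption; the PR's actual HTML builder and settings may work differently.

```python
import html
import tempfile
import webbrowser
from pathlib import Path


def build_description_page(description: str, max_chars: int = 500) -> str:
    """Build a small standalone HTML page for an image description.

    `max_chars` is an assumed stand-in for the "how much content" setting.
    """
    text = description.strip()
    if len(text) > max_chars:
        # Trim overly long descriptions and mark the cut.
        text = text[:max_chars].rstrip() + "..."
    # Escape the model output so it cannot inject markup into the page.
    body = html.escape(text)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en"><head><meta charset="utf-8">'
        "<title>Image description</title></head>"
        f"<body><main><p>{body}</p></main></body></html>"
    )


def show_description(
    description: str, open_in_browser: bool = False, max_chars: int = 500
) -> str:
    """Return the page, optionally opening it in the default browser."""
    page = build_description_page(description, max_chars)
    if open_in_browser:
        # Write to a temporary file that outlives this function call.
        with tempfile.NamedTemporaryFile(
            "w", suffix=".html", delete=False, encoding="utf-8"
        ) as f:
            f.write(page)
        webbrowser.open(Path(f.name).as_uri())
    return page
```

Keeping the page builder separate from the browser call means the HTML (and its CSS) can be tweaked and tested without opening a window each time.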

I think we have most of the functionality here; I am now just thinking about improving the UX.

I think image generation (beyond simple prompting) or any sort of image editing is not really worthwhile at the moment: it only returns square images, and we would also have to manage file uploads to OpenAI, which is beyond the scope of this PR. I think image generation really needs a proper UI/GUI to be done well.

C-Loftus commented 6 months ago

I think the CSS should be fixed now. If this looks good, it should be ready to merge. The UX can continue to be improved, but it is generally satisfactory and good to iterate on.

C-Loftus / talon-ai-tools

More sophisticated functionality for generating text from images and apply prompts to images #27