C-Loftus / talon-ai-tools

Query LLMs and AI tools with voice commands
http://colton.place/talon-ai-tools/
MIT License

More sophisticated functionality for generating text from images and applying prompts to images #27

Closed C-Loftus closed 6 months ago

C-Loftus commented 6 months ago

Describe images using custom prompts, with more sophisticated TTS integration to help users who are blind or who do a lot of UI design.

C-Loftus commented 6 months ago

@jaresty When you get a chance, maybe you could test this out. I want to improve the CSS styling of the image description output. If you have any bandwidth, the CSS for the HTML builder is the main thing to work on. I have moved it to a new file.

Essentially, image description works pretty well, and we have two new settings: whether to open the description in a new webpage, and how much content to describe back.
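For anyone reviewing, here is a rough sketch of how the two settings and the HTML builder could fit together. It is not the code in this PR: `DescriptionSettings`, `open_in_browser`, `max_chars`, `build_description_page`, and `present` are hypothetical names made up for illustration, and it uses only the Python standard library rather than Talon's settings API.

```python
import html
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical settings; the real setting names in the repo may differ.
@dataclass
class DescriptionSettings:
    open_in_browser: bool = True  # open the description in a new webpage
    max_chars: int = 500          # how much of the description to read back


# Minimal inline CSS so the page is readable without external files.
PAGE_CSS = (
    "body { font-family: sans-serif; max-width: 40em; "
    "margin: 2em auto; line-height: 1.5; }"
)


def truncate(text: str, max_chars: int) -> str:
    """Cut text to max_chars characters, marking the cut with an ellipsis."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."


def build_description_page(description: str, title: str = "Image description") -> str:
    """Wrap an image description in a small HTML document, escaping the text."""
    paragraphs = "\n".join(
        f"<p>{html.escape(part.strip())}</p>"
        for part in description.split("\n")
        if part.strip()
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title>'
        f"<style>{PAGE_CSS}</style></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n{paragraphs}\n</body></html>"
    )


def present(description: str, settings: DescriptionSettings) -> Tuple[str, Optional[str]]:
    """Return (text to speak, HTML page or None) according to the settings."""
    spoken = truncate(description, settings.max_chars)
    page = build_description_page(description) if settings.open_in_browser else None
    return spoken, page
```

In this sketch the spoken text is truncated while the webpage keeps the full description. That split is an assumption on my part, not necessarily how the PR behaves. The returned page could be written to a temporary file and opened with the standard `webbrowser.open`.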

I think we have most of the functionality here; I am just thinking about improving the UX.

I think image generation (beyond simple prompting) or any sort of image editing is not really worthwhile at the moment, since it only returns square images and we would also have to manage file uploads to OpenAI, which is beyond the scope of this PR. I think image generation really needs a proper UI/GUI to be done well.

C-Loftus commented 6 months ago

I think the CSS should be fixed now. If this looks good, it should be ready to merge. The UX can continue to be improved, but it is generally satisfactory and a good base to iterate on.