C-Loftus / talon-ai-tools

Query LLMs and AI tools with voice commands
http://colton.place/talon-ai-tools/
MIT License

Conversation With the Model #78

Closed Mark-Phillipson closed 1 month ago

Mark-Phillipson commented 3 months ago

It would be nice to be able to have a conversation with the model, with the responses spoken back.

For example "what is the capital of France?" and the model would reply with "Paris".

Talon file script example:

computer <user.text>:
    # take the captured text as a question and reply with voice
    # (the <user.text> capture is already bound to `text` in the rule body)
    result = user.gpt_apply_prompt("Please reply in summary only", text)
    user.tts(result)

This is a simple example of how to use the model to answer a question by voice. The model will reply with just a summary of the text.

It would also be beneficial to have a feature that allows us to add each interaction to a list for the model to record. Additionally, we would require the functionality to clear this list when necessary.

For example, "what is the capital of France?" and the model would reply with "Paris". Then the user asks "what about Germany?" and the model would reply with "Berlin".

computer <user.text>:
    # record the question, then answer based on the whole conversation so far
    user.add_to_list(text)
    result = user.gpt_apply_prompt("Please reply in summary only", user.get_list())
    user.tts(result)

This is a simple example of how to use the model to reply to a list of questions by voice. The model will reply with a summary of the list. Note that the add-to-list functionality would have to be implemented; a rough sketch follows.
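
Something like the following user file could implement it (a minimal sketch; add_to_list, get_list, and clear_list are just placeholder names from the example above, not existing actions in talon-ai-tools):

# conversation_state.py -- hypothetical Talon user action module
from talon import Module

mod = Module()

conversation = []  # in-memory history of interactions

@mod.action_class
class Actions:
    def add_to_list(text: str):
        """Append one interaction to the conversation history"""
        conversation.append(text)

    def get_list() -> str:
        """Return the conversation history as one newline-joined string"""
        return "\n".join(conversation)

    def clear_list():
        """Clear the conversation history when necessary"""
        conversation.clear()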

Also, it's a bit of a stretch to rely on text-to-speech here, as not everybody will have it installed, so I'm not sure how to get around that.

jaresty commented 3 months ago

I think tts could be another destination; wdyt, @C-Loftus? You could say "model prompt spoken" and have it reply with tts.

C-Loftus commented 3 months ago

Thanks for your comment

  1. With regards to the point on conversation, yes that is something I would like to support. Currently all requests are stateless. If we want to allow state we need:

    • A way to make sure the state of the conversation has a very clear start and end (i.e. we don't want the user to accidentally include irrelevant state in the next request).
    • An optional gui to understand the state passed into the request. (i.e. a chat-style format like the one you'd see with chatgpt, but more minimal to clearly represent the state of the conversation).
      • However, I am not sure how to synchronize a static HTML page with a backend without using a web server since I don't want to do that in Talon for security reasons.
    • A way to make it feel consistent and natural for users even while switching between stateful and stateless commands.
    • A clear understanding of what state should be passed in given the fact we may have previously manipulated an editable text field. This is non-trivial and not something that a normal chatbot needs to worry about. Is the source just the selected text? Or is it what the model returned previously? Should selected text be removed if it was pasted from the model previously? All of these are things we need to consider.

      • Unless of course at first we only support TTS and state-based responses that don't change an editable text field
  2. @Mark-Phillipson with regards to TTS, have you used any of the commands in the TTS folder? (i.e. do you have https://github.com/C-Loftus/sight-free-talon installed for TTS?) I am curious about your user experience and whether you find it useful. I also think that tts is very useful and makes it so you don't need to clutter the screen or paste anything. However, installing tts is a pain at the moment since Talon doesn't have a package manager.

    • @jaresty I agree that something like "echoed" or "spoken" could be a good insertionDestination. The grammar would end up being something like model fix grammar echoed or model fix grammar spoken, keeping the point-free / pipeline pattern (see the sketch after this list).
    • I could explore splitting sight-free-talon into one repo that is just the tts library and another that is the code for screen reader users. Then we could bundle the tts lib as a git submodule. But this might be overkill; I'm not sure how annoying people find sight-free-talon installation to be.
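
For example, the destination dispatch could look roughly like this (insert_response and the "spoken" name here are illustrative, not the current API):

from talon import actions

def insert_response(result: str, destination: str):
    """Route the model's response to the chosen destination (illustrative only)."""
    if destination == "spoken":
        # speak the result aloud; assumes a user.tts action is available
        # (e.g. from sight-free-talon)
        actions.user.tts(result)
    else:
        # default: insert the result at the cursor
        actions.insert(result)
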
Mark-Phillipson commented 3 months ago

Yes, I had to install the tts manually, which was not straightforward.

Sometimes whilst working I'm trying to remember the name of, for example, a CSS property, and just need a quick reminder without losing my place or focus. So I thought it would be cool to have a text-to-speech feature that I can trigger with a voice command.

Another example: when I'm reading an article on a website, I could say "computer define" followed by whatever word I need a definition for, without losing my place.
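
In the style of the examples above, that could be something like this (a sketch, again assuming gpt_apply_prompt and a tts action are available; this is not an existing command):

computer define <user.text>:
    # ask for a short definition and speak it, without touching the screen
    result = user.gpt_apply_prompt("Give a one-sentence definition", text)
    user.tts(result)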

C-Loftus commented 3 months ago

> Yes, I had to install the tts manually, which was not straightforward.

When you say manually do you mean you copied and pasted just the text to speech code from sight-free-talon and didn't clone that repository, or rather that you did clone sight-free-talon but didn't find it intuitive to use?

Do you have any suggestions on how I could make it more intuitive?

> Sometimes whilst working I'm trying to remember the name of, for example, a CSS property, and just need a quick reminder without losing my place or focus. So I thought it would be cool to have a text-to-speech feature that I can trigger with a voice command.

Yup I agree!

Mark-Phillipson commented 3 months ago

I did clone the sight-free-talon repository, but as I only did it once I can't really remember much about it; suffice it to say it is working, and I do remember having to change the speed of the voice to be able to understand it.

C-Loftus commented 1 month ago

This should be implemented now. You can do the following to have a conversation with the model, which can optionally be spoken via TTS if you would like:

  1. model start thread -> stores your conversation in a new thread
  2. model toggle window -> opens the window which shows thread visualizations
  3. Any model command you request (i.e. model please tell me X) will be automatically added to the window after it returns its result
  4. If you use "to speech" as your model destination, it will speak the output, and if you have a thread enabled, it will continue to update the window.
    • You can override the default destination to be speech if you always want speech; we have a setting for that now.
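
For example, a session might look like this (an illustrative transcript using the phrases from the list above; the exact wording of the destination may differ):

"model start thread"                          # begin a new conversation thread
"model please tell me the capital of France"  # reply is recorded in the thread
"model toggle window"                         # open the window to review the thread
"model please tell me what about Germany"     # follow-up uses the stored context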

If you feel this is missing behavior, please file a new issue so we can iterate.