Sharrnah / whispering-ui

Native UI for the Whispering Tiger project - https://github.com/Sharrnah/whispering (live transcription / translation)
https://whispering-tiger.github.io/
MIT License
218 stars, 12 forks

Some optimization suggestions #20

Closed OptimisticGeek closed 2 months ago

OptimisticGeek commented 5 months ago

Hello, Sharrnah

After using it, I have several optimization suggestions:

  1. Add a button to clear the history on the right
  2. Add a checkbox to control whether the history is displayed in the list
  3. Optimize the display of the history list and make the row height adaptive. Currently, the translation and the original text are rendered mixed together.
  4. Add shortcut keys to control whether to send data to the VRC through OSC.
  5. Could it support dynamically switching the microphone input, to recognize either in-game speech or your own speech? Or could it support both in-game voice and own voice at once? That would save more memory than running two instances of Tiger. Right now, due to graphics card limitations, I have to make the difficult choice of using only one of them.

Your Chinese friend, OptimisticGeek


Sharrnah commented 5 months ago

Thanks for the suggestions.

  1. Add a button to clear the history on the right

Should be no problem. Should this also affect the CSV export, since both are saved differently?

  2. Add a checkbox to control whether the history is displayed in the list

What would this be good for? If you don't need the history, you don't need to use it. Just curious. If you give me a good reason, I am more than happy to add it.

  3. Optimize the display of the history list and make the row height adaptive. Currently, the translation and the original text are rendered mixed together.

This is a known issue with the UI library currently in use. The new version of it should allow this, but it has heavy memory issues when using a font that supports all the different language characters, so I stayed on the older version. They are already improving it, so this will hopefully soon be a thing of the past.

  4. Add shortcut keys to control whether to send data to the VRC through OSC.

Do you have a suggestion for a shortcut key? Or should it be configurable, similar to the push-to-talk configuration?

  5. Could it support dynamically switching the microphone input, to recognize either in-game speech or your own speech? Or could it support both in-game voice and own voice at once? That would save more memory than running two instances of Tiger. Right now, due to graphics card limitations, I have to make the difficult choice of using only one of them.

This is on my roadmap, but I can't give a date for when it will be ready, as it is not so simple given how everything currently works together.

OptimisticGeek commented 5 months ago

Should be no problem. Should this also affect the CSV export, since both are saved differently? What would this be good for? If you don't need the history, you don't need to use it. Just curious. If you give me a good reason, I am more than happy to add it.

I didn't need to use the history. I wanted to minimize my hardware consumption as much as possible, but I didn't know how to turn it off. Is there a risk of memory leaks if the list grows too long? I suggest storing all records in a local database and exporting them from the database when saving the CSV.
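The database suggestion above could be sketched roughly as follows. This is only an illustration using Python's standard `sqlite3` and `csv` modules; the `HistoryStore` class, the table layout, and the column names are hypothetical and are not taken from Whispering Tiger.

```python
import csv
import io
import sqlite3


class HistoryStore:
    """Hypothetical history store; not part of Whispering Tiger."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; a file path would persist.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "original TEXT NOT NULL, "
            "translation TEXT NOT NULL)"
        )

    def add(self, original, translation):
        # The connection as context manager commits on success.
        with self.db:
            self.db.execute(
                "INSERT INTO history (original, translation) VALUES (?, ?)",
                (original, translation),
            )

    def clear(self):
        # What a "clear history" button could call.
        with self.db:
            self.db.execute("DELETE FROM history")

    def count(self):
        return self.db.execute("SELECT COUNT(*) FROM history").fetchone()[0]

    def export_csv(self):
        # Export straight from the database, in insertion order.
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["original", "translation"])
        rows = self.db.execute(
            "SELECT original, translation FROM history ORDER BY id"
        )
        writer.writerows(rows)
        return out.getvalue()
```

With a single store like this, clearing the history and exporting the CSV would operate on the same data, which sidesteps the question of whether the clear button should also affect the export.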

Do you have a suggestion for a shortcut key? Or should it be configurable, similar to the push-to-talk configuration?

Yes, configurable shortcuts are needed to avoid conflicts with other applications. It would be good to support keys such as F1~F12, PrtSc, Home, and Pause.
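A minimal sketch of such a configurable toggle, assuming the application already receives key names from some hotkey listener. The `OscToggle` class, the `ALLOWED_KEYS` set, and the default key `F9` are hypothetical; no real hotkey library is used here.

```python
# Keys named in the request above: F1 to F12, PrtSc, Home, Pause.
ALLOWED_KEYS = {f"F{n}" for n in range(1, 13)} | {"PrtSc", "Home", "Pause"}


class OscToggle:
    """Hypothetical on/off switch for OSC sending, bound to one hotkey."""

    def __init__(self, hotkey="F9"):
        self.enabled = True
        self.hotkey = None
        self.set_hotkey(hotkey)

    def set_hotkey(self, key):
        # Reject keys outside the allowed set so a bad config fails early.
        if key not in ALLOWED_KEYS:
            raise ValueError(f"unsupported hotkey: {key}")
        self.hotkey = key

    def on_key_press(self, key):
        """Flip OSC sending if the key matches; return the current state."""
        if key == self.hotkey:
            self.enabled = not self.enabled
        return self.enabled
```

A caller would check `toggle.enabled` before sending each message, and `set_hotkey` would be wired to the settings dialog so the user can pick a key that does not conflict with other applications.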

This is on my roadmap, but I can't give a date for when it will be ready, as it is not so simple given how everything currently works together.

I agree that this is a complicated task. Please share the news when you make progress.

Sharrnah commented 5 months ago

I didn't need to use the history. I wanted to minimize my hardware consumption as much as possible, but I didn't know how to turn it off. Is there a risk of memory leaks if the list grows too long? I suggest storing all records in a local database and exporting them from the database when saving the CSV.

There should be no issue, and I am not aware of any memory leak caused by the history. Since it is only text, RAM usage should not be a problem, and it only uses system RAM, not video RAM.

The only possible memory leak I might have found occurs when using faster whisper, realtime mode and Run each transcription in a separate thread. That is most likely an issue with Faster Whisper though, or maybe something specific to my PC, since I have not heard any complaints yet. But it's why I added the "Run each transcript in a separate thread" option.

OptimisticGeek commented 5 months ago

The only possible memory leak I might have found occurs when using `faster whisper`, `realtime mode` and `Run each transcription in a separate thread`. That is most likely an issue with Faster Whisper though, or maybe something specific to my PC, since I have not heard any complaints yet. But it's why I added the "Run each transcript in a separate thread" option.

I haven't had any crashes so far.

OptimisticGeek commented 5 months ago

5. Could it support dynamically switching the microphone input, to recognize either in-game speech or your own speech? Or could it support both in-game voice and own voice at once? That would save more memory than running two instances of Tiger. Right now, due to graphics card limitations, I have to make the difficult choice of using only one of them.

Would it be easier to implement if the microphone and speaker were switched through a shortcut key? This would not be parallel recognition. It would be a single recognition mode in which only the input source changes.

Example:

  1. Switch to the speaker and use the speaker as an input source to listen for in-game speech
  2. Switch to the microphone and use the microphone as an input source to listen to your own voice

Sharrnah commented 5 months ago

It is not so much about switching the device, but about everything that is attached to it. For example, you probably do not want the text-to-speech to trigger when you recorded someone else speaking. Or possibly you do not want to send the OSC of the other person, or you do not want realtime mode for your own speech, but you want realtime mode for the other person, etc.

I appreciate the suggestion though.
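To illustrate the point about "everything that is attached" to a device, here is a rough sketch in which each input source carries its own settings, so switching the source also switches text-to-speech, OSC sending, and realtime mode. All names (`SourceProfile`, `SourceSwitcher`, the profile values) are hypothetical and only mirror the examples given above; this is not how Whispering Tiger is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Hypothetical bundle of settings tied to one input source."""
    device: str
    text_to_speech: bool
    send_osc: bool
    realtime: bool


# Example values only: own voice triggers TTS and OSC without realtime mode,
# while other people's speech is only transcribed, in realtime mode.
PROFILES = {
    "microphone": SourceProfile("microphone", text_to_speech=True,
                                send_osc=True, realtime=False),
    "speaker": SourceProfile("speaker", text_to_speech=False,
                             send_osc=False, realtime=True),
}


class SourceSwitcher:
    def __init__(self, profiles, start):
        self.profiles = profiles
        self.active = profiles[start]

    def switch(self, name):
        # Changing the source swaps the whole profile, not just the device.
        self.active = self.profiles[name]
        return self.active

    def actions_for(self, text):
        # Decide which follow-up actions a finished transcription triggers.
        actions = []
        if self.active.text_to_speech:
            actions.append(("tts", text))
        if self.active.send_osc:
            actions.append(("osc", text))
        return actions
```

The sketch only models the settings side. The harder part described in the thread, rewiring the running recognition pipeline when the source changes, is not covered here.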

OptimisticGeek commented 5 months ago

It is not so much about switching the device, but about everything that is attached to it. For example, you probably do not want the text-to-speech to trigger when you recorded someone else speaking. Or possibly you do not want to send the OSC of the other person, or you do not want realtime mode for your own speech, but you want realtime mode for the other person, etc.

I appreciate the suggestion though.

Yes, I didn't think it through enough. I hope my ideas can bring you some inspiration. I really like Tiger!

Sharrnah commented 2 months ago

I just released the new update.

So I will close this issue. Feel free to open a new one if you are missing something or have new ideas. :)